Quant Analytics

Best alternative development stack for R with Hadoop? Forget MySQL? Cassandra and Java listeners for market tick data

After reading about the limitations of MySQL and how expensive it can get, I decided to take a chance on Cassandra. One big reason is that there is an RCassandra package on CRAN, so yippee for that. Also, the install does not look hard and, better yet, you can integrate it with Hadoop. Yippee for that! Cassandra may also be faster for writes than HBase, which was part of the RHadoop offering, so boo to that. I also plan to have some Java listeners on my market data to populate the Cassandra database. This stack may work, so let's cross my fingers.

Here are the links that got me thinking this way:
http://stackoverflow.com/questions/4884967/hadoop-hbase-hdfs-vs-mysql-or-postgres-loads-of-independent-structured-d
http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/
http://blog.milford.io/2010/06/installing-apache-cassandra-on-centos/ (This Cassandra install needs a few tricks, such as running as root, but it does work and installed fine.)
http://www.datastax.com/docs/0.7/map_reduce/hadoop_mr
http://code.google.com/p/cassandra-java-client/
I have doubts about the last two links: the Cassandra install may be worth doing, but integrating it with Hadoop could be a challenge. I also hope RCassandra works.
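To make the listener idea a bit more concrete, here is a minimal sketch, written in Python rather than Java for brevity. It assumes each tick arrives as (symbol, timestamp, price, size) and uses a wide-row layout (one row per symbol per trading day, one column per timestamp), which is a common way to model tick data in Cassandra. `Tick`, `TickStore`, `TickListener` and the batch size are all hypothetical; `TickStore` is an in-memory stand-in, and a real listener would write through a Cassandra client instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Tick:
    symbol: str
    ts: datetime   # exchange timestamp, assumed to be UTC
    price: float
    size: int

class TickStore:
    """Hypothetical in-memory stand-in for a Cassandra table keyed by (symbol, day)."""
    def __init__(self):
        self.rows = defaultdict(dict)   # row key -> {timestamp: (price, size)}
        self.batches_written = 0

    def write_batch(self, ticks):
        for t in ticks:
            row_key = f"{t.symbol}:{t.ts:%Y%m%d}"        # one wide row per symbol per day
            self.rows[row_key][t.ts] = (t.price, t.size)  # column name = timestamp
        self.batches_written += 1

class TickListener:
    """Buffers incoming ticks and flushes them to the store in batches."""
    def __init__(self, store, batch_size=3):
        self.store = store
        self.batch_size = batch_size
        self.buffer = []

    def on_tick(self, tick):
        self.buffer.append(tick)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.store.write_batch(self.buffer)
            self.buffer = []

# Usage: four ticks with a batch size of 3 -> one full batch plus one final flush.
store = TickStore()
listener = TickListener(store, batch_size=3)
base = datetime(2012, 5, 16, 14, 30, tzinfo=timezone.utc)
for i, px in enumerate([135.00, 135.25, 135.50, 135.25]):
    listener.on_tick(Tick("SPY", base.replace(second=i), px, 100))
listener.flush()
print(sorted(store.rows))               # ['SPY:20120516']
print(len(store.rows["SPY:20120516"]))  # 4
print(store.batches_written)            # 2
```

Batching keeps the per-tick work in the listener small; the final `flush()` pushes whatever is still buffered when the feed stops.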

Get our FREE Open Source Historical Database by answering the 2 WORLD'S FASTEST TRADER/QUANT QUESTIONS

Is this big data? What metrics would you run against this dataset for quant analysis?

Written on May 16th, 2012 by caustic
Categories: Quant Analytics
The dataset consists of 65,000+ customers and 2,000+ SKUs, served through 20+ channels. It is comprehensive: granular profitability is calculated at the intersection of customer, channel and product. Roughly 300 million line items are calculated monthly, including YTD data. This is done within a closing window of under 18 hours, and subsegment P&Ls are generated from the recordset. Do you consider this big data? Why or why not? @quantalyst

== Nope. Too small. It easily fits inside a relational model.

== Do you plan to merge this data with any social media or other semi-structured data (images, financial documents, etc.)? What is the velocity associated with the collection of this data? I am guessing some data modeling has been done here, but what rate of change do you anticipate in the different data assets you plan to use? These items will give some insight into whether this calls for a big data solution. However, if the data footprint fits into a database, and the data assets and their elements can be served well by a relational solution, pursue that first.

== This is certainly at the smaller end of the spectrum of what is generally accepted as big data. Big data is typically defined by the variety and volume of the data, but add in velocity and that can all change. You say you're analyzing all of this in monthly batches in under 18 hours. Your data can become 'big' if your company needs this information weekly or even daily, to track customer and market trends, evaluate channel metrics and performance, and enable more agile decision making. Then you need to bring that 18-hour window down to, say, 3 or 4 hours, and shrink your nightly batch windows enough to accommodate the new process. That is a lot easier and cheaper than people tend to think, but it still requires an investment.
Decide what data is most important to you now, how often and how quickly you need it, and base your solution strategy on that.

== Thanks, I appreciate the responses. More to come, but I'll start some new threads. Some issues: velocity (its definition and appropriate standard measures), and RDBMS: don't we have to get the data into some type of OLAP to actually analyze it?

== It depends on what the analytics are. If you can do what you need to in a relational format easily and are thinking in terms of cubes, you'll likely be better served doing that. If, however, your needs include the flexibility to expand/add/drop data sets for experimentation, analytics where a query structure isn't the most natural way to express the question, or analytics that are computationally very expensive (think machine learning for pattern identification), that can change the picture even if the data set itself is not "big".

== There is this expression: "If all you have is a hammer, then all your problems look like a nail." I'm sure I fractured that statement, but you get the idea. Based on your initial problem definition, you have to ask yourself whether you can easily fit the problem into an RDBMS, meaning the data does not require the scale of Hadoop. Based on your initial numbers, it's not even close. It's surprising that while Tom works for a large vendor that's spending lots of money attempting to buy market share, he forgets that his company also sells two RDBMS engines that are probably better equipped to solve your use case. Or rather, one of those engines. (This happens to be that proverbial hammer. ;-) Using IBM as an example, check out IDS. Here you can use the engine as your OLTP source, which is what you will want since you are talking about a system of record. It has built-in extensibility, in that you can extend the relational model; see Stonebraker's Illustra, which IFMX bought in '95. There is also RTL, with which you can load over 50k ticks a second, so you can really handle velocity.
(Note: you added velocity as an afterthought, and your data uses do not suggest that level of velocity. RTL would be overkill.) But that same extensibility allowed them to create IWA, essentially an all-in-memory appliance that you can attach to your RDBMS engine and query across both machines using industry-standard SQL. Of course, there are limitations in terms of scalability, but for your data set size it could be an option. In terms of analytics, there used to be a partnership with NAG, hence the NAG DataBlade, so you've got that covered. It's a pity that Janet killed Arrowhead; had it gone through, things would have looked a bit different in the big data space. But I digress. The point is that I can solve your problem with a different toolset. As someone who is not a talking head but actually works as a solutions architect, I can tell you each solution has its share of trade-offs. You have to balance the pluses and minuses when trying to find the right solution for you.

== You evidently missed the part where I said "If you can do what you need to in a relational format easily and are thinking in terms of cubes, you'll likely be better served doing that."

== You were just regurgitating what was already posted in earlier responses. :-) You went on to comment that it's not always the size of the data but the complexity of the analytics, which again brings me to advances in IDS, stable for the past 10+ years, that handle complex analytics. Again, size and complexity point to a non-Hadoop solution. Add velocity, which the OP did, and again IDS solves that issue. While you work for IBM, in IM, you don't really know your own product sets. Typical for IBM. Don't feel bad though; I seriously doubt more than a handful of people could tie all the products in IBM's portfolio together. Like I said in a terse post way at the top...
Not a big data problem... ;-)

== Statistics are maintained at each of the 300 million data points. The stats first drive costs to the appropriate channel/customer/product and, second, intersect the costs to arrive at a cost at the channel/customer/product level. Some of the statistics and sources include:
a) Route Management System (#Services, Channel_ID, TimeOnRoute);
b) "Hand Held" system time stamps (#MinutesAtService for various activities);
c) Warehouse management system (labor activity distribution);
d) 3rd-party freight management system (freight lanes and costs);
e) Inventory (#QuantityOnHand);
f) Certain specialized databases (Assets, Payroll);
g) ERP transactional system, including Inventory Transfer (SKU counts across lanes), G/L (costs) and Order Entry (order counts by SKU and customer).
The 300 million record result set is a single source for all SKU/customer/channel reporting and analytics, giving management transparency into the customer supply chain. It drives tactical and strategic decisions, including: a) subsegment P&Ls; b) pricing (value of the customer relationship); c) process improvement; d) logistics optimization; e) product release profile; f) new customer profile. Yes, it is RDBMS/OLAP driven and monthly (auditable to the G/L). It seems the consensus is that calling this "big data" is a stretch; maybe "very large data". However, this is driving decision making. Is "big data" doing the same? @quantalyst

== If you want to fix the 18-hour job cycle, then yes, some big data technologies might be useful for parallelizing the analysis. Hadoop processes don't really let you pack much algorithm into one pass. You could add a preprocessing phase that creates a common input dataset, and then run separate parallel algorithms on it for your various analyses. There are open source OLAP databases (Pentaho, for one), so you can have several servers, each running a different analysis.
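The numbers in this thread can be checked with quick arithmetic, and the last reply's "preprocess once, then run analyses in parallel" suggestion can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: the throughput figures use only the ~300 million line items and the 18-hour and 4-hour windows mentioned above, while the line-item records, the `route_cost` pool and both analysis functions are hypothetical. Threads are used only to show the fan-out shape; CPU-bound Python work would need processes or, as the reply suggests, separate servers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# --- 1. Throughput needed for the monthly run (figures from the thread) ---
LINE_ITEMS = 300_000_000   # ~300 million line items per month

def rows_per_second(rows, hours):
    """Sustained rate needed to finish `rows` within a window of `hours`."""
    return rows / (hours * 3600)

print(f"18 h window: {rows_per_second(LINE_ITEMS, 18):,.0f} rows/s")  # 4,630 rows/s
print(f" 4 h window: {rows_per_second(LINE_ITEMS, 4):,.0f} rows/s")   # 20,833 rows/s

# --- 2. Preprocess once, then fan out independent analyses ---
# Hypothetical records: (customer, channel, sku, revenue, minutes_on_route)
RAW = [
    ("C1", "route", "SKU1", 120.0, 30),
    ("C1", "route", "SKU2",  80.0, 10),
    ("C2", "web",   "SKU1", 200.0,  0),
    ("C3", "route", "SKU2",  50.0, 20),
]

def preprocess(raw):
    """Build the common input dataset once (real cleansing/joins would go here)."""
    fields = ("customer", "channel", "sku", "revenue", "minutes")
    return [dict(zip(fields, r)) for r in raw]

def revenue_by_channel(rows):
    out = defaultdict(float)
    for r in rows:
        out[r["channel"]] += r["revenue"]
    return dict(out)

def allocate_route_cost(rows, route_cost=600.0):
    """Driver-based allocation: spread a cost pool by each customer's share of route minutes."""
    total = sum(r["minutes"] for r in rows)
    out = defaultdict(float)
    for r in rows:
        out[r["customer"]] += route_cost * r["minutes"] / total
    return dict(out)

common = preprocess(RAW)
analyses = {"revenue_by_channel": revenue_by_channel, "route_cost": allocate_route_cost}
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(fn, common) for name, fn in analyses.items()}
    results = {name: f.result() for name, f in futures.items()}

print(results["revenue_by_channel"])  # {'route': 250.0, 'web': 200.0}
print(results["route_cost"])          # {'C1': 400.0, 'C2': 0.0, 'C3': 200.0}
```

The point of the structure is that each analysis reads the same prepared input and nothing else, so adding an analysis, or moving one to its own server, does not touch the others.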

Demos or quant time series with quant analytics in Meetup?

Written on May 15th, 2012 by caustic
Categories: Quant Analytics
A query about the Meetup.com/quant-finance group: Hello group, I am new here and was wondering whether there are any plans to start a series on quant techniques that can be used somewhere in the life cycle of quantitative analysis and trading strategy development, covering concepts such as PCA, OLS, etc. It can be simple or more advanced (copulas and dynamic linear models, for instance). I'm not a mathematician, so I often struggle with the more advanced math concepts, but I definitely want to learn more. Please feel free to chime in and make suggestions. I look forward to networking and interacting with all of you. My response: I just finished developing my algo course, but I am no mathematician either. I also plan to do one more tech infrastructure meetup soon. After that, it may focus purely on algos and stats, which there is little talk of. You can get tech stuff everywhere these days. I hope this helps. Bryan

Anyone know of a method for determining the market impact of a strategy? Maybe a white paper?

Written on May 14th, 2012 by caustic
Categories: Quant Analytics
-- Please see the "Algorithmic Trade Execution and Market Impact" post on this page: http://www.algotradinggroup.com/cgi-bin/yabb2/YaBB.pl?num=1214230287/15
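For a first-pass estimate, one widely cited rule of thumb (not taken from the linked thread) is the square-root impact model: expected impact is roughly Y · σ · √(Q/V), where σ is daily volatility, Q is the order size, V is daily volume and Y is an empirical constant of order one that you would need to calibrate against your own fills. A minimal sketch, with Y = 1 and the example order as illustrative assumptions:

```python
import math

def sqrt_impact_bps(order_shares, daily_volume, daily_vol, y=1.0):
    """Square-root market impact estimate, in basis points of price.

    daily_vol is daily return volatility as a fraction (0.02 = 2%).
    y is an empirical constant of order one; 1.0 is an assumption, not a calibrated value.
    """
    participation = order_shares / daily_volume
    return y * daily_vol * math.sqrt(participation) * 10_000

# Hypothetical order: 100k shares in a stock trading 10M shares/day with 2% daily vol.
print(f"{sqrt_impact_bps(100_000, 10_000_000, 0.02):.1f} bps")  # 20.0 bps
```

Because the estimate grows with the square root of participation, quadrupling the order size only doubles the estimated impact, which is why splitting large strategy orders over time matters.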

Learning R YouTube videos

Written on May 14th, 2012 by caustic
Categories: Quant Analytics, Quant Development, R
This is really quick to learn: http://m.youtube.com/#/watch?desktop_uri=%2Fwatch%3Fv%3DZoPJGmpYJzw&v=ZoPJGmpYJzw&gl=CA

Learn Random walk theory for market inefficiency

Written on May 11th, 2012 by caustic
Categories: Quant Analytics
Get more details here on getting access to the course and other huge benefits, including High Frequency Trading platform building, QuantLibXL analysis, Matlab, etc.: http://quantlabs.net/quant-member-benefits/slash-your-quant-learning-curve/ Thanks for reading, Bryan

Learn Maximum number of intraday Sharpe ratio

Written on May 11th, 2012 by caustic
Categories: Quant Analytics
Get more details here on getting access to the course and other huge benefits, including High Frequency Trading platform building, QuantLibXL analysis, Matlab, etc.: http://quantlabs.net/quant-member-benefits/slash-your-quant-learning-curve/ Thanks for reading, Bryan

ABC Elliott Wave correction in the world quant analytics?

Written on May 10th, 2012 by caustic
Categories: Quant Analytics
http://CMTTRADER.blogspot.com It looks to me as if the market is settling into a nice little support area, and from the looks of things we could be seeing an ABC correction from the high in April, with a couple of ups and downs in between. It seems we could be heading for a move up once the bears sell off. Today I thought I saw some capitulation on the 50-day, 60-minute chart right around 10:30 am. http://CMTTRADER.com
-- I noticed your comment as I was "surfing" the discussions. Which market are you talking about?
-- I was looking at the S&P 500 SPDRs.

How to get started in quant analysis?

Written on May 10th, 2012 by caustic
Categories: Quant Analytics
Hello, I am a fresh engineering graduate and have discovered that I have a huge interest in algorithmic trading. I wanted to ask people in the industry for advice on what my next steps should be. My interest mainly lies in software development. I have experience using C and Matlab. Also, if anyone happens to know of any entry-level or internship positions, please let me know. That would be awesome :) Thanks!
-- Look into the programs these guys have: http://www.xyber9trends.com/x9t/4371.html They are some of the best out there.

How R, Hadoop and RHIPE can handle 400TB of market tick data that kdb+ cannot. Also, all for free thanks to open source

Read this from the well-known geniuses at Lab49.com: http://blog.lab49.com/archives/4978
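RHIPE lets R code run as Hadoop MapReduce jobs. The sketch below shows only the map, shuffle and reduce shape of such a job, in plain Python rather than RHIPE's own API: raw ticks are rolled up into per-symbol, per-minute OHLCV bars. The tick tuples, epoch-second timestamps and one-minute bucket are hypothetical choices for illustration; on a cluster, Hadoop would distribute the map and reduce calls and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw ticks: (symbol, epoch_seconds, price, size), possibly out of order
TICKS = [
    ("SPY", 60, 135.10, 100), ("SPY", 75, 135.30, 200), ("SPY", 61, 135.00, 100),
    ("SPY", 119, 135.20, 300), ("SPY", 121, 135.25, 100), ("IBM", 70, 200.5, 50),
]

def map_tick(tick):
    """Map: emit ((symbol, minute_bucket), (ts, price, size))."""
    sym, ts, price, size = tick
    yield (sym, ts // 60), (ts, price, size)

def shuffle(pairs):
    """Shuffle: group mapped values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_bar(key, values):
    """Reduce: build one OHLCV bar from all ticks in a (symbol, minute) group."""
    values.sort()   # order by timestamp so open and close are correct
    prices = [p for _, p, _ in values]
    bar = {"open": prices[0], "high": max(prices), "low": min(prices),
           "close": prices[-1], "volume": sum(s for _, _, s in values)}
    return key, bar

mapped = chain.from_iterable(map(map_tick, TICKS))
bars = dict(reduce_bar(k, v) for k, v in shuffle(mapped).items())
for key in sorted(bars):
    print(key, bars[key])
```

Each reduce call only sees one (symbol, minute) group, which is what lets this kind of job spread across many machines as the tick history grows.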
