Which database works best for live trading to store current and historic data for quant development?
What other data/connectivity issues should I be aware of as I am moving to live trading?
Which database would you choose to store tick data? I gather 2GB a day. I analyze it with sqlite and R. Looking forward when I am trading live, I would be writing data (to the database) and at the same time analyzing it to produce live trade signals. Sqlite is not good for that purpose because it puts exclusive locks on tables when writing.
I know kdb+ is recommended for funds and banks, I am much smaller than that, infact I am a solo trader.
SQL Server 2008 can handle your needs perfectly
Hi, you can try out our QuantBase engine,http://www.smartquant.com/quantbase.php It works hand to hand with QuantRouter, an application which allows you to split data feed between data capture engine and your trading / research clients (for example OpenQuant or your custom applications)
SQL Server works well. MySQL is another option, and it’s free http://dev.mysql.com/downloads/
I am currently using Sqlite, not sure how different it is from Mysql, but Sqlite puts locks on the data tables when its writing data which mean your program can’t read the data for analysis which also means you can’t consider the more recent data in producing trade signals live (real-time). Now I am new to all of this, I might be wrong, but this is my current understanding. I thought about creating two different databases as a work around one for the current data and one for the historical but again I am trying to decide what is best.
SQLite is not intended to be a multi user system hence the locking issues you’re experiencing.
SQLServer is good for starters as it has user friendly (sort of) admin tools and designers.
MySQL should be fine although it doesn’t support stored procedures and functions to the same extent that SQLServer does. There’s lots of support out there as it functions as the back end for most online forums etc.
Firebird is an open source, free DBE equivilent to SQLServer (roots in Interbase) and also has good tools. It much more robust than SQLServer, lighter weight and under certain conditions, much faster. I’ve used it for hi tech automated manufacturing control systems as well as POS / workflow control systems for the hospitality industry and it never breaks.
PostGreSQL is another open source DB.
Basically If you’re going to get serious then I would go for SQLServer, Firebird or PostgreSQL bearing in mind that the data types and sql syntax varies from one to the other.
How are you reading and writing to/from the DB?
Are you constantly writing streaming data to the DB?
Can you modify your connection if necessary?
If you perform an INSERT into a table, then immediately retrieve that same record with a SELECT statement, the new record should be there, right? As long as you COMMIT your changes before you do the SELECT.
At the risk of getting off subject, is your app multi-threaded so you can always write to the DB, while having a separate process do the reads?
I am a short term trader; my trades are mostly less than 15 minutes long. I need to use the most recently received data to trigger my entries and exits.
I can run read and write processes in parallel, but wouldn’t I have to keep committing the newly received data in some kind of frequency like every minute or couple of minutes or whatever to be able to access that data with the reader?
Has anyone here tried BerkeleyDb?
It can be used it via a sqlite interface but I wonder
whether using only its very basic (non SQL)
features would be enough for handling
market data very efficiently.
Relational databases like SQL and any variant are not designed for use as a tick data capture and storage solution that supports routine querying such as for backtesting. You really want a columnar database. I’ve used OneMarketData’s OneTick solution for years and I believe it is the best product on the market. kdb+ is also good, but much more difficult to learn how to use. There are a couple other solutions like Vertica, etc. All of these solutions are very expensive.
If you are doing this on a very small scale and don’t use the DB to process ticks such as with stored procedures for aggregation before trying to feed data to a real-time production strategy you might be able to get away with strictly archiving the tick data and very small scale backtesting using an SQL variant. You would probably want to use a well designed database server to read/write with an in-memory data windowing scheme to collect the incoming tick data and write it to the DB and to read the archived data in the database while making it available to your backtesting software.
I also suggest directing the output of the backtest to a file and avoiding trying to write it back into the database, which can be painfully slow.
My historical data is handled well with SQLite (soon to be moved to another SQL venue) and R. The need is limited to handling the current day’s tick data and applying the same analysis that was applied to the historical data to the current data.
My historical data actually gets summarized and parameters that are needed for my trading are extracted to interact with today’s tick data. The size of data needed for trading on any one day doesn’t exceed 4GB and might be even less than 3GB (2GB for the current day tick data and much less than 2GB worth of summary and alert levels representing all historical data.) Now 3 or 4GB can easily fit in any modern computers memory, but since I am new to databases, I am not sure which venue would be best for such interaction. Do I even need a database to handle 4GB of data? can they be handled with code written in python or C?
the databases you mentions as expensive are we talking many thousands or many hundreds of dollars when you are looking at one person one machine license?
my apologies, I didn’t see that you were a solo trader until just now. These solutions are generally 100k+++ per year.Your scale is pretty small, so you might be able to make a relational database work for you in terms of archiving. I would still suggest that for any real-time production trading you collect the data in-memory and process it there and not write it into the DB except as a background task post-processing. Backtesting against the tick data you have archived in the DB might be really slow however unless you store it as you intend to use it with your backtesting software, e.g., pre-aggregated, pre-calc’ed VWAP, etc.
NOTE I now post my TRADING ALERTS into my personal FACEBOOK ACCOUNT and TWITTER. Don't worry as I don't post stupid cat videos or what I eat!