Quant development: Seeking reliable historical data

(Last Updated On: April 12, 2011)

Seeking reliable historical data of currencies, commodities, indexes and stocks
I have developed a new semi-artificially-intelligent / learning algo-trading system capable of learning and trading on any type of financial asset.
To test it I built a simulator and have so far tested it on 5-6 years of historical data for several currencies (downloaded from an MT4 platform via an EA).
To test it further I am looking for a reliable source of historical every-“tick” data starting from 01 Jan 2005 for a variety of currencies, commodities, indexes and stocks.
Any recommendations?

I know you think you want “every tick”, but all data is filtered and needs to be filtered.

Back in the old days when I started trading, bad ticks were where numbers were transposed or the mantissa was missing (or other dyslexic anomalies)… as well as the regular out-of-sequence trades. Interestingly, with the advent of electronic markets, bad ticks are actually no less common.

Out-of-sequence trades are the hardest to work with, as many exchanges do not require OTC trades to be reported until minutes or even hours after execution. Even latency between venues can create significant anomalies in the market microstructure.

The bottom line is that all data needs to be filtered, and it needs to be filtered the same way historically as going forward. Virtually all data vendors do this, and if your model is sensitive you need to be very aware of the compromise filtering creates. On the one hand filtering can clean out anomalous ticks; on the other hand it can remove important market information.
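To make the filtering trade-off concrete, here is a minimal sketch of one possible bad-tick filter. It is not what any particular vendor does: the function name `filter_ticks`, the trailing-median rule, and the `window` and `max_rel_dev` parameters are all assumptions chosen for illustration.

```python
from statistics import median

def filter_ticks(prices, window=5, max_rel_dev=0.005):
    """Split prices into (kept, rejected) using a trailing median.

    Hypothetical rule: a tick is rejected when it deviates from the median
    of the last `window` accepted ticks by more than `max_rel_dev`
    (a relative fraction). Rejected ticks are returned as (index, price).
    """
    kept, rejected = [], []
    for i, price in enumerate(prices):
        recent = kept[-window:]
        if recent:
            ref = median(recent)
            if abs(price - ref) / ref > max_rel_dev:
                rejected.append((i, price))
                continue
        kept.append(price)
    return kept, rejected

# 14.321 has a misplaced decimal point; 1.4231 has two digits transposed.
ticks = [1.4321, 1.4322, 1.4320, 14.321, 1.4323, 1.4231, 1.4322]
kept, rejected = filter_ticks(ticks)
print(rejected)
```

Note that the same rule would also throw away a genuine 0.6% jump, which is exactly the compromise described above. Whatever rule is used, the same function with the same parameters would have to be applied to both the historical and the live feed.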

As for recommendations, if you are still focused on back-testing you can’t go past “Tick-Data Inc” but most real-time vendors from eSignal to Thomson-Reuters offer significant history.

Drop me a note if you want some cheaper alternatives.

Thank you for your opinion. I agree with you and am very aware of the sensitivity of the issue, so in addition to back / forward testing I am also conducting a sequence of unique tests using “random data”.
Even if I test my system / theory on hundreds of financial instruments over periods of 5 or even 10 years, that does not guarantee that the system / theory will perform similarly in the future / in production.
The “random data” testing lets me test my system / theory over a greater variety of potential scenarios, which allows intense and extreme stress testing.
The “random data” testing is very simple: a program creates a random series of millions of financial instruments based on randomly changing boundaries, then the system / theory is analyzed on it.
I have developed such a program; it is multi-threaded and can generate and analyze the equivalent of 4 years of a “random data” financial instrument per second.

I have never been a fan of using “random data”, much as I have not been a fan of “Monte Carlo” simulations (in financial markets). To me there is just too much information potentially being lost, and you have no way of telling how much of that information is relevant to your model. Then, as you try to make the random/simulated data more relevant, you just end up moving closer and closer to the source time-series.

I would be interested to know how you are addressing this issue in your formulation of the data.

That is a very good and valid question. First let me make one thing clear: the “random data” testing does not replace in any way the “traditional” back / forward testing on real historical data!
As for your question, the random data is similar to every-tick data, therefore there is no loss of any information.
For example, let’s say we want to simulate a Forex instrument – we can define it like this:
Starting value – 1.4321
Minimum movement – 0.0001
Now the random data program will start creating millions of variations of it. A very basic one would be: “at each movement, move randomly 50%-50% up/down by 1 minimum movement (0.0001)”.
This of course will not be a very interesting / challenging test, but as I mentioned earlier, the instruments created are based on randomly changing boundaries, so the boundaries in the above example can change from 50%-50% to 30%-70% up/down movements (during the creation of a specific “random instrument”).
There are several boundaries you can define that affect the outcome of the “random data” instrument being created and tested.
After several millions of such tests you eventually get a very large and diverse range of testing data.
If you want to discuss the “random data” program’s features further, feel free to contact me privately and I’ll be happy to do so over the phone / Skype.
I estimate that if my system / theory shows good results after testing on random data in addition to real historical data for several hundred financial instruments, it is more likely to succeed in production.
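The Forex example above (starting value 1.4321, minimum movement 0.0001, up/down odds that drift between 50%-50% and 30%-70%) can be sketched roughly as follows. This is only a guess at the idea, not the poster's actual program: the name `random_instrument`, the `regime_len` parameter, and the choice to redraw the up-probability uniformly every `regime_len` steps are assumptions.

```python
import random

def random_instrument(start=1.4321, tick=0.0001, n_steps=10_000,
                      p_up_range=(0.3, 0.7), regime_len=500, seed=None):
    """Generate one synthetic price series that moves one tick per step.

    Every `regime_len` steps the probability of an up-move is redrawn
    uniformly from `p_up_range` (the "randomly changing boundary").
    With the defaults the price cannot reach zero, since 10,000 one-tick
    moves cannot cover the 14,321 ticks between 1.4321 and 0.
    """
    rng = random.Random(seed)
    ticks_from_start = 0  # integer offset avoids accumulating float error
    p_up = 0.5
    prices = []
    for step in range(n_steps):
        if step % regime_len == 0:
            p_up = rng.uniform(*p_up_range)
        ticks_from_start += 1 if rng.random() < p_up else -1
        prices.append(round(start + ticks_from_start * tick, 4))
    return prices

series = random_instrument(seed=42)
print(len(series), min(series), max(series))
```

Generating millions of instruments is then a matter of calling this with different seeds and boundary ranges and feeding each series to the model under test.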

www.tickdata.com // best quality, high price
www.tickdatamarket.com // unknown, high price
www.cqgdatafactory.com // useful for rare request
www.anfutures.com // only S&P E-Mini, Nasdaq Index E-Mini, eFX CME
iqfeed.net // best price to quality, 4 months of tick data, only USA
www.kibot.com // USA
www.grainmarketresearch.com // futures, nasdaq stocks
professional.teletrader.com // need request

Do you know of any vendor that supplies minute-by-minute best bid-ask for US stocks besides those that Validimir mentioned? (Thanks, Validimir, for that.)

I heard that tickdata is the best as far as quality but it’s 14K for 1 year I think. Thanks

Although I’m sure if you look around in the “Eastern Bloc” you could find some cheap historical databases.

The only other vendor that springs to mind is Glen Larson’s Genesis Data. Glen transformed the company somewhat from a data vendor into a software vendor over the years. When he did this a lot of the software vendors stopped recommending his data service, so I’m not too sure if their data service works with third party software anymore, but you could give them a call.
http://www.genesisft.com/products.php

Also, I presume you know that if you sign up with someone like eSignal, or even directly with Interactive Brokers, you can pull in historical best bid and best ask. From memory, I think TradeStation.com only offers best bid.

Generally the system developers I come across use random data as a way to invalidate, not validate, their models, although your method seems to incorporate some trend persistence. This should make it tradeable, but I am still concerned about so much information being missing.

For example, I find markets are innately perverse, in that the more money they give you in a shorter period of time, the harder they try to take it back from you. This is borne out by runs tests on models, and it is why the Turtles were given the “Last Trade a Loser” rule, as well as standing aside from markets for a longer period of time the larger the profit they had recently made. How do you incorporate these nuances into your artificial data?
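For readers unfamiliar with the runs test mentioned above, here is a minimal sketch of the standard Wald-Wolfowitz version applied to a sequence of winning and losing trades. The function name `runs_test_z` and the sample sequences are made up for illustration; they are not results from any real model.

```python
import math

def runs_test_z(outcomes):
    """Wald-Wolfowitz runs test z-score for a win/loss sequence.

    `outcomes` is a sequence of truthy (win) / falsy (loss) values.
    A strongly positive z means more alternation than chance would give
    (wins tend to be followed by losses); a strongly negative z means
    streakier results than chance.
    """
    flags = [bool(o) for o in outcomes]
    n1 = sum(flags)           # number of wins
    n2 = len(flags) - n1      # number of losses
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one win and one loss")
    runs = 1 + sum(1 for a, b in zip(flags, flags[1:]) if a != b)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - mean) / math.sqrt(var)

z_alternating = runs_test_z([True, False] * 10)        # win, loss, win, loss...
z_streaky = runs_test_z([True] * 10 + [False] * 10)    # ten wins, then ten losses
print(round(z_alternating, 2), round(z_streaky, 2))
```

Under the usual normal approximation, an absolute z above roughly 2 suggests the order of wins and losses is not random. A purely independent coin-flip generator would show no such structure, which is one way to state the concern about what synthetic data leaves out.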

That’s helpful. Just one more bother, because I’ve never done live trading high-frequency-wise (if you want to call 15-30 minutes high frequency):

If you sign up with one of these trading vendors like TradeStation or Interactive Brokers, can you go back, say, a full year and get all the data for, say, the top 1200 US caps? It’s down the road anyway (once I’m done with what I’m building; right now I use stale data from 2008), but I’d like not to have to pay, say, 10-15K for one year. Thanks a lot.

Appreciating that you haven’t gone live, there are issues your question raises you haven’t encountered yet.

From the top, with TradeStation.com and Interactive Brokers you can get a year’s history. If you want more history through IB you will want to get an eSignal feed which should allow you about 3 years.

Next issue… these ‘retail’ feeds are often limited to about 50 quotes per second, so you need to balance how short term you are trading versus the number of issues you’re following. You can speed this up with IB to about 150 quotes per second by going with their Computer To Computer Interface (CTCI), but you will need your own proprietary solution to interface through FIX.
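The balance between trading horizon and number of issues can be put into rough numbers. The sketch below assumes, purely for illustration, that the quote budget is shared evenly across all followed symbols; the function names are hypothetical and the 50 and 150 quotes-per-second figures are the approximate ones quoted above.

```python
def seconds_between_updates(n_symbols, quotes_per_second):
    """Average wait between two quotes for the same symbol, assuming the
    feed's quote budget is spread evenly over all symbols."""
    if n_symbols <= 0 or quotes_per_second <= 0:
        raise ValueError("both arguments must be positive")
    return n_symbols / quotes_per_second

def max_symbols(quotes_per_second, max_staleness_seconds):
    """Largest symbol count that keeps each quote fresher than the limit."""
    return int(quotes_per_second * max_staleness_seconds)

retail = seconds_between_updates(1200, 50)    # 1200 symbols on a ~50 q/s feed
faster = seconds_between_updates(1200, 150)   # same universe at ~150 q/s
print(retail, faster, max_symbols(50, 5))
```

On these assumptions, following 1200 stocks on a 50 quotes-per-second feed means each symbol refreshes about every 24 seconds on average, which may or may not be acceptable for 15-45 minute holding periods.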

Also, if you are using packages like TradeStation or MultiCharts to place trades electronically you have to realise that “EasyLanguage / PowerLanguage” do not support “Stop-Limit” order types to manage slippage (whereas Ninja-Trader does for example).

This can be a MAJOR problem when trading short term, where your slippage can easily overwhelm your average trade. So it is tough using these packages to run live intraday given these limitations.

Given that, what happens if you consider increasing your time-frame to increase your average trade and ability to keep up with quotes and you expand the number of issues to follow to improve diversification…

Holding overnight is really fraught with a whole other set of issues, as these packages do not maintain a full dialogue regarding your open positions. For example, if you get a glitch you can have all your automated models turned off (or TS just locks and you need to shut down and restart)…

Let’s say you have 20 open positions. When you reinstate the system automation, the market-systems will be totally unaware of the existing open positions and will start placing trades as if you had a clean slate. So you are left having to manage all your remaining open positions manually, which raises the question of how you are supposed to do that when all your signals are internal to your automated system.
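One way a custom system might avoid the clean-slate problem is to reconcile on restart: compare what the broker reports against what the models want to hold, and trade only the difference. The sketch below is hypothetical and broker-agnostic. It assumes you can obtain both sets of positions as plain dictionaries of signed quantities, and that the model targets were persisted or can be recomputed, which the packages described above do not provide.

```python
def reconcile(broker_positions, model_targets):
    """Return {symbol: signed quantity to trade} so that holdings match targets.

    Both arguments map symbol -> signed quantity (negative = short).
    Symbols missing from either side are treated as a zero position.
    """
    deltas = {}
    for symbol in sorted(broker_positions.keys() | model_targets.keys()):
        delta = model_targets.get(symbol, 0) - broker_positions.get(symbol, 0)
        if delta != 0:
            deltas[symbol] = delta
    return deltas

# Made-up state after a restart: the broker still holds positions the
# freshly restarted models know nothing about.
broker = {"AAA": 100, "BBB": -50, "CCC": 200}
targets = {"AAA": 100, "BBB": 0, "DDD": 30}
orders = reconcile(broker, targets)
print(orders)
```

Here the system would buy back the 50 short in BBB, sell the orphaned 200 CCC, open 30 DDD, and leave AAA alone.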

So, next you may try moving to your own custom-written interface to the broker. Once again, you will find problems with the availability of simulation platforms to test your code, and with flaky APIs, even with some of the higher-end applications like X-Trader. Even with vendors like Bloomberg we’ve had issues with poor change management. So don’t just accept the vendor upgrades; you should always test them locally.

Bottom line… most of the solutions are not all they are cracked up to be and it can take a while to find which strengths you need and which short-comings you can live with.

I keep saying “last bother” and you keep providing SO MUCH INFO that it’s hard to stop. I don’t want to bother you with so many questions here, but is there a way I could get in touch with you when I get closer? I’ll explain a little better here in case you want to say some more, but you have definitely said a lot already.

Backtesting-wise I should be fine because I do that all on my own, in R, given the data. And yes, I’m pretty aware of the “unrealistic” property of backtests, particularly the Heisenberg uncertainty principle! I just try to be as conservative as possible.

But I definitely didn’t know about the limitations of all the live trading offerings.

My holding periods will be between 15 and 45 minutes per stock. How many stocks are live at any one time obviously depends on how many are getting signalled long and short. And no, I don’t hold overnight, ever, so that’s not a problem.

But as far as how this translates in terms of using an IB feed or TradeStation feed, I don’t know, and it sounds like you could help me with that later on for sure. Maybe you consult or whatever?

I’ve been working on this strategy for about 1.5 to 2 years and it’s getting there, albeit more slowly than I expected!

Anyway, thank you very much for all of the wisdom and knowledge. But, if you wouldn’t mind, could I invite you as a friend on LinkedIn and get your email address? If not, that’s okay. I understand how that could lead to overwhelming amounts of email if you do that with too many people you don’t know. Thanks again.
——

Sounds like you’re on the right track looking at interfacing R directly to a broker API.

You mentioned Heisenberg: very insightful… few people think that way… and while you’re thinking in that vein don’t forget Prince Louis de Broglie (and the impact of Planck’s constant). As above, so below.

Cornelius at ‘trade commander’ provides consultancy for interfacing with IB. There are also various forums depending on your environment.
http://trade-commander.com/

I as well am using random data tests in order to invalidate, not validate my model.
If I find that the model is not stable with any of the tests (historical or random data) – it’s back to the drawing board !

You wrote : “markets I find are innately perverse, in that the more money they give you in the shorter period of time the harder they try to take it back from you”.
I have to say that your approach to financial markets as an entity is very interesting, however I don’t see it that way…

I have to admit I am not familiar with the “Last Trade a Loser” rule you mentioned. Does it refer to high-frequency traders competing on speed, where only the faster one will “win” the deal?

In any case, speed in that sense is not relevant for my model; my models in general are not aimed at competing for “millisecond deals”.
Sure, my execution system must be fast and precise in order to execute the dozens of trades it needs to execute per minute (at peak points), but if it occasionally takes 500 ms or even 1 second of delay, no harm will be done.
The model is based on a lot of small deals that are not dependent on one another, so even if the system misses a deal completely, no big damage is done. It will never “chase” a deal: if a limit order the system placed is not filled, it will simply cancel it and move on.
I do however have a potential slippage problem when closing trades. To deal with this I take a certain percentage of slippage into account when testing my models theoretically.
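Charging a percentage of slippage on the close, as described above, might look something like this in a backtest. The function name `net_return`, the 0.05% default, and the sample prices are assumptions for illustration only; the poster does not say what percentage is used.

```python
def net_return(entry, exit_price, side, exit_slippage=0.0005):
    """Fractional return of one trade with slippage charged on the exit only.

    side is +1 for a long and -1 for a short. Slippage always moves the
    exit against the trader: lower for a long, higher for a short.
    """
    if side not in (1, -1):
        raise ValueError("side must be +1 or -1")
    adjusted_exit = exit_price * (1 - side * exit_slippage)
    return side * (adjusted_exit - entry) / entry

gross = net_return(1.4321, 1.4331, side=1, exit_slippage=0.0)
net = net_return(1.4321, 1.4331, side=1)
print(f"gross {gross:.6f}  net {net:.6f}")
```

For a small 10-pip winner like this one, 0.05% of exit slippage removes most of the profit, which is why the assumption matters so much for a model built on many small deals.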

If you want the most reliable data for interbank FX testing you would have to go to FactSet. They provide data that is that granular. I do believe that if you want to test at that level, you must have bid-ask data. I have collected and scrubbed bid-ask data on the majors for at least a two-year period. This is more accurate than tick data, which can only provide a last price or a bid or an offer. In any of these cases it is not accurate enough to test a short-term algo.

You say you tested your EA on MT4 and you wish to keep on testing it in order to find eventual flaws. Then one thing you could do is download historical data from all the MT4 brokers (Alpari, FXDD, FXCM, FXOpen, and so on) and test the EA with all those brokers. Why is that? Because every single broker gives you different historical and real-time data. And that’s not all: if you leave your MT4 platform open to record data for a while, you will see it’s different from the data you can download from the same broker later on.
About the slippage problem, you should modify the closing condition in your code, or try a different broker. Only some brokers have that problem.
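Before running the EA against several brokers, it can help to see how far their histories actually diverge. The sketch below compares two feeds held as timestamp-to-close dictionaries. The function name `compare_feeds`, the data layout, and the sample values are all made up for illustration and do not come from any broker.

```python
def compare_feeds(feed_a, feed_b, tolerance=0.00015):
    """Compare two {timestamp: close} feeds.

    Reports timestamps present in only one feed, plus shared timestamps
    whose prices differ by more than `tolerance`.
    """
    common = sorted(feed_a.keys() & feed_b.keys())
    return {
        "only_in_a": sorted(feed_a.keys() - feed_b.keys()),
        "only_in_b": sorted(feed_b.keys() - feed_a.keys()),
        "mismatches": [(t, feed_a[t], feed_b[t]) for t in common
                       if abs(feed_a[t] - feed_b[t]) > tolerance],
    }

# Fabricated one-minute closes from two hypothetical brokers.
broker_a = {"2010-01-04 00:00": 1.4321, "2010-01-04 00:01": 1.4323,
            "2010-01-04 00:02": 1.4325}
broker_b = {"2010-01-04 00:00": 1.4321, "2010-01-04 00:01": 1.4326,
            "2010-01-04 00:03": 1.4324}
report = compare_feeds(broker_a, broker_b)
print(report)
```

The same comparison could be run between data recorded live from a broker and data downloaded later from that same broker.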

Thank you for your input. I have to say it is a very interesting approach to historical data testing; it sounds like a good idea and I’ll implement it.
As for the potential slippage problem, I don’t yet know if I would actually run into it; my experience so far has been with only one financial instrument.
Up until about a year ago I was a partner in a small Israeli hedge fund that operated on only one instrument on the Israeli Tel-Aviv stock exchange (called “TA25”). This lasted about 6 years.
In the last year I have developed a new semi-artificially-intelligent / learning algo-trading system capable of learning and trading on any type of financial asset, and it is now in the testing stage.
The logic (system) I am developing is suitable for any financial instrument, including stocks, commodities, indexes and Forex.
The TA25 market capacity is very small in comparison to Forex capacity, so I hope I won’t run into any significant slippage problems when dealing with larger-capacity instruments.

 

 

 


About caustic

Hi there, my name is Bryan Downing. I am part of a company called QuantLabs.Net. This is specifically a company with a high-profile blog about technology, trading, finance, investment, quant, etc. It posts things on how to do job interviews with large companies like Morgan Stanley, Bloomberg, Citibank, and IBM. It also posts unique tips and tricks on Java, C++, or C programming. It posts about different techniques for learning Matlab and building models or strategies. There is a lot here if you are venturing into the financial world, like quant or technical analysis. It also discusses the future generation of trading and programming. Specialties: C++, Java, C#, Matlab, quant, models, strategies, technical analysis, linux, windows. P.S. I have been known to be the worst typist. Do not be offended by it, as I like to bang stuff out and put priority on what I do over typing. Maybe one day I can get a full-time copy editor to help out. Do note I prefer videos as they are much easier to produce, so check out my many videos at youtube.com/quantlabs