Quant analytics: Runs test and variance ratio test
I have applied two tests for randomness to a few stocks, the runs test and the variance ratio test, applying each to the close, the high or the low series of the stocks.
Just to give some details about the procedure: both functions apply to univariate time series. The runstest function expects the returns (or the log returns) of the series as input, while the vratiotest function uses the prices. The vratiotest function also provides the option to test against either an i.i.d. random walk or a heteroscedastic random walk. I have chosen the latter option (its null hypothesis is the less restrictive one, since it allows the variance to change over time, so a rejection cannot be blamed on heteroscedasticity alone).
Well, neither test can reject the null hypothesis (the series is a random walk) when applied to the close series, but both always reject the null when applied to the high or the low series of the same stocks, and with very low p-values too.
Now, I’m sure there is a very simple reason for this that, having no experience in the field, I haven’t yet seen for myself. Could you please explain to me why the high and low series seem ‘less random’ than the close, and whether ‘this property’ can be ‘exploited’ in some way?
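For readers who want to see what a variance ratio test computes, here is a minimal sketch in plain Python. It is not the vratiotest implementation: the function name variance_ratio, the use of overlapping q-period returns and the bias correction are my assumptions, following the usual Lo-MacKinlay construction with a heteroscedasticity-robust z-statistic.

```python
import math

def variance_ratio(prices, q=2):
    """Variance ratio VR(q) with a heteroscedasticity-robust z-statistic.

    Hypothetical helper for illustration; `prices` are positive price levels.
    Returns (vr, z, two_sided_p_value) under a normal approximation.
    """
    logp = [math.log(p) for p in prices]
    T = len(logp) - 1                      # number of one-period log returns
    if q < 2 or T <= q:
        raise ValueError("need q >= 2 and more than q returns")
    r = [logp[t] - logp[t - 1] for t in range(1, T + 1)]
    mu = sum(r) / T
    dev2 = [(x - mu) ** 2 for x in r]      # squared demeaned returns
    ss = sum(dev2)

    var_1 = ss / (T - 1)                   # one-period variance
    m = q * (T - q + 1) * (1 - q / T)      # bias correction for overlapping sums
    var_q = sum((logp[t] - logp[t - q] - q * mu) ** 2
                for t in range(q, T + 1)) / m
    vr = var_q / var_1                     # close to 1 for a random walk

    # Robust asymptotic variance: weights the squared-return cross products,
    # so changing volatility alone does not trigger a rejection.
    theta = 0.0
    for j in range(1, q):
        delta_j = T * sum(dev2[t] * dev2[t - j] for t in range(j, T)) / ss ** 2
        theta += (2 * (q - j) / q) ** 2 * delta_j

    z = math.sqrt(T) * (vr - 1) / math.sqrt(theta)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return vr, z, p_value
```

A ratio well below 1 points to mean reversion in the returns, a ratio well above 1 to positive serial correlation. Calling variance_ratio(close_prices, q=5) on a list of daily closes would be a typical use.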
This is a really interesting observation. First of all, I have to note that no one can fully explain this, and experience is not a solution here, as any liquid market has near-random prices simply because of the overall complexity of the forces influencing each quote.
Regarding your question: my guess is that the high/low sequence holds significantly more objective information, as it is an aggregation of the day's data. There are option levels, round-number levels and so on that require some effort to break, and the statement that some price level is (or is not) breached, which is the information actually contained in the high/low data, is basically a signal of high overall significance.
On the other hand, closes are just a discretization of this near-random price sequence. The term ‘just’ is a little tricky here because of the consequences of overnight/holiday position reductions, but as this has a somewhat limited influence, it cannot by itself eliminate the overall randomness (and I can’t name anything else that distinguishes closes from a mere discretization).
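The aggregation-versus-discretization distinction can be probed with a small simulation. The sketch below assumes a pure Gaussian random walk inside each day and a day that opens at the previous close; both are simplifications of mine, not something claimed in this thread, and the function names are hypothetical. It builds daily high, low and close series from the same intraday path and compares the lag-1 autocorrelation of their day-to-day changes.

```python
import random

def simulate_daily_bars(n_days=4000, steps_per_day=30, seed=0):
    """Daily high/low/close of a simulated intraday random walk (log prices)."""
    rng = random.Random(seed)
    price = 0.0
    highs, lows, closes = [], [], []
    for _ in range(n_days):
        hi = lo = price                    # assumption: open equals previous close
        for _ in range(steps_per_day):
            price += rng.gauss(0.0, 1.0)   # i.i.d. intraday increments
            hi = max(hi, price)
            lo = min(lo, price)
        highs.append(hi)
        lows.append(lo)
        closes.append(price)
    return highs, lows, closes

def diff(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    n = len(x)
    mu = sum(x) / n
    dev = [v - mu for v in x]
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)

highs, lows, closes = simulate_daily_bars()
ac_close = lag1_autocorr(diff(closes))
ac_high = lag1_autocorr(diff(highs))
ac_low = lag1_autocorr(diff(lows))
print(ac_close, ac_high, ac_low)
```

Under these assumptions the changes in the close should show autocorrelation near zero, while the changes in the high and the low come out positively autocorrelated, because each day's extreme depends on the previous close as well as on that day's own path. This only shows that a high or low series can fail a random walk test even when the underlying price is a random walk; it does not establish that this is what drives the results on real data.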
I was thinking about something similar, and I was searching for some sort of confirmation.
In the meantime, I have also applied the Ljung-Box-Pierce Q test, looking for departures from randomness based on the null hypothesis of no serial correlation in the returns. This test is usually applied to the residuals after a model is fitted to the data, to check for lack of fit (incomplete fit). Applying it to the returns, I took as the null hypothesis that no model could ever be fitted to the data. As before, I applied it to the daily close, high and low series, and tested with different numbers of lags [5 10 15 20].
The null was rejected with very low p-values for the high and low series, and it was also rejected for the close series, at alpha = .05, for 7 stocks out of the 10 I have tested.
First of all, I would like to know whether this procedure for testing randomness can be considered correct or not. Secondly, do these tests confirm that prices are a quasi-random walk, but not completely random, thus rejecting the EMH?
I have not done research specifically on this subject, but I have a conjecture. Highs and lows are often triggered by stops and limits, which are often set at round numbers, say even dollars or rounded dollars (5s, 10s). This will induce some type of serial correlation into the data series. I am actually a little surprised that you did not see the rejection of i.i.d. on the close series, as the bid-ask bounce is a well-known effect. I suppose that is an older effect from before decimalization, when pricing was more discrete. I would love to hear others’ thoughts on this conjecture.
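As a reference for what the Q statistic measures, here is a minimal plain-Python sketch of the Ljung-Box form, Q = n(n+2) * sum over k of rho_k^2 / (n-k), compared against a chi-square distribution with as many degrees of freedom as lags. The function names ljung_box and chi2_sf are hypothetical, and this is not the toolbox routine used above.

```python
import math

def chi2_sf(x, dof):
    """Upper tail probability of a chi-square with integer `dof` >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting point, then the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < dof / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def ljung_box(returns, lags):
    """Ljung-Box Q statistic and p-value for `lags` autocorrelations."""
    n = len(returns)
    mu = sum(returns) / n
    dev = [x - mu for x in returns]
    c0 = sum(d * d for d in dev)
    q_stat = 0.0
    for k in range(1, lags + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        q_stat += rho_k ** 2 / (n - k)
    q_stat *= n * (n + 2)
    return q_stat, chi2_sf(q_stat, lags)
```

To mirror the lags mentioned above, one could loop over [5, 10, 15, 20] and call ljung_box(returns, h) for each h. Because the statistic is built from squared sample autocorrelations, its chi-square approximation relies on moment assumptions that heavy tails and changing volatility can strain.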
I actually saw the rejection of the i.i.d. on all the close series. But that wasn’t a surprise, because I was already expecting the presence of heteroscedasticity. That’s why I have tried the variance ratio test for the hypothesis of a heteroscedastic random walk rather than an i.i.d. (just choosing a different option in the function).
The first research I saw on your subject dates from the 1960’s. However, even Bachelier in 1900 expressed in his equations the quasi-randomness of stock prices.
You are starting with the wrong premise, namely that stock price distributions are Gaussian in nature, but they are not. So applying runs tests or the Ljung-Box test will tend to give wrong answers. Stock price distributions are Paretian in nature, with fat tails (outliers) that can be 6-sigma and often a lot more. This will throw any test based on the assumption of a Gaussian distribution out of whack.
In your tests, you also assume that the price series are without data errors, glitches or anomalies. This assumption is also too relaxed. Most data errors, and believe me there are a lot, occur on the high and low, due to glitches and delayed or out-of-sequence quotes. The open and close are less vulnerable to these errors simply because they have a lot less time to happen and therefore a much smaller window of opportunity to appear in the data series.
In my opinion, there is no benefit you can extract from your observations on the differences you found in your data series, just as the numerous other researchers of another age (60’s to present) have also found.
Sorry to be so direct. I hope you continue your research; I think we all go through that phase on our way to where we want to go.
I understand your point regarding the inconsistency of the Ljung-Box test when applied to fat-tailed distributions, but I still need some clarification regarding the runs test. I thought the runs test would be immune to excess kurtosis, since it takes into account only the sign of the returns and not their magnitude.
Another clarification: when you speak of glitches and anomalies, you are not referring to spikes, that is, actual sudden increases or crashes of the price, right? You are saying that the data are simply wrongly reported (the price never reached the values reported for the high and low of that day, not even for a second).
Let’s say you have a million machines connected to their respective brokers around the world, all being fed the aggregate of all trades on the various exchanges, where trades are executed electronically in a fraction of a second or entered manually in the system and then re-disseminated to all connected clients. So what you see on your monitor are all the quotes, arriving (hopefully) in the sequence in which they were entered in the system. The queue is not totally sequential: a lot of the trades are out of order, reported late or with data errors, not to mention canceled and busted orders or zero-volume trades. I think the market maker or specialist on the floor has some 15 seconds to report a trade; otherwise he/she has to fill in a form and then enter the trade in the system. This is often used to hide a big prearranged trade in order not to show one’s hand. All this is without addressing the state of your internet connection, the glitches in routing, or the possible transmission delays imposed on high-bandwidth users like traders.
There are many reasons why what you see as “correct” data is just an approximation of what the market is really doing.
Look at the data from last year’s May 6th Flash Crash. Trades that could not probabilistically happen in 100 million years were done by the thousands. The data was so erratic that you could see big-name stocks change hands at a penny. Even exchange traded funds traded at $0.00.
To have the cleanest data possible, I would recommend using only the open and close, just because they have a much smaller window in time for something to go wrong.
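To make the sign-only point concrete, here is a minimal sketch of a Wald-Wolfowitz runs test on the signs of the returns, in plain Python. The function name runs_test, the choice to test about zero (rather than about the mean or median) and the dropping of zero returns are my assumptions, and the p-value uses the large-sample normal approximation.

```python
import math

def runs_test(returns):
    """Runs test on the signs of `returns`; returns (runs, z, p_value)."""
    signs = [x > 0 for x in returns if x != 0]   # assumption: zeros are dropped
    n1 = sum(signs)                              # number of positive returns
    n2 = len(signs) - n1                         # number of negative returns
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("need both signs and at least three non-zero returns")
    # A new run starts every time the sign changes.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided, normal approximation
    return runs, z, p_value

# Inflating some returns to mimic fat tails leaves the result unchanged,
# because only the signs enter the statistic.
base = [0.01, -0.01] * 10
fat = [x * (50.0 if i % 5 == 0 else 1.0) for i, x in enumerate(base)]
print(runs_test(base) == runs_test(fat))
```

Too many runs (positive z) suggests the signs alternate more than chance would allow, too few (negative z) suggests they cluster. The magnitudes never enter the calculation, which is why outliers cannot move the statistic, although wrongly reported prices that flip the sign of a return still can.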