


HOW DO YOU START A PROFITABLE TRADING BUSINESS? Read more NOW >>>
NOTE I now post my TRADING ALERTS into my personal FACEBOOK ACCOUNT and TWITTER. Don't worry as I don't post stupid cat videos or what I eat!



Couchbase NOSQL still limited
So who cares about this old video? Couchbase is severely limited in the client languages it supports, and the open source edition seems limited too. As said, it really comes down to Redis or MongoDB, and we have already decided on that one.
Also, the video below does not give metrics on the speed improvement. Just half the servers: big whoop.
Join my FREE newsletter to learn more about which NOSQL I like
Last call for FLASH SALE on this limited Portfolio Optimization TOOL
That’s right, we are coming down to the wire with hours left on this sale. It is cheaper than your dinner!
Details:
LIMITED FLASH SALE: MATLAB source code and walkthrough video for this Portfolio Optimization Tool
Note: It looks like the source files have been pulled from Mathworks.com, so this is the only location where you can get this now.
See the video below to see how this works with my NEW Matlab script, which creates an Excel spreadsheet for import into this powerful tool.
We did our LIVE Meetup event on this tool, so you can view it by clicking here.
Free LIMITED beta release download available for pre-production release of Poseidon software by Paul Cottrell
http://thestudioreykjavik.com/poseidon/ Paul Cottrell will be starting the production release of Poseidon. The free beta release will not be available after 9/15, so hurry to experiment with the beta version of Poseidon. Note: the beta release is for Windows 7 only. JOIN MY FREE NEWSLETTER FOR WHEN THIS SOFTWARE GOES INTO PRODUCTION
Here are some limited examples of market inefficiency with Matlab M script source code
There are some limited examples of market inefficiency with Matlab M script source code:
http://www.centerforpbbefr.rutgers.edu/Jan112008%20papers/72.doc
http://www.mathworks.com/matlabcentral/fileexchange/9264-weak-form-market-efficiency-tests <– best example, but it still focuses on cointegration of an equity against a future (this seems to be the only valid arbitrage strategy unless you want a specific mutual fund for this). JUST functions, with no test client code.
http://erasmusmundus.univ-paris1.fr/fichiers_etudiants/6267_dissertation.pdf
Do Dark Pools Harm Price Discovery? http://www.mit.edu/~zhuh/Zhu.html <– not practical for trading, but useful for those who want to know.
Join my FREE newsletter to see if we implement any of these trading ideas
Data stream sampling algorithm(s) when sample storage is limited and data storage is not?
The problem is: I have limited sample storage and unlimited data storage. Data is ingested:
1. in real time,
but can also be ingested
2. incrementally (periodic batch mode).
The simple approach of "take every n-th sample" does not work, for obvious reasons. I do not want to reinvent the wheel (I can, of course 🙂) and would like to get your opinions on this subject.
—–
Additional requirements:
1) Minimum number of read operations from data storage (it's random I/O). Ideally, no random reads.
2) Operations which are allowed:
put_sample_storage (item),
delete_sample_storage(item),
get_sample_storage(item_key),
put_data_storage(item)
Sample storage is in RAM; data storage is on disk(s).
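The four allowed operations read like a small storage interface. A minimal sketch of that interface (Python, illustrative only; the explicit `key` parameter and the in-RAM dict / append-only list backing are my assumptions, not from the post):

```python
class Storage:
    """Sketch of the four operations listed above.

    Sample storage is modeled as an in-RAM dict; data storage is an
    append-only list standing in for sequential disk writes (no random
    reads, matching requirement 1).
    """

    def __init__(self):
        self._samples = {}  # RAM: key -> item
        self._data = []     # "disk": sequential appends only

    def put_sample_storage(self, key, item):
        self._samples[key] = item

    def delete_sample_storage(self, key):
        self._samples.pop(key, None)

    def get_sample_storage(self, key):
        return self._samples.get(key)

    def put_data_storage(self, item):
        self._data.append(item)  # every ingested item lands on "disk"
```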
—–
What is the purpose of the sample storage? Are you trying to compute some aggregate function of the data (e.g. moving average, min/max/median, other statistics)? Why doesn't "take every n-th sample" work?
Your initial question suggested that data is streaming in, and you want to process it online, but then in your additional requirements, you mention doing reads from data storage. You don’t seem to have such a read operation in your list of “Operations which are allowed”, suggesting you don’t want to do such reads.
There are lots of online algorithms for computing various functions of streaming data, either exactly or approximately; is that the sort of thing you’re looking for?
——
No, I am not into continuous-queries stuff. It's more like Real-Time Big Data Analytics stuff. To make myself clearer:
I have N data items on disk
I have M data items in memory
M << N
Data items come from a continuous real-time data stream. All data MUST be saved on disk. What is the most efficient algorithm that allows keeping a uniform sample of the N items in memory? By uniform sample I mean that the probability of any data item being present in memory is ALWAYS M/N (where N is growing and M is fixed).
A correct but absolutely not optimal algorithm for M/N sampling of the data:
Randomly read M items from the N-item set.
—
Ah, OK, now I understand! Here’s an algorithm I think should work:
You start out with an empty sample set, and you start seeing a stream of data items.
If you don’t yet have N samples in the set, take the next item into the set with probability 1.0
If you already have N samples, and you’ve seen M items in the stream so far, then when you see the M+1st item, with probability N/(M+1) you should pick a random item from the N samples to discard and replace with the M+1st item.
——
N(t) – total number of events (data items) observed so far (on disk); t – time
M – maximum number of data items (in memory)
The simple algorithm of replacing an existing item in the M set with a new one with probability M/N(t) does not work, because the older an item is, the more chances it has to be evicted from the M set. Basically it means that we will have data skew in our M set (more recent items and fewer old ones).
—–
But the probability for any given item to be evicted *should* go up as M increases!
After all, the first N items initially got selected into the sample set with 100% probability; as time goes by, the chance of that item staying in the set should be N/M, which goes down as M grows.
If you want uniform sampling, then you don’t want to preferentially keep early items in the set.
—
Ugh, I just realized that in my comments, I’ve reversed the meanings of N and M, sorry!
Anyway, my argument still stands, if you just swap M and N.
So the correct algorithm is:
Start out with empty set, and start seeing incoming stream of items.
If you don’t yet have M items in memory, the next item gets added to the sample set.
Once you’ve got M items in the sample set, the N+1st item should replace one of those items with probability M/(N+1).
Again, I apologize for mixing up the variables!
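For reference, that corrected procedure is the classic reservoir-sampling scheme ("Algorithm R"). A minimal sketch (Python, illustrative; the function name and signature are mine, not from the thread):

```python
import random

def reservoir_sample(stream, m, rng=random):
    """Keep a uniform sample of up to m items from a stream.

    After n items have been seen, every item has probability m/n of
    being in the sample, exactly as described above.
    """
    sample = []
    for n, item in enumerate(stream, start=1):
        if len(sample) < m:
            # The first m items enter the sample with probability 1.
            sample.append(item)
        else:
            # The n-th item replaces a random slot with probability m/n.
            j = rng.randrange(n)  # uniform in [0, n-1]
            if j < m:
                sample[j] = item
    return sample
```

Drawing `j` uniformly from `[0, n-1]` and replacing slot `j` only when `j < m` gives the replacement probability `m/n` and picks the evicted slot uniformly in one step.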
——
Hmm, you are right. Simple approach actually works. Need to double check it.
—–
If you just do a rolling algorithm where you keep the latest M elements in memory, you may not get the results you want.
You’re being pretty vague in describing the problem which is making it difficult to solve.
Looking at the solution… you fill the memory with the set M, which is your first set of data. As the data grows, you replace an element based on the probability M/(N+1). This means that as the overall data set N grows, the odds of replacing an in-memory element with a new value decrease over time.
So if I have a thousand elements in memory, and I now have a billion elements on disk, the next element I read has a 1 in a million chance of replacing an element in memory?
—–
At any given time I need to have a random sample (M elements in memory) of the whole data set N (on disk). It means that at any given time T, the probability of any element from N(T) being in memory as well (in the M set) is M/N(T). N(T) is a monotonic function of T.
This solution is correct. For some reason I had discarded it earlier (I thought it resulted in data skew in memory, but it does not).
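As a sanity check on the "no skew" conclusion, a quick simulation (Python, illustrative; all names are mine) that measures how often the first and last items of the stream end up in the in-memory sample:

```python
import random
from collections import Counter

def reservoir_sample(stream, m, rng):
    """Uniform sample of m items; the n-th item is kept with prob m/n."""
    sample = []
    for n, item in enumerate(stream, start=1):
        if len(sample) < m:
            sample.append(item)
        else:
            j = rng.randrange(n)  # uniform in [0, n-1]
            if j < m:
                sample[j] = item
    return sample

rng = random.Random(42)
m, n, trials = 20, 1000, 2000
hits = Counter()
for _ in range(trials):
    hits.update(reservoir_sample(range(n), m, rng))

# If sampling is uniform, every item -- earliest or latest -- should land
# in the sample about m/n = 2% of the time; old items are not over-evicted.
first_rate = hits[0] / trials      # item seen first
last_rate = hits[n - 1] / trials   # item seen last
```

Both rates come out close to 0.02, i.e. M/N(T), with no drift toward recent items.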
—–
This algorithm is a restatement of the standard shuffle algorithm for this. Usually the way it is implemented is that if you have collected M samples and seen N data points, you pick a location i for the new sample from [0…N-1]. If the location is in [0…min(MAX_SAMPLES, M)], then put it there. Update M to be max(i, M).
This is fine for sampling, but for analytics you often want mean and various quantiles such as min, max, median and quartiles. The 5th and 95th percentiles are also common. Mean can be computed using Welford’s algorithm with only two storage locations and the quantiles can also be estimated quite accurately with a few storage locations per statistic of interest.
Apache Mahout has all of these algorithms. See http://mahout.apache.org
——
Another popular name for this technique is "Reservoir Sampling", described by J. Vitter in 1985 (http://www.cs.umd.edu/~samir/498/vitter.pdf).
—–
In modern practice it is very, very rare to use anything but the in-memory variant described by Vitter. Also, the optimizations that give the out-of-core Algorithm Z are probably impossible to discern from the simpler out-of-core Algorithm X.
—–
Keeping a statistically correct sample (a subset of a larger set) of the data in memory is useful for quick (and dirty) estimates of various aspects of the data distribution. What you described two posts earlier is not enough, because you may not know in advance which statistic (average, percentile, etc.) you will be interested in: mean salary in Brooklyn in 2006 in department stores, or mean salary in NY for gas station workers in 2011.