Quant analytics: Dissertation is about Data mining in quant analysis. Huge database? I don’t have data?

(Last Updated On: November 30, 2011)

My dissertation is about data mining in quant analysis. Where can I find a huge database? I don’t have any data.

(I’m a master’s student in IT Management.) Thanks



How huge? Do you mean the number of records (rows) or the number of attributes (columns)?
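If it helps to answer that for a candidate file, here is a minimal Python sketch that counts records and attributes without loading everything into memory. It assumes a delimited text file with a single header row; the function name `dataset_shape` and the sample values are made up for illustration.

```python
import csv
import io


def dataset_shape(lines, delimiter=","):
    """Return (n_records, n_attributes) for delimited text with a header row.

    `lines` is any iterable of text lines (for example an open file), so
    rows are streamed one at a time instead of being held in memory.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:  # empty input
        return 0, 0
    # csv.reader yields an empty list for a blank line; skip those.
    n_records = sum(1 for row in reader if row)
    return n_records, len(header)


# Tiny in-memory stand-in for a real file (made-up values).
sample = io.StringIO("id,price,volume\n1,10.5,300\n2,10.7,120\n")
print(dataset_shape(sample))  # (2, 3)
```

For a real file, pass an open handle instead, e.g. `with open(path, newline="") as f: dataset_shape(f)`, where `path` is wherever you saved the download.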

“Bag of Words” has 8 million records and 100K attributes.

Some other large-ish datasets (> 1MM records), such as previous KDD Cup datasets, are at the UCI (University of California, Irvine) Machine Learning Repository.

Good luck!



I would strongly suggest talking to faculty and other students at your institution about possible collaborations, because I think a collaboration with the “owner” of some data would make a much better thesis project than just downloading some random government or academic data set. Somebody there may have a data set they could use help analyzing, in which case you could kill two birds with one stone: doing something interesting for your thesis AND helping them. Helping somebody else find knowledge in their data is useful practice for what most data analysts do for a living! Nearly everything I have accomplished since completing my own dissertation has been done in collaboration with others.

If you want specific database suggestions from us, tell us more about your domain knowledge so we can make better suggestions. My own field is genomics, so I could point you to plenty of biological data sets that are available on public servers. No doubt others could point you at interesting data sets in their domains. Good data mining is not done in a vacuum. In my experience it’s usually a collaboration between somebody with domain expertise and somebody who does the data mining. Nor does the data miner have to know dramatically more about statistical tools than their collaborator. It could just be that the collaborator does not have time to play around with the data!

In any case, an interesting thesis on data mining does not necessarily have to be done on a huge data set. The most interesting database for your work might not be the biggest one. What matters is that you demonstrate the ability either to develop innovative algorithms or to use existing algorithms in an innovative way to learn from some data set.


