Quant Analytics: Is it better to predict density or probability?
I am working on histograms because a histogram is a very parsimonious way of storing the distribution of observed values. To get around the problem of choosing the bin width, I devised a method where, once the desired number of bins is chosen, the domain is split into bins of different widths. OK, nothing new: it is just a piecewise interpolation of the distribution function. But I intend to compare the procedure against other methods. Kernel density estimation (KDE) looks like the most competitive one (many references state its superiority in being consistent and having a fast convergence rate to the “true” density). So I ran a test to assess KDE against “my histogram”. I generated 1 million points from a mixture of two normals. I performed the KDE with the RBF kernel, storing the estimated distribution function at 500 points. I estimated “my histogram” with just 16 bins. Then I simulated 10k random queries for the probability of an interval of values. To my great surprise, “my histogram” is more accurate (in MSE) than the KDE at predicting the probability. So, my question is: “Is it more correct (and/or useful) to predict density or probability?” If you are curious, I have implemented the procedures in MATLAB.
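For readers without MATLAB, the experiment can be sketched in Python. This is not the author's MATLAB code but a minimal sketch under several assumptions: "bins of different width" is read as equal-frequency (quantile) bins, with the distribution function interpolated linearly between bin edges; the RBF kernel is taken as a Gaussian kernel with Silverman's rule-of-thumb bandwidth; the weights, means and standard deviations in `MIXTURE` are hypothetical; the sample is 5,000 points rather than 1 million to keep pure Python fast; and query endpoints are drawn uniformly over the data range. Both estimators answer a query through the same identity, P(a < X <= b) = F(b) - F(a), so the comparison is really between two estimates of the distribution function F.

```python
import bisect
import math
import random

# Hypothetical two-component normal mixture: (weight, mean, std).
# These parameters are illustrative, not the ones used in the original post.
MIXTURE = [(0.6, -1.0, 0.7), (0.4, 2.0, 1.0)]


def mixture_cdf(x):
    """True CDF of the mixture, used as the ground truth for interval queries."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in MIXTURE)


def sample_mixture(n, rng):
    """Draw n points: pick a component by weight, then sample from it."""
    comps = rng.choices(MIXTURE, weights=[w for w, _, _ in MIXTURE], k=n)
    return [rng.gauss(m, s) for _, m, s in comps]


def interp(x, xs, ys):
    """Piecewise-linear interpolation through knots (xs, ys), clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def equal_frequency_cdf(data, n_bins):
    """Variable-width bins whose edges are empirical quantiles, so each bin
    holds about 1/n_bins of the data; the CDF is linear inside each bin."""
    xs = sorted(data)
    n = len(xs)
    edges = [xs[round(k * (n - 1) / n_bins)] for k in range(n_bins + 1)]
    probs = [k / n_bins for k in range(n_bins + 1)]
    return edges, probs


def kde_cdf_grid(data, n_grid, pad=3.0):
    """Gaussian-kernel KDE integrated to a CDF and tabulated on n_grid points.
    Bandwidth: Silverman's rule of thumb (an assumption, not from the post)."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    h = 1.06 * std * n ** (-0.2)
    # Extend the grid `pad` bandwidths past the data so the CDF reaches ~0 and ~1.
    lo, hi = min(data) - pad * h, max(data) + pad * h
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    inv = 1.0 / (h * math.sqrt(2.0))
    # Integral up to g of a Gaussian kernel centred at x: 0.5 * (1 + erf((g - x) / (h * sqrt(2)))).
    cdf = [0.5 * (1.0 + sum(math.erf((g - x) * inv) for x in data) / n)
           for g in grid]
    return grid, cdf


def run_experiment(n_points=5000, n_bins=16, n_grid=500, n_queries=10000, seed=0):
    """Return (histogram MSE, KDE MSE) over random interval-probability queries."""
    rng = random.Random(seed)
    data = sample_mixture(n_points, rng)
    h_x, h_p = equal_frequency_cdf(data, n_bins)
    k_x, k_p = kde_cdf_grid(data, n_grid)
    lo, hi = min(data), max(data)
    se_hist = se_kde = 0.0
    for _ in range(n_queries):
        a, b = sorted((rng.uniform(lo, hi), rng.uniform(lo, hi)))
        true_p = mixture_cdf(b) - mixture_cdf(a)  # P(a < X <= b) = F(b) - F(a)
        se_hist += (interp(b, h_x, h_p) - interp(a, h_x, h_p) - true_p) ** 2
        se_kde += (interp(b, k_x, k_p) - interp(a, k_x, k_p) - true_p) ** 2
    return se_hist / n_queries, se_kde / n_queries


if __name__ == "__main__":
    mse_hist, mse_kde = run_experiment()
    print(f"equal-frequency histogram MSE: {mse_hist:.3e}")
    print(f"Gaussian KDE MSE:              {mse_kde:.3e}")
```

Varying `n_bins`, `n_grid`, `n_points` or the bandwidth rule in this sketch is a quick way to check how sensitive each MSE is to those choices.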
Are there any side-by-side visualizations that could be posted for the non-MATLAB users in this group? A picture might help me better understand the differences between the two distributions you are referring to. Thanks!