 # Bayesian Modeling Issue for potential quant analytics

(Last Updated On: August 5, 2011)


Modeling Question
How do you compare Bayesian posterior probabilities with standard probabilities?
A simplified example will illustrate the problem:
Suppose we are trying to build a Bayesian model to predict the winner of a horse race. We will use 8 independent variables (iv’s):
1. best time in past
2. last race finish
3. age of runner
4. post position
5. next fastest runner
6. number of races run in last month
7. time last race
8. number of years competing
The dependent variable (dv) “Win” is binary ({yes, no}).
We have the racing history (i.e. the values of the 8 iv’s and the dv) for each entrant in this race.
We build a model from a large amount of racing historical data that estimates the conditional (i.e. Bayesian) probability of winning the race for each entrant given the most predictive subset of the iv’s. We also get the track odds (which can be converted to a win probability) for each entrant which we call the subjective probabilities.
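The conversion from track odds to win probabilities mentioned above can be sketched as follows. This is only an illustration: it assumes odds quoted as "a-to-1 against", and the odds values are hypothetical rather than taken from the post. Rescaling the raw implied probabilities so they sum to 1 is one common convention, not necessarily what the author did.

```python
def implied_probabilities(fractional_odds):
    """Turn 'a-to-1 against' odds into win probabilities that sum to 1."""
    # Raw implied probability for odds of a-to-1 against is 1 / (a + 1).
    raw = [1.0 / (a + 1.0) for a in fractional_odds]
    # Raw values usually sum to more than 1 (the track's margin),
    # so rescale them proportionally. This rescaling is an assumption.
    total = sum(raw)
    return [p / total for p in raw]


# Hypothetical odds for a four-horse field (not from the post).
example_odds = [1.0, 2.0, 4.0, 4.0]
subjective_probs = implied_probabilities(example_odds)

for a, p in zip(example_odds, subjective_probs):
    print(f"{a:.0f}-to-1 -> {p:.3f}")
print("sum =", round(sum(subjective_probs), 6))
```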
The subjective win probabilities sum to 1. However the conditional probabilities don’t sum to 1. This is because each entrant’s objective probability is built on a subset of anywhere from 1 to 8 of the iv’s. The following table is an example:

| Runner | Objective probability (modeled Bayesian probability of win) | Subjective probability (from odds) |
|--------|------|------|
| 1 | .7 | .6 |
| 2 | .5 | .3 |
| 3 | .3 | .06 |
| 4 | .2 | .01 |
| 5 | .2 | .03 |
| Total | n/a | 1.0 |

The Bayesian model indicates:
that runner 1 has a 0.7 probability of winning, given the values of iv1 and iv3
that runner 2 has a 0.5 probability of winning, given the values of iv2, iv3 and iv8
that runner 3 has a 0.3 probability of winning, given the values of iv2, iv7 and iv8
etc.
Each Objective probability comes from a subset because that subset is considered the most accurate predictor.
The race pays out as in the pari-mutuel system: the winning bettors share the pool in proportion to their bets.
Theory dictates that any horse with a ratio of the objective probability to the subjective probability > 1 is a “good” bet.
Setting aside such real-world issues as house take, breakage, etc., how can we estimate these ratios, given that the Bayesian probabilities don’t sum to 1?
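To make the difficulty concrete, here is a small sketch that computes the objective-to-subjective ratios from the table, both with the raw model outputs and after a naive proportional rescaling. The rescaling is purely an assumption for illustration; nothing in the question justifies it, since each modeled probability is conditioned on a different subset of variables.

```python
# Values copied from the table in the question.
objective = {1: 0.7, 2: 0.5, 3: 0.3, 4: 0.2, 5: 0.2}
subjective = {1: 0.6, 2: 0.3, 3: 0.06, 4: 0.01, 5: 0.03}

objective_total = sum(objective.values())    # about 1.9, not 1
subjective_total = sum(subjective.values())  # about 1.0

# Naive fix (an assumption, not the author's method): rescale so the sum is 1.
normalized = {r: p / objective_total for r, p in objective.items()}

raw_ratio = {r: objective[r] / subjective[r] for r in objective}
norm_ratio = {r: normalized[r] / subjective[r] for r in objective}

print(f"objective sum = {objective_total:.2f}, subjective sum = {subjective_total:.2f}")
for r in objective:
    print(f"runner {r}: raw ratio {raw_ratio[r]:.2f}, rescaled ratio {norm_ratio[r]:.2f}")
```

With the raw numbers every runner has a ratio above 1, which cannot all be "good" bets at once. After rescaling, runners 1 and 2 fall below 1, so the choice of how to make the model outputs coherent changes the betting decision.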

—–

There is no such thing as “Bayesian probability.” Probability is probability, whether you have exercised Bayes’ law or not.

And, if your probabilities do not sum up to unity, then you have violated the third axiom of probability and done something wrong.

—-

I see your point but what I think you are missing is that each of the Bayesian probabilities is with respect to a different subset of the sample space. They are conditional each with a possibly different condition and thus do not have to add up to one.
Imagine if you had a boxing match and solicited the opinion as to the winner from a group of 1000 lay people. Say 600 picked boxer a and 400 boxer b. These probabilities of 0.6 and 0.4 do add up to one.
Now suppose you asked 100 experts as to their pick and the probability of that pick being true. Expert 1 might say I pick A with .7 probability, expert 2 says boxer A with probability .6, etc. Obviously their probabilities don’t sum to unity because the posterior conditions (i.e. the factors they consider and their subjective weights) are different.
Bayesian probabilities are measures of degree of belief while the lay group’s probabilities have to sum to unity and are thus more like frequentist probabilities.
Each set of probabilities (lay and expert) contains information. How do we take all the information into account?
That’s the essence of what I’m asking.

—–

You cannot add over the conditions. That violates the fourth axiom of probability.

I teach courses on how to do all this and solve such problems as you posed fully consistent with the axioms of probability. Your firm might benefit from such a course. Contact me if you are interested.

—–

Again, thanks for your comments. Your responses seem to indicate that I am trying to violate the laws of probability. I’m not. I’m fully aware of these laws. What I’m trying to do is use the information I presented in a mathematically rigorous manner. If you think you have something positive to add, I’d be glad to contact you, or please send me your contact information. At this point, just stating the elementary laws of probability is not offering any help.

—–

If you follow the rules of Bayesian analysis correctly, you will end up with a set of probabilities that add to 1. If you are trying to use quantities that do not sum to 1, these are not the probabilities you should be using.

—-

Thanks, this is what I have been trying to say. Only thing I would change in your statement is from “Bayesian analysis” to “probability theory.” This problem is truly just a probability problem.

—–

Great discussion and arguments. The author presented a good argument which is really spot on. I’m a cyber security architect and I’m dealing with vulnerabilities and risk on an hourly basis. He presented a fantastic argument from an analysis versus theory-based mitigation process. One of my jobs in protecting high-end digital information systems is to present the probabilities and mitigate them as best we can. Replace “horse racing” or “boxing” with information systems and you have the same arguments with the same conclusions (probability + vulnerability = risk). Now this was a short edification of your arguments, but either way I wanted to let you know that your discussion applies in many unlikely professional fields. I thank you for your interesting articulation on probability.

—–

I’ve recently spent a lot of time defining and refining a predictive model which addresses something very similar to your horse racing example. In our case, we score potential prospects for fundraising, so the resulting binary would be “Will donate” {yes,no}, and we use several independent variables pulled from a solicitation and donation history.

While it is impossible to accurately predict any single individual outcome, such as “this horse will win this race” or “this donor will give this time”, we’ve had tremendous success in being able to rank our prospects according to a “Likelihood to Win/Donate”. For example, if we solicit a group of 1000 prospects that have a 40% LTR (Likelihood to Respond), while we don’t know specifically which individuals in that group will be our donors, we know we are going to get about 400 donations. And the same relationship between score and result continues to apply all the way down the LTR ranking.
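The "about 400 donations" figure is the expected count for the group. As a rough sketch, and assuming (this is not stated in the comment) that prospects respond independently with the same 40% probability, the spread around that figure can be estimated with the binomial standard deviation:

```python
import math

group_size = 1000   # prospects solicited (from the comment)
ltr = 0.40          # Likelihood to Respond (from the comment)

# Expected number of donors in the group.
expected_donations = group_size * ltr

# Binomial standard deviation; assumes independent, identically likely responses.
std_donations = math.sqrt(group_size * ltr * (1.0 - ltr))

print(f"expected donations: {expected_donations:.0f}")
print(f"standard deviation: {std_donations:.1f}")
```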

It would be a pleasure to discuss methodology and exchange ideas with you.

—-

I’m guessing the problem is with your model. It sounds like (but I could have misunderstood) you are taking 8 individual horses and creating 8 independent binary win/loss outcomes. These values have no constraint to sum up to 1.

If that is the case, what you really want to know is: given a particular set of 8 horses, which of those 8, *dependent on the others in the race*, is most likely to win. I.e., having a different horse in a race must impact the other horses’ probabilities of winning or losing.

If that sounds like it is the case I can elaborate further. If not can you clarify your model a little more please.
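One way to sketch a model in which each horse's win probability depends on the rest of the field is to give every horse a strength score and normalize within the race with a softmax. This is only an illustration of the idea; the commenter does not name a specific model, and the scores below are hypothetical.

```python
import math


def race_probabilities(strength_scores):
    """Softmax over one race: probabilities depend on the whole field and sum to 1."""
    top = max(strength_scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in strength_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical strength scores for a four-horse field.
field_scores = [1.2, 0.4, -0.3, -1.0]
field_probs = race_probabilities(field_scores)

# Add a strong rival: every original horse's probability must drop.
rival_probs = race_probabilities(field_scores + [2.0])

print([round(p, 3) for p in field_probs], "sum =", round(sum(field_probs), 6))
print([round(p, 3) for p in rival_probs], "sum =", round(sum(rival_probs), 6))
```

The same horse gets a different probability in a different field, which is exactly the dependence that independent per-horse win/loss models lack.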


Incidentally, you mention the problem of having N different experts each assigning a different probability (or better, a probability distribution) of winning, and then trying to figure out how to merge them into a single result. The simplest way to handle that would be to treat those as N different measurements of p, with the associated error of each estimate determining its weight (or you can just set all the weights equal). They should not just be summed together.
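A minimal sketch of that weighting scheme, using inverse-variance weights. It assumes the experts' errors are independent and that each expert supplies a standard error; the numbers are hypothetical.

```python
def combine_estimates(estimates, std_errors):
    """Inverse-variance weighted mean of several estimates of the same p."""
    # A smaller stated error gives an estimate a larger weight.
    weights = [1.0 / (se ** 2) for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * p for w, p in zip(weights, estimates)) / total_weight
    # Standard error of the pooled estimate, assuming independent errors.
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se


# Hypothetical expert opinions on P(boxer A wins) and their uncertainties.
expert_p = [0.70, 0.60, 0.55]
expert_se = [0.05, 0.10, 0.10]

pooled_p, pooled_se = combine_estimates(expert_p, expert_se)
print(f"pooled estimate: {pooled_p:.3f} +/- {pooled_se:.3f}")

# Equal errors reduce to a plain average.
equal_p, _ = combine_estimates([0.70, 0.60], [1.0, 1.0])
print(f"equal-weight estimate: {equal_p:.3f}")
```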

——

You cannot violate the Kolmogorov axioms: if the probabilities add to a number greater than 1, you are counting some joint events more than once. Conversely, if they add to a number less than 1, you are omitting some event.

“I see your point but what I think you are missing is that each of the Bayesian probabilities is with respect to a different subset of the sample space.”: THIS IS THE POINT!!! You are considering different event spaces! In this condition you are dropping the Lebesgue conditions to define a simple “measure”.
Just to be practical: you can call them “adimensional scores”. To convert them into probabilities (or, much better, into quantities that converge to probabilities), you can collect statistics over these different event subspaces (treating them as a new event space) and take the frequencies as probabilities… It is a dirty trick, but it is an approach that lets you reuse most of your work!
…Thanks for your question… it reminded me of the exercises set by my professor of “Measure theory” 🙂
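The "dirty trick" of turning scores into frequencies can be sketched as a simple binning calibration: group historical model scores into bins and use the observed win rate in each bin as the probability. The binning scheme and the data below are assumptions for illustration only, and the scores are assumed to lie in [0, 1].

```python
from collections import defaultdict


def calibrate_by_bins(scores, outcomes, n_bins=5):
    """Map each score bin to the empirical win frequency observed in that bin."""
    wins = defaultdict(int)
    counts = defaultdict(int)
    for score, won in zip(scores, outcomes):
        # Bin index in 0..n_bins-1; a score of exactly 1.0 goes in the top bin.
        b = min(int(score * n_bins), n_bins - 1)
        counts[b] += 1
        wins[b] += int(won)
    return {b: wins[b] / counts[b] for b in counts}


# Hypothetical history: model scores and whether the runner actually won.
history_scores = [0.72, 0.68, 0.75, 0.31, 0.35, 0.28, 0.33, 0.71]
history_wins = [1, 0, 1, 0, 0, 1, 0, 0]

bin_frequency = calibrate_by_bins(history_scores, history_wins)
for b in sorted(bin_frequency):
    print(f"score bin {b}: empirical win frequency {bin_frequency[b]:.2f}")
```

Frequencies calibrated this way are still per-runner figures, so within a single race they would typically need a further normalization step before they sum to 1.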

—–

Others here are obviously more eloquent about the technical aspects than I am, but let me go back to your experts in the boxing match. Each expert has a subjective view of the probability of boxer a winning (and likewise boxer b). One way to get a collective ‘experts’ view would be to let each expert trade contracts in open trading (at, say, price \$p for a \$1 payoff if boxer a wins). The price would converge to a point where there were as many sellers (who think p > probability that boxer a is going to win) as buyers (who think the opposite). The price would be the collective ‘experts’ probability.

From memory, this is in effect how a ‘winner takes all’ prediction market would work.
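Under the simplifying assumption that every expert trades exactly one contract, buying if their belief is above the price and selling if it is below, the price that balances buyers and sellers is the median belief. The sketch below uses hypothetical beliefs; a real prediction market involves stake sizes and risk preferences that are ignored here.

```python
import statistics

# Hypothetical expert beliefs about P(boxer a wins).
beliefs = [0.70, 0.60, 0.55, 0.80, 0.65]

# With one contract per expert, the balancing price is the median belief.
clearing_price = statistics.median(beliefs)

buyers = sum(b > clearing_price for b in beliefs)   # think the contract is cheap
sellers = sum(b < clearing_price for b in beliefs)  # think the contract is expensive

print(f"clearing price: {clearing_price:.2f}, buyers: {buyers}, sellers: {sellers}")
```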
