Quant analytics: How to transform a distribution to normal
Apart from the Box-Cox transformation, are there other methods for transforming a distribution to normal?
I interpret the problem you posed as follows:
X = your data set (a column of numbers)
You are looking for a function f such that Y = f(X) ~ Normal(some mean, some s.d.)
Box-Cox is one possibility – however, there are many other possibilities, depending on the distribution of your X
For example –
1) If X ~ Lognormal, then log(X) ~ Normal
2) If your X is like correlation coefficients (ranging between -1 and 1) – Fisher’s transform will convert it to approximately normal – f(r) = 0.5 * ln( (1+r) / (1-r) )
3) In fact, based on the theory of probability distributions – more bizarre choices of X and f(x) can be constructed.
– As an applied statistician, you may need to focus on what the original distribution of X is and go from there.
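To make the first two examples concrete, here is a minimal Python sketch using only the standard library. The function names (log_transform, fisher_z, inverse_fisher_z) are made up for illustration and are not from any package. Fisher's transform is written with the usual factor of 0.5, which makes it identical to atanh(r).

```python
import math

def log_transform(xs):
    """Map strictly positive values (e.g. lognormal data) to the log scale."""
    return [math.log(x) for x in xs]

def fisher_z(r):
    """Fisher z-transform: 0.5 * ln((1 + r) / (1 - r)), the same as atanh(r)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def inverse_fisher_z(z):
    """Map a z value back to the correlation scale."""
    return math.tanh(z)

if __name__ == "__main__":
    print(log_transform([1.0, 2.5, 10.0]))
    print(fisher_z(0.5), inverse_fisher_z(fisher_z(0.5)))
```

Having the inverse to hand matters in practice: anything computed on the transformed scale (a confidence interval, say) usually has to be mapped back to the original scale before it is reported.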
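For crude purposes you can subtract the sample mean and divide by the sample s.d. The resulting data will have mean 0 and s.d. 1, but the shape of the distribution is unchanged, so it will be approximately Normal(0,1) only if the data were roughly normal to begin with.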
If you’re working in one dimension, you can always — given a value drawn from your distribution — calculate the fraction of values lower than this value (i.e. get the value of the cumulative function of your distribution) and then use the inverse of the cumulative normal distribution to look up the transformed value. The transformed values will correspond to a normal distribution by construction.
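A rough sketch of that recipe in Python, standard library only. The name rank_to_normal is hypothetical, and the (rank - 0.5) / n offset is one common convention I am assuming here so that the fraction never hits exactly 0 or 1 (where the inverse normal CDF is infinite). Ties are simply ranked in input order, which is a simplification.

```python
from statistics import NormalDist

def rank_to_normal(values):
    """Empirical CDF followed by the inverse standard normal CDF."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, s.d. 1
    # Indices sorted by value; tied values get consecutive ranks in input order.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # Assumed convention: (rank - 0.5) / n stays strictly inside (0, 1).
        p = (rank - 0.5) / n
        out[i] = std_normal.inv_cdf(p)
    return out

# Made-up, heavily right-skewed sample purely for illustration.
skewed = [0.1, 0.2, 0.25, 0.4, 0.9, 1.7, 3.2, 8.5, 21.0]
z = rank_to_normal(skewed)

if __name__ == "__main__":
    for x, zi in zip(skewed, z):
        print(x, round(zi, 3))
```

Note that the output depends only on the ranks, so all information about the spacing between the original values is discarded, and mapping results back requires keeping the original empirical quantiles.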
Maybe you could tell us *why* you want to transform your distribution into a normal one?
• I agree with Andre’s question. Is making your data follow a normal distribution after some transformation really the only way to go? If that goal turns out to be impossible, there are a number of methods available in the literature that can handle non-normal data.
Why? And I might add, it would be useless unless you could transform the results obtained under the normal assumption back to the proper domain.
Transformation depends on the existing distribution. A statistician performs a diagnosis before treatment, just as a doctor does. You can try a series of transformations and use the Kolmogorov-Smirnov test to examine whether the transformed data are close to a normal distribution, then choose the best transformation. Besides, there are powerful non-parametric tests for data whose distributions are not normal.
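The "try several transformations and compare" idea could be sketched like this in Python, standard library only. Everything here is illustrative: ks_statistic_vs_normal is a hypothetical helper, the three candidate transforms are just examples, and the data are synthetic lognormal-like quantiles rather than real observations. Only the Kolmogorov-Smirnov distance is computed, not a p-value, because the usual KS p-values are not valid when the normal's mean and s.d. are estimated from the same sample.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_statistic_vs_normal(xs):
    """KS distance between the sample and a normal fitted by sample mean and s.d."""
    fitted = NormalDist(mean(xs), stdev(xs))
    xs_sorted = sorted(xs)
    n = len(xs_sorted)
    d = 0.0
    for i, x in enumerate(xs_sorted, start=1):
        f = fitted.cdf(x)
        # Largest gap between the empirical step function and the fitted CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Candidate transformations (all require strictly positive data here).
candidates = {
    "identity": lambda x: x,
    "sqrt": math.sqrt,
    "log": math.log,
}

# Synthetic sample: exp of evenly spaced standard normal quantiles.
data = [math.exp(NormalDist().inv_cdf((k - 0.5) / 50)) for k in range(1, 51)]

scores = {name: ks_statistic_vs_normal([f(x) for x in data])
          for name, f in candidates.items()}
best = min(scores, key=scores.get)  # smallest distance = closest to normal

if __name__ == "__main__":
    print(scores)
    print("best:", best)
```

A smaller distance only says the transformed sample looks more normal than the alternatives tried; it is still worth checking a Q-Q plot before relying on it.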
Suppose you are examining days of hospitalization among patients. You will find that a large part of the population has 0 days, so the days of hospitalization are not normally distributed. In this case the solution is not a transformation but an analysis in two separate stages: first model 0 versus 1 (hospitalized or not) with a logistic function, then treat the length of stay among those who were hospitalized as normal. In short, the underlying data must be diagnosed correctly before a solution is chosen.