Quant analytics: Transform a distribution to normal
Apart from the Box-Cox transformation, are there other methods for transforming a distribution to normal?
I interpret the problem you posed as follows:
X = your data set (a column of numbers)
You are looking for a function f(X) such that Y = f(X) ~ Normal(some mean, some sd).
Box-Cox is one possibility; however, there are many others, depending on the distribution of your X.
For example:
1) If X ~ Lognormal, then log(X) ~ Normal.
2) If your X is like correlation coefficients (ranging between -1 and 1), Fisher's transform will convert it to approximately normal: f(r) = 0.5 * ln((1 + r) / (1 - r)).
3) In fact, based on the theory of probability distributions, more bizarre choices of X and f(X) can be constructed.
As an applied statistician, you need to focus on what the original distribution of X is and go from there.
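A minimal sketch of the first two examples, using only the Python standard library. The simulated lognormal sample and the helper names `fisher_z` and `inverse_fisher_z` are illustrative assumptions, not anything from the thread. Note that Fisher's transform is the same function as `math.atanh`.

```python
import math
import random

def fisher_z(r):
    """Fisher's z-transform: 0.5 * ln((1 + r) / (1 - r)), which equals atanh(r)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def inverse_fisher_z(z):
    """Map a z value back to the correlation scale."""
    return math.tanh(z)

# Lognormal example: if X ~ Lognormal(mu, sigma), then log(X) ~ Normal(mu, sigma).
random.seed(0)
mu, sigma = 1.0, 0.5
x = [random.lognormvariate(mu, sigma) for _ in range(10_000)]  # simulated data
log_x = [math.log(v) for v in x]
mean_log = sum(log_x) / len(log_x)  # should be close to mu

print(mean_log, fisher_z(0.5), inverse_fisher_z(fisher_z(0.5)))
```

Working on the transformed scale and mapping back with `tanh` (or `exp` for the lognormal case) is what lets you report results in the original units.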
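For crude purposes you can subtract the sample mean and divide by the sample s.d. The resulting data will have mean 0 and s.d. 1, but standardizing is a linear rescaling and does not change the shape of the distribution, so the result is only approximately Normal(0,1) if X was roughly normal to begin with.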
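A minimal sketch of standardization that also compares the sample skewness before and after. The exponential test data and the helper names are assumptions for illustration. Because standardizing is a positive linear rescaling, the skewness (and therefore the shape) is unchanged: the z-scores are only as close to Normal(0,1) as X itself was.

```python
import math
import random

def standardize(xs):
    """Return z-scores: subtract the sample mean, divide by the sample s.d."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in xs) / (n - 1))
    return [(v - mean) / sd for v in xs]

def skewness(xs):
    """Sample skewness (moment estimator): m3 / m2 ** 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((v - mean) ** 2 for v in xs) / n
    m3 = sum((v - mean) ** 3 for v in xs) / n
    return m3 / m2 ** 1.5

random.seed(1)
x = [random.expovariate(1.0) for _ in range(5_000)]  # strongly right-skewed data
z = standardize(x)

# Mean 0 and s.d. 1 after standardizing, but the skewness is the same as before.
print(skewness(x), skewness(z))
```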
If you're working in one dimension, you can always take a value drawn from your distribution, calculate the fraction of values lower than it (i.e. the value of the cumulative distribution function at that point), and then apply the inverse of the cumulative normal distribution to get the transformed value. By construction, the transformed values follow a normal distribution: exactly if you know the true CDF, approximately if you use the empirical one from your sample.
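A minimal sketch of this idea using the empirical CDF (often called a rank-based inverse normal transform). The `(rank - 0.5) / n` plotting position and average ranks for ties are my assumptions, chosen so that the extreme values don't map to infinity. The inverse normal CDF comes from `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

def rank_inverse_normal(xs):
    """Map each value to a normal score via its empirical CDF.

    Uses the (rank - 0.5) / n plotting position so the smallest and largest
    values do not map to -/+ infinity. Tied values share their average rank.
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, s.d. 1
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]

data = [3.2, 0.1, 8.9, 1.5, 1.5, 40.0, 2.7]  # made-up skewed sample with one tie
scores = rank_inverse_normal(data)
print(scores)
```

The transform preserves the ordering of the data but discards the original spacing, so as the thread points out, mapping results back to the original domain is not straightforward.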
Maybe you could tell us *why* you want to transform your distribution into a normal one?
Do you really need your data to follow a normal distribution, whether after some transformation or otherwise? If that goal cannot be achieved, there are a number of methods in the literature that can handle non-normal data.
Why? And I might add, it would be useless unless you could transform the results obtained on the normal scale back to the original domain.
The choice of transformation depends on the existing distribution. A statistician diagnoses the data before treating it, just as a doctor makes a diagnosis before prescribing treatment. You can try a series of transformations, use the Kolmogorov-Smirnov test to check how close each transformed data set is to a normal distribution, and choose the best transformation. Besides, there are powerful non-parametric tests for data whose distributions are not normal.
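A minimal sketch of "try a series of transformations and compare with Kolmogorov-Smirnov". Here the series is the Box-Cox family at a few λ values: (x^λ - 1)/λ, with log(x) at λ = 0. The simulated lognormal data and the λ grid are assumptions for illustration. Because the normal's mean and s.d. are estimated from the same data, the usual KS p-values are not valid (the Lilliefors issue), so this only compares the D statistic across candidates.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def box_cox(xs, lam):
    """Box-Cox transform of positive data: (x**lam - 1) / lam, or log(x) when lam == 0."""
    if lam == 0:
        return [math.log(v) for v in xs]
    return [(v ** lam - 1.0) / lam for v in xs]

def ks_statistic_vs_normal(xs):
    """KS distance D between the data and a normal with the sample mean and s.d."""
    fitted = NormalDist(mean(xs), stdev(xs))
    ys = sorted(xs)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = fitted.cdf(y)
        # Compare the fitted CDF with the empirical CDF just before and at y.
        d = max(d, f - (i - 1) / n, i / n - f)
    return d

random.seed(2)
x = [random.lognormvariate(0.0, 0.8) for _ in range(2_000)]  # right-skewed, positive

candidates = {lam: ks_statistic_vs_normal(box_cox(x, lam))
              for lam in (-1.0, -0.5, 0.0, 0.5, 1.0)}
best_lam = min(candidates, key=candidates.get)  # smallest D = closest to normal
for lam, d in sorted(candidates.items()):
    print(f"lambda={lam:+.1f}  KS D={d:.4f}")
print("best lambda:", best_lam)
```

For lognormal data the log transform (λ = 0) should come out best. With real data, the diagnosis step the post describes (looking at where the data came from) still comes first.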
Suppose you are examining days of hospitalization for in-patients. You will find a large group with 0 days, and then a roughly normal distribution of hospitalization days among the rest. In this case the solution is not a transformation but an analysis in two separate stages: a logistic model for 0 versus more than 0 days, and then a normal model for the length of stay among those who were hospitalized. In short, there must first be a correct diagnosis of the underlying data before a solution.
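A deliberately simplified sketch of the two-stage idea with no covariates. With only an intercept, the logistic-regression estimate of P(days > 0) is simply the sample proportion, and the second stage fits a normal (mean, s.d.) to the positive values. The simulated data and the helper `fit_two_part` are hypothetical. A real analysis would add covariates to both stages, for example with a GLM library.

```python
import random
from statistics import mean, stdev

def fit_two_part(days):
    """Hypothetical two-part fit without covariates.

    Stage 1: probability of any hospitalization (0 vs > 0 days). With an
             intercept-only logistic model, its estimate is the sample proportion.
    Stage 2: normal model (mean, s.d.) for days among those with days > 0.
    """
    positives = [d for d in days if d > 0]
    p_any = len(positives) / len(days)
    return p_any, mean(positives), stdev(positives)

random.seed(3)
# Simulated data for illustration only: about 70% with 0 days,
# the rest roughly normal around 6 days (at least 1 day).
days = [0 if random.random() < 0.7 else max(1, round(random.gauss(6, 2)))
        for _ in range(5_000)]

p_any, mu_pos, sd_pos = fit_two_part(days)
expected_days = p_any * mu_pos  # E[days] = P(days > 0) * E[days | days > 0]
print(p_any, mu_pos, sd_pos, expected_days)
```

Recombining the two stages recovers the overall mean on the original scale (days), so no back-transformation is needed.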
Stephen Few has written three books that are very helpful; Show Me the Numbers is one. The others are at work and I'm not sure of the names.
There was an interesting talk at SAS Global Forum by Walter Stroup of the University of Nebraska (one of the authors of SAS System for Mixed Models). He argued that, now that we have tools like generalized linear mixed models, in which link functions let us work with many different non-normal distributions, it is less important to limit ourselves to models that assume normally distributed residuals.
Link to Dr. Stroup’s paper: