
Quant analytics: how to check that data reduction is correct after applying PCA to a data set

==

 

You should check that the cumulative proportion of variance of the components you decide to keep is high enough (about 80%, though it depends on the field). In R, you can see this clearly with the “summary” command, which shows the proportion of variance due to each component. In short, you can reduce the data as long as you do not lose too much information: if you decide to keep the first two principal components, their cumulative proportion of variance should be high enough to represent the original data set well. Of course, the cumulative proportion reaches 100% only if you keep all the components, but very often just a couple of them are enough to explain a large part of the variance in the original data. Hope this helps!
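As a rough illustration of that check, here is a minimal Python sketch. It assumes you already have the standard deviations of the components, sorted largest first (the kind of numbers a PCA summary reports). The values in `sdev`, the function names and the 0.80 default are made up for the example and are not output from any real data set.

```python
from itertools import accumulate


def variance_proportions(sdev):
    """Return (proportion, cumulative proportion) of variance per component."""
    variances = [s * s for s in sdev]          # variance = standard deviation squared
    total = sum(variances)
    proportion = [v / total for v in variances]
    cumulative = list(accumulate(proportion))  # running sum of the proportions
    return proportion, cumulative


def components_needed(sdev, threshold=0.80):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    _, cumulative = variance_proportions(sdev)
    for k, share in enumerate(cumulative, start=1):
        if share >= threshold - 1e-12:         # small tolerance for rounding error
            return k
    return len(sdev)


# Hypothetical component standard deviations, largest first (made-up numbers).
sdev = [2.0, 1.0, 0.5, 0.5]
proportion, cumulative = variance_proportions(sdev)
k = components_needed(sdev, threshold=0.80)
print(k, [round(c, 3) for c in cumulative])
```

With these made-up numbers the first component alone explains about 73% of the variance, so two components are needed to pass the 80% mark. The threshold is a parameter because, as said above, the right value depends on the field.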

 

 

NOTE I now post my TRADING ALERTS into my personal FACEBOOK ACCOUNT and TWITTER. Don't worry as I don't post stupid cat videos or what I eat!

Can anyone help with quant analysis: how to implement Principal Component Analysis for statistical data reduction?

 

==

There are LOTS of online resources that walk you through this… find the one that corresponds to the software you’re using.

For example, if you’re using SAS, consider:
http://support.sas.com/publishing/pubcat/chaps/55129.pdf

Or, if you’re using SPSS, consider:
http://www.unt.edu/rss/class/Jon/SPSS_SC/Module9/M9_PCA/SPSS_M9_PCA1.htm

If you’re using another statistical program, consider:
http://www.google.com 😉

 

==

Can anyone help with CONJOINT ANALYSIS AND PERCEPTUAL MAPPING using SAS and SPSS?

 

==

If you are not into programming, you could try some Excel add-ins, e.g., XLSTAT.

XLSTAT is commercial, but maybe you can try this one:
http://sourceforge.net/projects/imdev/
Running PCA in RapidMiner, which is also free, is very easy.

 

==

And there is also a cookbook:
http://www.simafore.com/blog/bid/62910/How-to-run-Principal-Component-Analysis-with-RapidMiner-Part-1

 

==

As usual, Google is your friend, and you should ask it before bothering groups with trivial questions.
Besides what has already been recommended here, I really like Numerical Recipes, since it combines explanations of the mathematical concepts with actual code that works.

 

==

Lots of people have pointed you at tools for this technique.

But be aware that PCA is not a good tool for reducing the number of variables, because each component gives a coefficient to every variable. So it doesn’t reduce the variables needed (unless you start playing with some arbitrary rule for deleting variables…).
Also, if you have information about causality among the variables, that is ignored; e.g., some variables could be interrelated and others obviously not.

It also assumes there are no subgroups in your sample (if there are, the PCs are confounded with group differences). In practice this is often not true, and then it is better to identify the variables determining the groups… which is a different problem.
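To make the first point concrete (every variable gets a coefficient), here is a small dependency-free Python sketch. It is only an illustration under assumptions: the data are made up, the function names are hypothetical, and it finds just the first principal component by power iteration on the sample covariance matrix. That is a stand-in chosen to avoid extra libraries, not what a statistics package necessarily does internally.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=500):
    """Leading eigenvector of cov by power iteration, scaled to unit length."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p               # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]              # renormalise after each multiplication
    return v


# Made-up data: 6 observations of 3 positively correlated variables.
data = [
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 1.5],
    [4.0, 7.9, 2.5],
    [5.0, 10.1, 2.0],
    [6.0, 12.0, 3.5],
    [7.0, 13.8, 3.0],
]
loadings = first_component(covariance_matrix(data))
print([round(x, 3) for x in loadings])
```

All three loadings come out non-zero, so computing the score for even one component still requires measuring all three original variables. PCA reduces the number of dimensions you work with, not the number of variables you have to collect.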

 

==

John Parker’s SAS reference is a great one, very straightforward. Keep in mind that even when PCA is used successfully in models, you still need a clear interpretation of the variable(s) you end up keeping. In many cases, if you cannot clearly interpret the final model output, all the PCA work might have been for nothing (for example, in a policy or economics paper for a peer-reviewed journal). If you only have 20 or 30 variables, there are better ways to handle multicollinearity and data reduction.

 

 

==

For most practical situations PCA is pretty useless.

 

==

It is useful. But these days, with access to neural networks, bagged and boosted trees, and SVMs, variable reduction is redundant, unless of course your computing power is not up to the task.

 

==

There are some free, open-source implementations written in MATLAB. PCA is very easy to perform with them.

 

==

I have always used the statistical software R to perform PCA (it is free). You can simply use the command “princomp”. You can also find lots of manuals about R on the Internet.

 

 
