# Quant analytics: Can someone help me with the best method to transform a continuous variable to categorical variable

(Last Updated On: February 1, 2012)

Which logistic regression model do you intend to use? If binary logistic, just decide on a cut-point that separates the two categories. In SPSS, you can do this with Recode on the Transform menu, or with the RECODE command in syntax.
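A minimal sketch of the cut-point recode in Python, assuming the values are already in a plain list. `recode_binary` is a hypothetical helper, not an SPSS feature. It mirrors a "lowest through cut-point becomes 0, everything else becomes 1" recode, so a value exactly at the cut-point falls in the lower group.

```python
from typing import Iterable, List


def recode_binary(values: Iterable[float], cut_point: float) -> List[int]:
    """Recode a continuous variable into 0/1 at a single cut-point.

    Values less than or equal to cut_point become 0; values above it become 1.
    Which side the boundary value lands on is a choice worth stating explicitly.
    """
    return [0 if v <= cut_point else 1 for v in values]


if __name__ == "__main__":
    scores = [12.5, 30.0, 30.1, 45.2, 8.0]
    print(recode_binary(scores, cut_point=30.0))  # [0, 0, 1, 1, 0]
```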

If you create more than two categories, you might look at ordinal regression, which is an extension of the binary logistic model but can be challenging to interpret. I assume your categories would be ordered, so ordinal regression would be the natural fit.
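For more than two ordered categories, the same idea extends to several cut-points. A minimal sketch, assuming numeric cut-points that you choose; `recode_ordinal` is a hypothetical helper built on Python's standard `bisect` module. The integer codes keep the ordering that an ordinal regression model relies on.

```python
import bisect
from typing import Iterable, List, Sequence


def recode_ordinal(values: Iterable[float], cut_points: Sequence[float]) -> List[int]:
    """Recode a continuous variable into ordered categories 0..len(cut_points).

    A value's code is the number of cut points strictly below it, so with
    cut points [20, 40]: v <= 20 -> 0, 20 < v <= 40 -> 1, v > 40 -> 2.
    """
    cuts = sorted(cut_points)  # bisect requires ascending order
    return [bisect.bisect_left(cuts, v) for v in values]


if __name__ == "__main__":
    ages = [5, 20, 25, 40, 55]
    print(recode_ordinal(ages, [20, 40]))  # [0, 0, 1, 1, 2]
```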

==

I would urge caution and recommend you reconsider whether you really want to “bin” your continuous outcome variable.

Logistic regression is best applied when the two outcomes reflect distinct states (for example, has diabetes vs. does not have diabetes). If you took a continuous variable such as income and binned it into “over \$40k” and “\$40k or less”, you would not really have distinct states: the difference between \$39,999 and \$40,001 is trivial, yet the two incomes land in different categories.
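To make the information loss concrete, here is a small illustration assuming the 40k income threshold from the example (the `over_40k` helper is hypothetical): two almost identical incomes end up in different categories, while two very different incomes share one.

```python
def over_40k(income: float) -> int:
    """Hypothetical binning from the example: 1 = 'over 40k', 0 = '40k or less'."""
    return 1 if income > 40_000 else 0


pairs = [(39_999, 40_001), (40_001, 250_000)]
for low, high in pairs:
    same = over_40k(low) == over_40k(high)
    print(f"{low:,} vs {high:,}: gap {high - low:,}, same category: {same}")
# The 2-unit gap splits the pair across categories, while the
# 209,999-unit gap leaves both incomes in the same category.
```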

If you are struggling with a skewed outcome variable, I recommend you consider these two alternatives before resorting to binning it:

(1) Use a generalized linear model and select an appropriate distribution (Poisson for count outcomes and Gamma for positive, right-skewed continuous outcomes are popular choices); or

(2) Try transforming your outcome variable (such as a log transformation) to see if that makes it “more normal”.
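A minimal sketch of option (2), assuming a strictly positive outcome stored in a plain list: it measures moment-based skewness before and after a natural-log transform. The data and the `skewness`/`log_transform` helpers are illustrative, not taken from the original answer.

```python
import math
from typing import List, Sequence


def skewness(values: Sequence[float]) -> float:
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values: Sequence[float]) -> List[float]:
    """Natural-log transform; assumes every value is strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log needs strictly positive values; consider math.log1p if zeros occur")
    return [math.log(v) for v in values]


if __name__ == "__main__":
    # Illustrative right-skewed outcome (made-up numbers, not real data).
    y = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
    print(f"skewness before: {skewness(y):.2f}")
    print(f"skewness after log: {skewness(log_transform(y)):.2f}")
```

Option (1) replaces the transform with a model fit instead; in Python, statsmodels' `GLM` class accepts families such as `Poisson` and `Gamma` for that route, which is not shown here.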

==

You can generate a sequence of candidate cut-off points and, for each one, split the continuous data into two groups at that cut-off and fit a logistic regression. Calculate the AUC of each model, then find the highest AUC and its corresponding cut-off. That cut-off may be the optimal one for classifying your data into two groups.
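The procedure above can be sketched in plain Python, assuming a single predictor `x` and a continuous variable `y` that is dichotomized as `1 if y > c`. Rather than calling a logistic-regression library, this sketch relies on the fact that a one-predictor logistic model ranks cases by `x` (reversed if the fitted slope is negative), so its AUC can be computed directly from ranks. `rank_auc` and `scan_cutoffs` are hypothetical helpers and the data are made up.

```python
from typing import List, Sequence, Tuple


def rank_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: share of (positive, negative) pairs in which the
    positive case has the higher score, counting ties as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def scan_cutoffs(x: Sequence[float], y: Sequence[float],
                 cutoffs: Sequence[float]) -> List[Tuple[float, float]]:
    """For each cut-off c, dichotomize y (1 if y > c) and return (c, AUC).

    Shortcut, assuming a single predictor x: a one-predictor logistic
    regression's fitted probabilities are monotone in x, so its AUC is the
    rank AUC of x, flipped when the fitted slope is negative. The slope's
    sign matches the sign of cov(x, labels), so no model is fitted here.
    """
    results = []
    mean_x = sum(x) / len(x)
    for c in cutoffs:
        labels = [1 if v > c else 0 for v in y]
        if len(set(labels)) < 2:
            continue  # this cut-off leaves only one class; AUC is undefined
        mean_l = sum(labels) / len(labels)
        cov = sum((xi - mean_x) * (li - mean_l) for xi, li in zip(x, labels))
        auc = rank_auc(x, labels)
        if cov < 0:
            auc = 1.0 - auc  # negative slope reverses the ranking
        elif cov == 0:
            auc = 0.5  # zero slope: every case gets the same probability
        results.append((c, auc))
    return results


if __name__ == "__main__":
    # Made-up data: one predictor x and a continuous outcome y.
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2.0, 3.5, 3.0, 5.5, 6.0, 5.0, 8.0, 9.5]
    results = scan_cutoffs(x, y, cutoffs=[3, 5, 7])
    best_cut, best_auc = max(results, key=lambda r: r[1])
    print(results)
    print(f"highest AUC {best_auc:.3f} at cut-off {best_cut}")
```

Cut-offs near the extremes of `y` leave few cases in one class (here the cut-off at 7 leaves only two positives, perfectly separated by `x`), so the top AUC can reflect a lopsided split; requiring a minimum class size per cut-off is one simple guard.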