Quant analytics interaction terms: Do we need the lower-order terms regardless of their significance when a higher-order term is significant? Tradeoffs?

For some of the respondent-level models I am working on, we are trying to test for different interaction effects between media. I have been working on the premise of testing for significant first-order effects first, and then testing for higher-order interaction effects. In some models, I have tested for interactions even if the first-order effects were insignificant. My question is: if the x1*x2 term (interaction) is significant, do we need to have both the x1 and x2 terms in the model irrespective of their significance? In the case where both x1 and x2 are insignificant but x1*x2 is significant, what is the best way to specify the model? My feeling is that leaving out the first-order effects (if they are insignificant) will give us biased coefficients for the interaction terms. Any thoughts?

==

You are exactly right that if there is ANY first-order effect, leaving it out will bias your estimate of the interaction. (My mantra: Even if an effect is not stat. significant in a particular test and data set, it may still be large enough to be important.) So keep first-order effects.
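To make the bias concrete, here is a small simulation sketch. Everything in it is an assumption made up for illustration: the coefficient values, the uniform 0 to 10 range for x1 and x2, the noise level, and the helper name `ols`. It fits the same simulated data twice, once with the first-order terms and once with the interaction alone, and compares the estimated interaction coefficient with the true one.

```python
import random

def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(0)
n = 2000
# Hypothetical "true" coefficients, chosen only for this illustration
TRUE_B0, TRUE_B1, TRUE_B2, TRUE_B12 = 1.0, 0.5, 0.5, 0.2

x1 = [random.uniform(0, 10) for _ in range(n)]
x2 = [random.uniform(0, 10) for _ in range(n)]
y = [TRUE_B0 + TRUE_B1 * a + TRUE_B2 * b + TRUE_B12 * a * b + random.gauss(0, 1)
     for a, b in zip(x1, x2)]

# Full model: intercept, x1, x2, x1*x2
full = ols([[1.0, a, b, a * b] for a, b in zip(x1, x2)], y)
# Interaction-only model: intercept, x1*x2
inter_only = ols([[1.0, a * b] for a, b in zip(x1, x2)], y)

print("true interaction coefficient:    ", TRUE_B12)
print("full model estimate:             ", round(full[3], 3))
print("interaction-only model estimate: ", round(inter_only[1], 3))
```

In this setup the interaction-only estimate comes out too large, because x1*x2 is correlated with x1 and x2 and soaks up part of their omitted effects. The direction and size of the bias depend on the signs of the first-order effects and on how the predictors are distributed.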

The only exception I’m aware of is when prior theory tells you that a particular first order effect is zero. For example, surface area is equal to height x width x a constant. In that situation, height alone, and width alone, should be left out. But it’s rare to have such a clear model.
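A quick sketch of that exception, with made-up numbers: the constant `TRUE_C`, the size ranges, and the measurement noise are all assumptions for illustration. Because theory says area = c * height * width, the fit uses only the product term, with no intercept and no first-order terms.

```python
import random

random.seed(1)
TRUE_C = 2.0  # hypothetical constant, chosen only for this illustration

heights = [random.uniform(1, 5) for _ in range(200)]
widths = [random.uniform(1, 5) for _ in range(200)]
# Simulated measurements: area = c * height * width plus small noise
areas = [TRUE_C * h * w + random.gauss(0, 0.1) for h, w in zip(heights, widths)]

# Least-squares slope for area = c * (height * width), no other terms
hw = [h * w for h, w in zip(heights, widths)]
c_hat = sum(p * a for p, a in zip(hw, areas)) / sum(p * p for p in hw)

print("estimated constant:", round(c_hat, 3))
```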

==

I think that bias is what we saw when we dropped the first-order terms for being insignificant while keeping the interaction terms in. So, with interaction terms, it is a tradeoff between bias in the estimates (from leaving insignificant first-order terms out of the model) and variance of the estimates (from correlated terms coming into the model), right?

==

As more of a data miner than a statistician, I would ask: which model performs better on a held-back sample of the data? That is the model with the least bias. Interpretability is another story: it's harder to say what models with both first- and second-order terms are telling you. But I also like to augment with a decision tree for insights into interactions between variables.

==

Significance is not the important thing in this case (maybe not in any case). It is very rare that models with interactions but without the constituent main effects make sense. Here’s an example of such a model:

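With an interaction-only model, y = b0 + b12*IV1*IV2 (b12 is the coefficient on the interaction term):

Case 1: IV1 = 10, IV2 = 10. Predicted: b0 + 100*b12

Case 2: IV1 = 10, IV2 = 0. Predicted: b0

Case 3: IV1 = 0, IV2 = 10. Predicted: b0

Case 4: IV1 = 0, IV2 = 0. Predicted: b0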

So you can see that if either IV is 0, then with only the interaction term, the predicted value is the same (forced by the model, regardless of the data). This rarely makes sense.

You can plug in other numbers and see what the model forces to happen.
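If you want to plug in numbers quickly, something like this sketch works. The coefficient values and the function names are invented for illustration; only the four cases come from the example above.

```python
def predict_interaction_only(iv1, iv2, b0=2.0, b12=0.5):
    """Prediction from y = b0 + b12*IV1*IV2 (no main effects)."""
    return b0 + b12 * iv1 * iv2

def predict_full(iv1, iv2, b0=2.0, b1=1.0, b2=-0.5, b12=0.5):
    """Prediction from y = b0 + b1*IV1 + b2*IV2 + b12*IV1*IV2."""
    return b0 + b1 * iv1 + b2 * iv2 + b12 * iv1 * iv2

# The four cases from the example: (IV1, IV2)
cases = [(10, 10), (10, 0), (0, 10), (0, 0)]

for iv1, iv2 in cases:
    print(
        f"IV1={iv1:>2}, IV2={iv2:>2}: "
        f"interaction-only={predict_interaction_only(iv1, iv2):6.1f}, "
        f"with main effects={predict_full(iv1, iv2):6.1f}"
    )
```

Whatever coefficients you pick, the interaction-only model returns b0 whenever either IV is 0, while the model with main effects is free to give different predictions for cases 2, 3, and 4.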
