Summary
Objectives:
A new data-analysis strategy is proposed to address two problems: selecting interaction terms in linear regression, and statistically testing the significance of regression trees.
Methods:
The proposed strategy combines two data mining techniques: regression trees and regression analysis with optimal scaling (CATREG). The method traces small regression trees using the bootstrap and integrates the results as interaction variables (called “trunk variables”) into CATREG.
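The idea of a trunk variable can be illustrated with a minimal sketch, written here in plain Python under several assumptions that are not taken from the study: a "small" tree is taken to be a depth-2 tree, the bootstrap is used only to count how often each set of split variables is selected, and the trunk variable is the leaf membership of a small tree fitted to the full sample. All function names, the synthetic data, and the settings (`min_leaf`, `n_boot`) are hypothetical. The CATREG step itself is not reproduced; the resulting categorical variable would simply be entered as an additional predictor.

```python
import random
from collections import Counter

def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(X, y, rows, min_leaf=5):
    """Return the (feature, threshold) that most reduces SSE, or None."""
    best, best_cost = None, sse([y[i] for i in rows])
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (j, t), cost
    return best

def fit_small_tree(X, y, rows, min_leaf=5):
    """Depth-2 tree: a root split plus at most one split in each child."""
    root = best_split(X, y, rows, min_leaf)
    if root is None:
        return None
    j, t = root
    left = [i for i in rows if X[i][j] <= t]
    right = [i for i in rows if X[i][j] > t]
    return root, best_split(X, y, left, min_leaf), best_split(X, y, right, min_leaf)

def leaf_of(tree, x):
    """Map one observation to a leaf label in 0..3."""
    (j, t), left, right = tree
    side, child = (0, left) if x[j] <= t else (1, right)
    if child is None:
        return 2 * side
    cj, ct = child
    return 2 * side + (0 if x[cj] <= ct else 1)

def trunk_variable(X, y, n_boot=20, seed=0):
    """Count split-variable sets over bootstrap trees; return leaf labels."""
    rng = random.Random(seed)
    n = len(X)
    counts = Counter()
    for _ in range(n_boot):
        rows = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        tree = fit_small_tree(X, y, rows)
        if tree is not None:
            counts[frozenset(s[0] for s in tree if s is not None)] += 1
    tree = fit_small_tree(X, y, list(range(n)))  # tree on the full sample
    if tree is None:
        return counts, [0] * n
    return counts, [leaf_of(tree, x) for x in X]

# Synthetic data (an assumption): y depends on an interaction of x0 and x1.
rng = random.Random(1)
X = [[rng.random() for _ in range(3)] for _ in range(80)]
y = [2.0 * (x[0] > 0.5 and x[1] > 0.5) + rng.gauss(0, 0.3) for x in X]

counts, trunk = trunk_variable(X, y)
print(counts.most_common(3))  # how often each variable set was selected
print(Counter(trunk))         # category sizes of the trunk variable
```

In this sketch the bootstrap frequencies indicate how stable the selected variable combination is, and `trunk` holds one category per observation that can be added to the regression model as a nominal predictor.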
Results:
In an application to data from cardiac patients, the CATREG model that includes the trunk variables shows a relative increase of 19% in variance accounted for (16% in cross-validated variance) compared with the model that excludes them.
Conclusions:
This study indicates that trunk variables can be useful to model interaction effects in prediction problems.
Keywords
Linear Regression - Regression Tree - Categorical Data - Optimal Scaling - Interaction