Methods Inf Med 2006; 45(01): 44-50
DOI: 10.1055/s-0038-1634035
Original Article
Schattauer GmbH

Investigation on the Improvement of Prediction by Bootstrap Model Averaging

N. Holländer (1), N. H. Augustin (2), W. Sauerbrei (2)
1   Institut für Medizinische Biometrie und Medizinische Informatik, Universitätsklinikum Freiburg, Freiburg, Germany
2   Department of Mathematical Sciences, University of Bath, Bath, UK

Publication History

Publication Date:
06 February 2018 (online)

Summary

Objectives: We illustrate a recently proposed two-step bootstrap model averaging (bootstrap MA) approach to cope with model selection uncertainty. The predictive performance is investigated in an example and in a simulation study. Results are compared to those derived from other model selection methods.

Methods: In the framework of the linear regression model, we use the two-step bootstrap MA, which consists of a screening step that eliminates covariates thought to have no influence on the response, followed by a model-averaging step. For comparison, we also apply the full model, variable selection by backward elimination based on Akaike’s Information Criterion (AIC) or the Bayesian Information Criterion (BIC), and the bagging approach. Predictive performance is measured by the mean squared error (MSE) and by the coverage of confidence intervals for the true response.
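The two-step procedure can be illustrated with a minimal sketch. It is not the authors' implementation: the inclusion-frequency threshold of 0.3, the use of AIC-based backward elimination inside each bootstrap sample, the weighting of models by their bootstrap selection frequency, and the function names (`fit_ols`, `backward_elimination`, `bootstrap_ma`) are all assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y, cols):
    """Least-squares fit of y on an intercept plus the columns listed in cols."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return beta, rss

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts fitted coefficients."""
    return n * np.log(rss / n) + 2 * k

def backward_elimination(X, y, cols):
    """Drop one covariate at a time for as long as doing so lowers the AIC."""
    cols = list(cols)
    n = len(y)
    _, rss = fit_ols(X, y, cols)
    best = aic(rss, n, len(cols) + 1)
    while cols:
        scores = []
        for j in cols:
            reduced = [c for c in cols if c != j]
            _, r = fit_ols(X, y, reduced)
            scores.append((aic(r, n, len(reduced) + 1), j))
        candidate, drop = min(scores)
        if candidate >= best:
            break
        best = candidate
        cols.remove(drop)
    return tuple(cols)

def bootstrap_ma(X, y, X_new, n_boot=200, threshold=0.3, seed=0):
    """Two-step sketch: screen covariates, then average over selected models."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Step 1 (screening): keep covariates whose bootstrap inclusion
    # frequency reaches the (assumed) threshold.
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        for j in backward_elimination(X[idx], y[idx], range(p)):
            counts[j] += 1
    kept = [j for j in range(p) if counts[j] / n_boot >= threshold]

    # Step 2 (averaging): record how often each model is selected among the
    # screened covariates; these frequencies serve as model weights (assumed).
    freq = {}
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        model = backward_elimination(X[idx], y[idx], kept)
        freq[model] = freq.get(model, 0) + 1

    # Refit each selected model on the original data and average predictions.
    pred = np.zeros(len(X_new))
    for model, count in freq.items():
        beta, _ = fit_ols(X, y, model)
        A_new = np.column_stack(
            [np.ones(len(X_new))] + [X_new[:, j] for j in model]
        )
        pred += (count / n_boot) * (A_new @ beta)
    return kept, pred
```

Calling `bootstrap_ma(X, y, X_new)` returns the indices of the covariates that survive screening together with the model-averaged predictions for the rows of `X_new`. Replacing the penalty `2 * k` in `aic` by `np.log(n) * k` would give a BIC-based variant.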

Results: In the example, all approaches gave similar results. In the simulation, all approaches reduced the MSE in comparison to the full model, with the smallest values obtained for bootstrap MA. Only bootstrap MA and the full model attained the nominal coverage. The backward elimination procedures led to a substantial underestimation, and bagging to an overestimation, of the true coverage. The screening step of bootstrap MA eliminated most of the unimportant factors.
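The two performance measures can be made concrete as follows. This is only a sketch under the assumption that, as in a simulation, the true response is known for each evaluation point; the function names `mse` and `coverage` are hypothetical.

```python
import numpy as np

def mse(pred, truth):
    """Mean squared difference between predictions and the true response."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return float(np.mean((pred - truth) ** 2))

def coverage(lower, upper, truth):
    """Fraction of intervals [lower, upper] that contain the true response."""
    lower, upper, truth = (np.asarray(a, float) for a in (lower, upper, truth))
    return float(np.mean((lower <= truth) & (truth <= upper)))
```

An empirical coverage below the nominal level (for example 0.95) indicates intervals that are too narrow, and a value above it indicates intervals that are too wide.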

Conclusion: The new bootstrap MA approach shows promising results for predictive performance. It increases practical usefulness by eliminating unimportant factors in the screening step.

 