Methods Inf Med 2005; 44(04): 561-571
DOI: 10.1055/s-0038-1634008
Original Article
Schattauer GmbH

Building Multivariable Regression Models with Continuous Covariates in Clinical Epidemiology

With an Emphasis on Fractional Polynomials
P. Royston (1), W. Sauerbrei (2)
1   Cancer Division, MRC Clinical Trials Unit, London, UK
2   Institut für Medizinische Biometrie und Informatik, University Hospital of Freiburg, Freiburg, Germany

Publication History

Publication Date: 06 February 2018 (online)

Summary

Objectives: In fitting regression models, data analysts must often choose a model based on several candidate predictor variables which may influence the outcome. Most analysts either assume a linear relationship for continuous predictors, or categorize them and postulate step functions. By contrast, we propose to model possible non-linearity in the relationship between the outcome and several continuous predictors by estimating smooth functions of the predictors. We aim to demonstrate that a structured approach based on fractional polynomials can give a broadly satisfactory practical solution to the problem of simultaneously identifying a subset of 'important' predictors and determining the functional relationship for continuous predictors.

Methods: We discuss the background, and motivate and describe the multivariable fractional polynomial (MFP) approach to model selection from data which include continuous and categorical predictors. We compare our results with those from other approaches in examples. We present a small simulation study to compare the functional form of the relationship obtained by fitting fractional polynomials and splines to a single predictor variable.

Results: We illustrate the advantages of the MFP approach over standard techniques of model construction in two real example datasets analyzed with logistic and Cox regression models, respectively. In the simulation study, fractional polynomial models had lower mean square error and more realistic behaviour than comparable spline models.

Conclusions: In many practical situations, the MFP approach can satisfy the aim of finding models that fit the data well and also are simple, interpretable and potentially transportable to other settings.
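To make the idea of a fractional polynomial concrete, the following is a minimal sketch of fitting a first-degree fractional polynomial (FP1) to a single positive predictor. It assumes the conventional power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3} from Royston and Altman (1994), where power 0 denotes log(x), and it picks the power by ordinary least squares. The function names `fp_transform` and `fit_fp1` and the example data are hypothetical. This is not the MFP procedure itself, which also considers second-degree functions, uses a structured sequence of tests, and selects variables.

```python
import numpy as np

# Conventional FP power set (Royston and Altman, 1994); power 0 means log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """Return x**p, with the FP convention that p == 0 denotes log(x)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("fractional polynomials require strictly positive x")
    return np.log(x) if p == 0 else x ** p


def fit_fp1(x, y, powers=FP_POWERS):
    """Fit y = b0 + b1 * x^(p) by least squares for each candidate power p.

    Returns a dict holding the power with the smallest residual sum of
    squares, its coefficients (b0, b1), and that residual sum of squares.
    Illustrative only: a Gaussian outcome and a single predictor are assumed.
    """
    y = np.asarray(y, dtype=float)
    best = None
    for p in powers:
        design = np.column_stack([np.ones(len(y)), fp_transform(x, p)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best["rss"]:
            best = {"power": p, "coef": coef, "rss": rss}
    return best


# Hypothetical noise-free data generated from a logarithmic curve.
x = np.linspace(0.5, 10.0, 50)
y = 2.0 + 3.0 * np.log(x)
best = fit_fp1(x, y)
print(best["power"], np.round(best["coef"], 3))
```

Because the example data follow a logarithmic curve exactly, the search should settle on power 0. With real data the residual sums of squares for neighbouring powers are often close, which is one reason a formal selection procedure is needed rather than a simple minimum.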

 