DOI: 10.3414/ME16-01-0033
Approaches to Regularized Regression – A Comparison between Gradient Boosting and the Lasso[*]
Funding
The work on this article was supported by the German Research Foundation (DFG), grant SCHM 2966/1–2, and the Interdisciplinary Center for Clinical Research (IZKF) of the Friedrich-Alexander-University Erlangen-Nürnberg (Project J49).
Publication History
Received: 11 March 2016
Accepted in revised form: 21 June 2016
Publication Date: 08 January 2018 (online)
Summary
Background: Penalization and regularization techniques for statistical modeling have attracted increasing attention in biomedical research due to their advantages in the presence of high-dimensional data. A special focus lies on algorithms that incorporate automatic variable selection, like the least absolute shrinkage and selection operator (lasso) or statistical boosting techniques.
Objectives: Focusing on the linear regression framework, this article compares the two most common techniques for this task, the lasso and gradient boosting, from both a methodological and a practical perspective.
Methods: We describe these methods, highlighting under which circumstances their results will coincide in low-dimensional settings. In addition, we carry out extensive simulation studies comparing the performance in settings with more predictors than observations and investigate multiple combinations of noise-to-signal ratio and number of true non-zero coefficients. Finally, we examine the impact of different tuning methods on the results.
Results: Both methods carry out penalization and variable selection for possibly high-dimensional data, often resulting in very similar models. An advantage of the lasso is its faster runtime; a strength of the boosting concept is its modular nature, making it easy to extend to other regression settings.
Conclusions: Although following different strategies with respect to optimization and regularization, both methods impose similar constraints on the estimation problem, leading to comparable performance regarding prediction accuracy and variable selection in practice.
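The comparison described above can be reproduced in outline with standard R tools. The following is a minimal sketch, not taken from the article: it uses the widely available glmnet (lasso, tuned via cross-validated lambda) and mboost (component-wise gradient boosting, tuned via cross-validated early stopping) packages; the simulated data, seed, and tuning choices are illustrative assumptions.

```r
library(glmnet)   # lasso via coordinate descent
library(mboost)   # component-wise gradient boosting

set.seed(1)
n <- 100; p <- 500                          # more predictors than observations
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", 1:p)
beta <- c(rep(2, 5), rep(0, p - 5))         # 5 true non-zero coefficients
y <- as.vector(X %*% beta + rnorm(n))

## Lasso: penalty parameter lambda tuned by 10-fold cross-validation
cv_lasso   <- cv.glmnet(X, y, alpha = 1)
coef_lasso <- coef(cv_lasso, s = "lambda.min")

## Gradient boosting with component-wise linear base-learners;
## the number of iterations (mstop) is tuned by cross-validation (early stopping)
dat <- data.frame(y = y, X)
fit_boost <- glmboost(y ~ ., data = dat,
                      control = boost_control(mstop = 500, nu = 0.1))
cvr <- cvrisk(fit_boost, folds = cv(model.weights(fit_boost), type = "kfold"))
fit_boost <- fit_boost[mstop(cvr)]          # restrict to the optimal iteration

## Variables selected by each method (typically very similar sets)
sel_lasso <- setdiff(rownames(coef_lasso)[as.vector(coef_lasso) != 0], "(Intercept)")
sel_boost <- setdiff(names(coef(fit_boost)), "(Intercept)")
length(intersect(sel_lasso, sel_boost))
```

In this sketch the regularization is controlled by a single hyperparameter in each case (lambda for the lasso, the stopping iteration for boosting), which is the tuning comparison the article refers to.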
Keywords
Penalization - lasso - regularization - boosting - variable selection - high-dimensional data
* Supplementary material published on our website: http://dx.doi.org/10.3414/me16-01-0033