Summary
Background: The concept of boosting emerged from the field of machine learning. The basic idea
is to boost the accuracy of a weak classifier by combining multiple of its instances
into a more accurate prediction. This general concept was later adapted to the field
of statistical modelling. Nowadays, boosting algorithms are often applied to estimate
and select predictor effects in statistical regression models.
Objectives: This review article attempts to highlight the evolution of boosting algorithms from
machine learning to statistical modelling.
Methods: We describe the AdaBoost algorithm for classification as well as the two most prominent
approaches to boosting for statistical modelling: gradient boosting and likelihood-based
boosting. We highlight the methodological background and present the
most common software implementations.
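To illustrate the core idea behind the statistical boosting approaches mentioned above, the following Python sketch implements component-wise gradient boosting with simple linear least-squares base-learners under the L2 loss. The function name, the restriction to linear base-learners, the step length nu and the number of iterations are illustrative assumptions made here for exposition; they are not taken from the article or from any particular software package.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Minimal sketch of component-wise gradient boosting (L2 loss).

    In each iteration the negative gradient (here: the residuals) is
    computed, every candidate predictor is fitted to it by a simple
    linear least-squares base-learner, and only the best-fitting
    component is updated, damped by the step length nu.
    """
    n, p = X.shape
    intercept = y.mean()            # offset / initial fit
    coef = np.zeros(p)              # one coefficient per candidate predictor
    fit = np.full(n, intercept)

    for _ in range(n_steps):
        residuals = y - fit         # negative gradient of the L2 loss
        best_j, best_beta, best_rss = 0, 0.0, np.inf
        for j in range(p):          # fit each univariate base-learner
            xj = X[:, j]
            beta = xj @ residuals / (xj @ xj)
            rss = np.sum((residuals - beta * xj) ** 2)
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        # update only the selected component
        coef[best_j] += nu * best_beta
        fit += nu * best_beta * X[:, best_j]

    return intercept, coef
```

Because only one predictor is updated per iteration, stopping the algorithm early yields sparse coefficient estimates that can be read as an ordinary additive regression model, which is the property the Results section below refers to.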
Results: Although gradient boosting and likelihood-based boosting are typically treated separately
in the literature, they share the same methodological roots and follow the same fundamental
concepts. Compared to the initial machine learning algorithms, which must be seen
as black-box prediction schemes, they result in statistical models with a straightforward
interpretation.
Conclusions: Statistical boosting algorithms have gained substantial interest during the last
decade and offer a variety of options to address important research questions in modern
biomedicine.
Keywords
Statistical computing - statistical models - algorithms - classification - machine
learning