Methods Inf Med 2004; 43(05): 439-444
DOI: 10.1055/s-0038-1633894
Original Article
Schattauer GmbH

Penalized Binary Regression for Gene Expression Profiling

Michael G. Schimek
1   Medical University of Graz, Institute for Medical Informatics, Statistics and Documentation, Graz, Austria

Publication History

Publication Date: 05 February 2018 (online)

Summary

Objectives: A typical bioinformatics task in microarray analysis is the classification of biological samples into two alternative categories. A procedure is needed which, based on the expression levels measured, allows us to compute the probability that a new sample belongs to a certain class.

Methods: Binary regression is considered as the statistical approach to classification. High dimensionality combined with small sample sizes makes this a challenging task. Standard logit or probit regression fails because of ill-conditioning and poor predictive performance. The concepts of frequentist and Bayesian penalization for binary regression are introduced, and a Bayesian interpretation of the penalized log-likelihood is given. Finally, the role of cross-validation for regularization and feature selection is discussed.
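
As an illustration of the penalization idea, the following is a minimal sketch and not the procedure used in the article. It assumes a quadratic (ridge) penalty on the logit model, which the summary does not specify; under that assumption the penalized maximum-likelihood estimate coincides with the posterior mode under independent zero-mean Gaussian priors on the coefficients. The function names (`fit_ridge_logit`, `predict_proba`, `choose_lambda_cv`), the penalty grid, and the synthetic data are all hypothetical and serve only to show a fit with more genes than samples and a cross-validated choice of the penalty weight.

```python
import numpy as np

def penalized_loglik(beta, Z, y, P):
    """Bernoulli log-likelihood minus the quadratic penalty 0.5 * beta' P beta."""
    eta = Z @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta)) - 0.5 * beta @ P @ beta

def fit_ridge_logit(X, y, lam, n_iter=50, tol=1e-8):
    """Ridge-penalized logit fit by damped Newton steps (intercept unpenalized)."""
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])      # prepend intercept column
    P = lam * np.eye(p + 1)
    P[0, 0] = 0.0                            # do not shrink the intercept
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-np.clip(Z @ beta, -30, 30)))
        w = mu * (1.0 - mu)
        grad = Z.T @ (y - mu) - P @ beta
        hess = Z.T @ (Z * w[:, None]) + P    # positive definite for lam > 0
        step = np.linalg.solve(hess, grad)
        # Halve the step until the penalized log-likelihood does not decrease.
        t, current = 1.0, penalized_loglik(beta, Z, y, P)
        while penalized_loglik(beta + t * step, Z, y, P) < current and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta

def predict_proba(X, beta):
    """Estimated probability that each row of X belongs to class 1."""
    eta = beta[0] + X @ beta[1:]
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30, 30)))

def choose_lambda_cv(X, y, lambdas, k=5, seed=0):
    """Pick the penalty weight with the smallest k-fold cross-validated log-loss."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for lam in lambdas:
        loss = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            beta = fit_ridge_logit(X[train_idx], y[train_idx], lam)
            pr = np.clip(predict_proba(X[test_idx], beta), 1e-12, 1 - 1e-12)
            yt = y[test_idx]
            loss -= np.sum(yt * np.log(pr) + (1 - yt) * np.log(1 - pr))
        scores.append(loss / len(y))
    return lambdas[int(np.argmin(scores))], scores

# Synthetic stand-in for expression data (NOT the article's data set):
# 40 samples, 200 "genes", of which the first 5 differ between the classes.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(size=(40, 200))
X[y == 1, :5] += 1.5

best_lam, cv_scores = choose_lambda_cv(X, y, [0.1, 1.0, 10.0, 100.0])
beta_hat = fit_ridge_logit(X, y, best_lam)
new_sample = rng.normal(size=(1, 200))
print("chosen penalty:", best_lam)
print("class-1 probability for a new sample:", predict_proba(new_sample, beta_hat)[0])
```

Without the penalty term the Hessian in this setting is singular, because there are more coefficients than samples; the added diagonal makes the linear system solvable, which is one way to see why penalization addresses the ill-conditioning mentioned above.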

Results: Penalization makes classical binary regression a suitable tool for microarray analysis. We illustrate penalized logit and Bayesian probit regression on a well-known data set and compare the results with each other and with published results from decision trees.

Conclusions: The frequentist and the Bayesian penalization concepts work equally well on the example data, although some method-specific differences can be identified. Moreover, the Bayesian approach yields a quantification (posterior probabilities) of the bias due to the constraining assumptions.