Methods Inf Med 2003; 42(02): 161-168
DOI: 10.1055/s-0038-1634328
Original article
Schattauer GmbH

Probabilistic Graphical Models for Computational Biomedicine

Y. Moreau, P. Antal, G. Fannes, B. De Moor

Department of Electrical Engineering ESAT-SCD (SISTA), Katholieke Universiteit Leuven, Leuven, Belgium

Publication History

Publication Date: 08 February 2018 (online)

Summary

Background: As genomics becomes increasingly relevant to medicine, medical informatics and bioinformatics are gradually converging into a larger field that we call computational biomedicine.

Objectives: A computational framework common to the disciplines that make up computational biomedicine would be a major enabler of the further development and integration of this research domain.

Methods: Probabilistic graphical models, such as Hidden Markov Models, belief networks, and missing-data models, together with computational methods such as dynamic programming, Expectation-Maximization, data-augmentation Gibbs sampling, and the Metropolis-Hastings algorithm, provide the tools for an integrated probabilistic approach to computational biomedicine.

Results and Conclusions: We show that graphical models have already found broad application in the different fields that make up computational biomedicine. We indicate several challenges that lie at the interface between medical informatics, statistical genomics, and bioinformatics. We also argue that graphical models offer a unified framework in which multiple models, ranging from the molecular level through the cellular level to the clinical level, can be integrated in a statistically meaningful way. Because of their versatility and firm statistical underpinning, we assert that probabilistic graphical models can serve as the lingua franca for many computationally intensive approaches to biology and medicine. As such, graphical models should be a foundation of the curriculum for students in these fields. From this foundation, students could then build towards specific computational methods in medical informatics, medical image analysis, statistical genetics, or bioinformatics while keeping communication open between these areas.
