Methods Inf Med 2005; 44(03): 444-448
DOI: 10.1055/s-0038-1633991
Original Article
Schattauer GmbH

Clustering Algorithms and Other Exploratory Methods for Microarray Data Analysis

J. Rahnenführer
1 Max Planck Institute for Informatics, Saarbrücken, Germany

Publication History

Publication Date: 06 February 2018 (online)

Summary

Objectives: We introduce methods for the exploratory analysis of microarray data, with a focus on clustering algorithms, and discuss their benefits and problems.

Methods: We describe the application and suitability of unsupervised learning methods for the classification of gene expression data. Clustering algorithms are treated in more detail, including the assessment of cluster quality.
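As an illustration of what clustering expression profiles and assessing cluster quality can look like in practice, the following is a minimal sketch and not the article's own procedure. It assumes average-linkage hierarchical clustering with Euclidean distance and uses the mean silhouette width as the quality index; both choices, the function names, and the toy expression values are assumptions made for this example only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(profiles, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = sum(euclidean(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def mean_silhouette(profiles, clusters):
    """Mean silhouette width; values near 1 suggest compact, well-separated clusters."""
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    widths = []
    for c, members in enumerate(clusters):
        for i in members:
            own = [j for j in members if j != i]
            if not own:
                widths.append(0.0)  # convention: a singleton cluster scores 0
                continue
            a = sum(euclidean(profiles[i], profiles[j]) for j in own) / len(own)
            b = min(
                sum(euclidean(profiles[i], profiles[j]) for j in other) / len(other)
                for o, other in enumerate(clusters) if o != c
            )
            widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

# Toy data (made up): six samples measured on three genes, two obvious groups.
profiles = [
    [1.0, 1.2, 0.9], [1.1, 1.0, 1.0], [0.9, 1.1, 1.1],
    [5.0, 5.2, 4.9], [5.1, 4.9, 5.0], [4.8, 5.0, 5.1],
]
clusters2 = average_linkage(profiles, 2)
sil2 = mean_silhouette(profiles, clusters2)
sil3 = mean_silhouette(profiles, average_linkage(profiles, 3))
```

Comparing the index across several candidate numbers of clusters, as done here for two and three, is one simple way to check whether a partition is supported by the data rather than imposed by the algorithm.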

Results: When dealing with microarray data, most clustering algorithms must be applied with caution. As long as the structure of the true generating models of such data is not fully understood, simple algorithms seem more appropriate than complex black-box algorithms. New methods explicitly targeted at the analysis of microarray data are increasingly being developed to extract more useful information from the experiments.

Conclusions: Unsupervised methods can be helpful tools for the analysis of microarray data, but the algorithm must be chosen critically and the results interpreted carefully to avoid false conclusions.

 