Methods Inf Med 2006; 45(02): 146-152
DOI: 10.1055/s-0038-1634058
Original Article
Schattauer GmbH

A Generic Concept for Large-scale Microarray Analysis Dedicated to Medical Diagnostics

M. Dugas¹, F. Weninger¹, S. Merk¹, A. Kohlmann², T. Haferlach³

¹ Department of Medical Informatics and Biomathematics, University of Münster, Münster, Germany
² Roche Molecular Systems Inc., Pleasanton, CA, USA
³ Munich Leukemia Laboratory, Munich, Germany

Publication History

Publication Date: 06 February 2018 (online)

Summary

Background: The development of diagnostic procedures based on microarray analysis confronts bioinformaticians and biomedical researchers with a variety of challenges. Microarrays generate large amounts of data, many of the required data processing steps are not yet clearly defined, and many clinical response variables may not correspond to gene expression patterns.

Objectives: To design a generic concept for large-scale microarray experiments dedicated to medical diagnostics; to create a system capable of handling several thousand microarrays per analysis and more than 100 clinical response variables; to design a standardized workflow for quality control, data calibration, identification of differentially expressed genes, and estimation of classification accuracy; and to provide a user-friendly interface that supports clinical researchers in the biomedical interpretation of results.
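
Identifying differentially expressed genes means testing thousands of genes at once, so raw per-gene p-values are usually adjusted for multiple testing before a gene list is reported. The summary does not state which procedure GAMS applies; as an assumption, the minimal sketch below uses the Benjamini-Hochberg false discovery rate adjustment, a common choice, and takes one raw p-value per gene as input. The function names and the 0.05 cutoff are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: one raw p-value per gene (e.g. from a per-gene test);
    this is an illustrative FDR procedure, not necessarily the one GAMS uses.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Step-up: walk from the largest p-value down to the smallest, so each
    # adjusted value is the minimum of p * m / rank over all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_genes(genes: Sequence[str], pvalues: Sequence[float],
                       fdr: float = 0.05) -> List[str]:
    """Hypothetical helper: genes whose adjusted p-value is at most `fdr`."""
    adjusted = benjamini_hochberg(pvalues)
    return [g for g, q in zip(genes, adjusted) if q <= fdr]
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` gives approximately `[0.02, 0.04, 0.04, 0.02]`; with `fdr=0.03`, only the first and last genes would be reported as differential.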

Methods: We designed a database structure suitable for storing microarray data and analysis results. We applied statistical procedures to identify differentially expressed genes and developed a technique to estimate the classification accuracy of gene patterns with confidence intervals.
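
The summary does not say how the confidence intervals for classification accuracy are computed. As a hedged illustration only, assuming accuracy is the fraction of correctly classified held-out samples (for example from cross-validation) and treating those predictions as independent Bernoulli trials, which is itself a simplification, a Wilson score interval for a binomial proportion can be computed as below. This is a swapped-in standard technique, not necessarily the authors' method, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def accuracy_with_wilson_ci(correct: int, total: int,
                            level: float = 0.95) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using a Wilson score interval.

    Assumption: the `total` held-out predictions are treated as independent
    Bernoulli trials; the function name is hypothetical.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    # Two-sided standard normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return p, max(0.0, center - half_width), min(1.0, center + half_width)
```

For instance, `accuracy_with_wilson_ci(90, 100)` returns an accuracy of 0.9 with an interval of roughly 0.826 to 0.945. Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] even when accuracy is close to 0 or 1, which is common for well-separated diagnostic classes.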

Results: We implemented a Gene Analysis Management System (GAMS) based on this concept, using MySQL for data storage, R/Bioconductor for analysis, and PHP for a web-based front-end to explore microarray data and analysis results. The system has been applied to large data sets from several medical disciplines, mainly oncology (~2000 microarrays).
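
The summary states that GAMS keeps microarray data and analysis results in MySQL but does not describe its tables. A minimal sketch of one possible layout follows, using Python's built-in sqlite3 module as a stand-in for MySQL; every table and column name is hypothetical. Expression values go into a long table (one row per array and probe), clinical response variables are stored as key/value rows so that adding a variable needs no schema change, and results are precomputed so a web front-end only has to read them, in line with the separation of data processing and interactive exploration that the summary recommends.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema for illustration only; not the actual GAMS tables.
SCHEMA = """
CREATE TABLE microarray (
    array_id   INTEGER PRIMARY KEY,
    sample_id  TEXT NOT NULL,
    chip_type  TEXT NOT NULL
);
CREATE TABLE gene (
    probe_id   TEXT PRIMARY KEY,
    symbol     TEXT
);
-- One row per (array, probe): a long expression table.
CREATE TABLE expression (
    array_id   INTEGER REFERENCES microarray(array_id),
    probe_id   TEXT REFERENCES gene(probe_id),
    value      REAL NOT NULL,
    PRIMARY KEY (array_id, probe_id)
);
-- Clinical response variables as key/value rows per array.
CREATE TABLE clinical (
    array_id   INTEGER REFERENCES microarray(array_id),
    variable   TEXT NOT NULL,
    value      TEXT,
    PRIMARY KEY (array_id, variable)
);
-- Precomputed per-gene results for each clinical variable.
CREATE TABLE analysis_result (
    variable   TEXT NOT NULL,
    probe_id   TEXT REFERENCES gene(probe_id),
    statistic  REAL,
    q_value    REAL,
    PRIMARY KEY (variable, probe_id)
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def top_genes(conn: sqlite3.Connection, variable: str,
              limit: int = 10) -> List[Tuple[str, float]]:
    """Read-only query a front-end might issue: best-ranked genes for a variable."""
    return conn.execute(
        "SELECT g.symbol, r.q_value FROM analysis_result AS r "
        "JOIN gene AS g ON g.probe_id = r.probe_id "
        "WHERE r.variable = ? ORDER BY r.q_value LIMIT ?",
        (variable, limit),
    ).fetchall()
```

In this sketch the analysis scripts would write rows into `analysis_result`, while the front-end only calls read queries such as `top_genes`.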

Conclusions: A systematic approach is necessary to obtain comprehensible results when analyzing microarray experiments in a medical diagnostics setting. Because of the complexity of the analysis, data processing (by bioinformaticians) and interactive exploration of results (by biomedical experts) should be separated.

 