DOI: 10.1055/s-0038-1634058
A Generic Concept for Large-scale Microarray Analysis Dedicated to Medical Diagnostics
Publication History
Publication Date: 06 February 2018 (online)
Summary
Background: The development of diagnostic procedures based on microarray analysis confronts the bioinformatician and the biomedical researcher with a variety of challenges. Microarrays generate very large amounts of data. Many of the data processing steps are not yet clearly defined, and many clinical response variables may not correspond to gene expression patterns.
Objectives: To design a generic concept for large-scale microarray experiments dedicated to medical diagnostics; to create a system capable of handling several thousand microarrays per analysis and more than 100 clinical response variables; to design a standardized workflow for quality control, data calibration, identification of differentially expressed genes and estimation of classification accuracy; and to provide a user-friendly interface that supports clinical researchers in biomedical interpretation.
Methods: We designed a database structure suitable for the storage of microarray data and analysis results. We applied statistical procedures to identify differentially expressed genes and developed a technique to estimate the classification accuracy of gene patterns with confidence intervals.
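To make the idea of a classification accuracy with a confidence interval concrete, the following is a minimal sketch in Python. The summary does not state which interval estimator was used, so the Wilson score interval here is an assumption chosen for illustration, and the function name and the sample counts are hypothetical.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    correct: number of correctly classified samples
    total:   number of classified samples
    z:       normal quantile (1.96 gives an approximate 95% interval)
    Note: this estimator is an illustrative assumption, not the
    technique described in the article.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical example: 90 of 100 held-out samples classified correctly.
low, high = wilson_interval(90, 100)
print(f"accuracy = {90 / 100:.2f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Reporting the interval alongside the point estimate shows how much the accuracy could vary with the number of samples available for validation.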
Results: We implemented a Gene Analysis Management System (GAMS) based on this concept, using MySQL for data storage, R/Bioconductor for analysis and PHP for a web-based front-end for the exploration of microarray data and analysis results. The system was applied to large data sets from several medical disciplines, mainly oncology (~2000 microarrays).
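The storage layer described above can be pictured with a small sketch. The actual GAMS schema is not given in this summary, so every table and column name below is hypothetical, and Python's built-in sqlite3 module stands in for MySQL purely so the example runs without a server.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    diagnosis TEXT NOT NULL              -- one clinical response variable
);
CREATE TABLE expression (
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    probe_id  TEXT NOT NULL,
    value     REAL NOT NULL,             -- calibrated expression value
    PRIMARY KEY (sample_id, probe_id)
);
CREATE TABLE analysis_result (
    analysis_id INTEGER NOT NULL,
    probe_id    TEXT NOT NULL,
    p_value     REAL NOT NULL,           -- precomputed by the analysis step
    PRIMARY KEY (analysis_id, probe_id)
);
""")

# Hypothetical precomputed results written by the analysis side.
conn.executemany(
    "INSERT INTO analysis_result (analysis_id, probe_id, p_value) VALUES (?, ?, ?)",
    [(1, "probe_a", 0.02), (1, "probe_b", 0.001), (1, "probe_c", 0.4)],
)


def top_genes(connection, analysis_id, limit):
    """Return the probes with the smallest p-values for one analysis."""
    rows = connection.execute(
        "SELECT probe_id, p_value FROM analysis_result "
        "WHERE analysis_id = ? ORDER BY p_value ASC LIMIT ?",
        (analysis_id, limit),
    )
    return rows.fetchall()


print(top_genes(conn, 1, 2))
```

Storing analysis results in their own table lets a front-end query them directly, without rerunning the statistics for each page view.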
Conclusions: A systematic approach is necessary to obtain comprehensible results from the analysis of microarray experiments in a medical diagnostics setting. Because of the complexity of the analysis, data processing (by bioinformaticians) and interactive exploration of results (by biomedical experts) should be separated.