Summary
Background:
The development of diagnostic procedures based on microarray analysis confronts the
bioinformatician and the biomedical researcher with a variety of challenges. Microarrays
generate huge amounts of data, many of the data processing steps are not yet clearly
defined, and many clinical response variables may not match gene expression patterns.
Objectives:
To design a generic concept for large-scale microarray experiments dedicated to medical
diagnostics; to create a system capable of handling several thousand microarrays per analysis
and more than 100 clinical response variables; to design a standardized workflow for
quality control, data calibration, identification of differentially expressed genes
and estimation of classification accuracy; and to provide a user-friendly interface
for clinical researchers with respect to biomedical interpretation.
Methods:
We designed a database structure suitable for the storage of microarray data and
analysis results. We applied statistical procedures to identify differentially expressed
genes and developed a technique to estimate the classification accuracy of gene patterns,
together with confidence intervals.
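The summary does not state how the confidence intervals are computed. As a minimal
illustrative sketch only, assuming accuracy is measured on an independent test set and
using a Wilson score interval (an assumption, not necessarily the technique used in
GAMS), such an estimate could be written in Python as follows. The function name
wilson_interval and the example counts are hypothetical.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion, here a classification accuracy.

    correct: number of correctly classified test samples (hypothetical input)
    total:   number of test samples
    z:       standard normal quantile; 1.96 corresponds to roughly 95% coverage
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")

    p_hat = correct / total                      # observed accuracy
    denom = 1.0 + z ** 2 / total
    centre = (p_hat + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z ** 2 / (4 * total ** 2))
    ) / denom

    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical example: 85 of 100 test arrays classified correctly.
    lower, upper = wilson_interval(85, 100)
    print(f"accuracy = 0.85, interval = [{lower:.3f}, {upper:.3f}]")
```

The Wilson interval is chosen here because it stays within [0, 1] and behaves reasonably
for small test sets; a resampling-based interval would be an equally plausible choice.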
Results:
We implemented a Gene Analysis Management System (GAMS) based on this concept, using
MySQL for data storage, R/Bioconductor for analysis and PHP for a web-based front-end
for the exploration of microarray data and analysis results. The system was applied
to large data sets from several medical disciplines, mainly oncology (approximately
2000 microarrays).
Conclusions:
A systematic approach is necessary to obtain comprehensible results from microarray
experiments in a medical diagnostics setting. Because the analysis is complex, data
processing (by bioinformaticians) and interactive exploration of results (by biomedical
experts) should be separated.
Keywords
Microarray - Gene Analysis Management System (GAMS) - bioinformatics