Comparison of Preprocessing Procedures for Oligo-nucleotide Micro-arrays by Parametric Bootstrap Simulation of Spike-in Experiments

J. Freudenberg; H. Boriss; D. Hasenclever

doi:10.1055/s-0038-1633893


Methods Inf Med 2004; 43(05): 434-438

Original Article

Schattauer GmbH

Authors

J. Freudenberg¹
H. Boriss¹
D. Hasenclever²

¹Interdisciplinary Center of Bioinformatics (IZBI), University of Leipzig, Germany
²Institute of Medical Informatics, Statistics and Epidemiology (IMISE), University of Leipzig, Germany

Publication History

Publication date: 05 February 2018 (online)

Summary

Objective: Because calibration data for micro-array experiments are scarce, simulation methods are used to assess preprocessing procedures. Here we analyze the robustness of several procedures against increasing numbers of differentially expressed genes and varying proportions of up-regulation.

Methods: Raw probe data from oligo-nucleotide micro-arrays are assumed to be approximately multivariate normally distributed on the log scale. Chips can therefore be simulated from a multivariate normal distribution whose mean vector and variance-covariance matrix are estimated from a real raw data set.
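A minimal sketch of this parametric bootstrap step, assuming the raw data are already log-transformed and arranged as a chips × probes array. The function name `simulate_chips` and the synthetic stand-in data set are hypothetical. Instead of forming the probes × probes covariance matrix (which is singular whenever probes outnumber chips), the sketch uses the identity that `z @ centered / sqrt(n - 1)` with standard normal `z` has exactly the sample covariance; this is one convenient way to sample, not necessarily the authors' implementation.

```python
import numpy as np

def simulate_chips(log_data: np.ndarray, n_sim: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n_sim synthetic chips from N(mu, Sigma), where mu and Sigma are the
    sample mean and sample covariance of log_data (chips x probes)."""
    n_chips = log_data.shape[0]
    mu = log_data.mean(axis=0)
    centered = log_data - mu
    # For z ~ N(0, I_n), z @ centered / sqrt(n - 1) has covariance
    # centered.T @ centered / (n - 1), i.e. exactly the sample covariance,
    # so the (possibly singular) p x p matrix is never formed explicitly.
    z = rng.standard_normal((n_sim, n_chips))
    return mu + z @ centered / np.sqrt(n_chips - 1)

# Hypothetical stand-in for a real raw data set: 20 chips x 500 probes (log2 scale),
# built from probe means, a per-chip offset (chip effect) and probe-level noise.
rng = np.random.default_rng(1)
probe_means = rng.normal(7.0, 2.0, 500)
chip_offsets = rng.normal(0.0, 0.5, (20, 1))
real = probe_means + chip_offsets + rng.normal(0.0, 0.3, (20, 500))

sim = simulate_chips(real, n_sim=5000, rng=rng)
print(sim.shape)  # (5000, 500)
```

Because the estimated covariance carries the strong positive correlations that the per-chip offsets induce, the per-chip averages of the simulated chips vary from chip to chip, even though no offset term appears in `simulate_chips`.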

A chip effect induces strong positive correlations among probe intensities. Conversely, sampling from a normal distribution with a strongly correlated variance-covariance matrix generates data that exhibit a chip effect, so no explicit model of the chip effect is needed. Differences can then be spiked in artificially according to a given distribution of effect sizes.
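The spike-in step can be sketched as follows, assuming effects are added on the log scale to one group of simulated chips and that all probes of a probe set share the same shift. The function `spike_in`, the `probe_set` mapping and the half-normal distribution of effect magnitudes are illustrative assumptions; the summary only states that effect sizes follow "a given distribution". The returned per-probe-set effects serve as ground truth when comparing preprocessing procedures.

```python
import numpy as np

def spike_in(chips, probe_set, frac_de, frac_up, effect_sd, rng):
    """Add artificial log-scale differences to simulated chips (chips x probes).

    probe_set[j] is the (hypothetical) index of the probe set that probe j belongs to.
    A fraction frac_de of probe sets receives an effect; a fraction frac_up of those
    is up-regulated, the rest down-regulated.
    """
    n_sets = int(probe_set.max()) + 1
    n_de = int(round(frac_de * n_sets))
    n_up = int(round(frac_up * n_de))
    de_sets = rng.choice(n_sets, size=n_de, replace=False)  # random DE probe sets
    # Effect magnitudes from a half-normal distribution (an assumption).
    magnitude = np.abs(rng.normal(0.0, effect_sd, n_de))
    sign = np.where(np.arange(n_de) < n_up, 1.0, -1.0)      # first n_up go up
    effects = np.zeros(n_sets)
    effects[de_sets] = sign * magnitude
    # Every probe of a probe set is shifted by that set's effect.
    return chips + effects[probe_set], effects

rng = np.random.default_rng(2)
probe_set = np.repeat(np.arange(20), 11)               # 20 probe sets x 11 probes
group_b = rng.normal(8.0, 1.0, (4, probe_set.size))    # stand-in for simulated chips
shifted, effects = spike_in(group_b, probe_set, frac_de=0.5, frac_up=0.5,
                            effect_sd=1.0, rng=rng)
print((effects > 0).sum(), (effects < 0).sum())  # 5 5
```

Varying `frac_de` and `frac_up` reproduces the two axes studied here: the number of differentially expressed genes and the proportion of up-regulation.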

Thirty preprocessing procedures were compared, each combining methods for background correction, normalization, perfect-match correction and summarization available from the Bioconductor project.

Results: In the symmetrical setting (50% differentially expressed genes, 50% of which are up-regulated), background correction reduces bias but inflates the variance of low-intensity probes as well as the mean squared error of the estimates. Every normalization method reduces variance and increases sensitivity, with no clear winner. Asymmetry between up- and down-regulation biases the effect-size estimates of non-differentially expressed genes, which markedly inflates the rate of false positive discoveries. Variance stabilizing normalization (VSN) behaved best.
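A toy illustration of why asymmetric regulation biases genes without a true effect, assuming noise-free log-scale data and a plain global median normalization. This normalization is a simple stand-in chosen for clarity, not necessarily one of the thirty procedures compared; the function names and parameter values are hypothetical. When many genes shift in the same direction, the chip median shifts too, and subtracting it pushes the log-ratios of unchanged genes away from zero.

```python
import numpy as np

def median_normalize(chip):
    # Global median centring on the log scale (simple stand-in normalization).
    return chip - np.median(chip)

def non_de_bias(frac_up, frac_down, shift=1.0, n_genes=20000, seed=0):
    """Mean normalized log-ratio (B minus A) over genes without a true effect."""
    rng = np.random.default_rng(seed)
    baseline = rng.normal(8.0, 2.0, n_genes)  # hypothetical log2 intensities, chip A
    idx = rng.permutation(n_genes)
    n_up, n_down = int(frac_up * n_genes), int(frac_down * n_genes)
    effects = np.zeros(n_genes)
    effects[idx[:n_up]] = shift                  # up-regulated genes
    effects[idx[n_up:n_up + n_down]] = -shift    # down-regulated genes
    log_ratio = median_normalize(baseline + effects) - median_normalize(baseline)
    return log_ratio[effects == 0].mean()

print(non_de_bias(0.25, 0.25))  # symmetric: close to 0
print(non_de_bias(0.50, 0.00))  # all up: clearly negative
```

In the all-up case the median of chip B moves from about 8 to about 8.5, so genes with no true effect show a log-ratio of roughly -0.5 and can be mistaken for down-regulated genes.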

Conclusion: A simple parametric bootstrap was used to simulate oligo-nucleotide micro-array raw data. Current normalization methods inflate the false positive rate when many genes show an effect in the same direction.

Keywords

Gene expression profiling - oligo-nucleotide array - normalization
