Methods Inf Med 2004; 43(05): 434-438
DOI: 10.1055/s-0038-1633893
Original Article
Schattauer GmbH

Comparison of Preprocessing Procedures for Oligo-nucleotide Micro-arrays by Parametric Bootstrap Simulation of Spike-in Experiments

J. Freudenberg¹, H. Boriss¹, D. Hasenclever²
¹ Interdisciplinary Center of Bioinformatics (IZBI), University of Leipzig, Germany
² Institute of Medical Informatics, Statistics and Epidemiology (IMISE), University of Leipzig, Germany

Publication date: 05 February 2018 (online)

Summary

Objective: Because calibration data for micro-array experiments are scarce, simulation methods are used to assess preprocessing procedures. Here we analyze how robust several procedures are against increasing numbers of differentially expressed genes and varying proportions of up-regulation.

Methods: Raw probe data from oligo-nucleotide micro-arrays are assumed to be approximately multivariate normally distributed on the log scale. Chips can therefore be simulated by sampling from a multivariate normal distribution whose mean and variance-covariance matrix are estimated from a real raw data set.
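A minimal NumPy sketch of this parametric bootstrap, assuming the raw intensities are already log-transformed and arranged as a chips × probes matrix (one row per real chip). With far more probes than chips, the estimated probe-by-probe covariance matrix is singular and too large to factorize, so the sketch draws each artificial chip as the mean plus a random normal combination of the centred real chips; such a draw has exactly the estimated mean and covariance. The function name `simulate_chips` is hypothetical, not taken from the paper.

```python
import numpy as np

def simulate_chips(log_raw, n_sim, rng=None):
    """Hypothetical helper: draw n_sim chips from N(mu, S), where mu and S are the
    sample mean and covariance of the real chips (rows of log_raw, log scale)."""
    rng = np.random.default_rng(rng)
    n_chips, _ = log_raw.shape
    mu = log_raw.mean(axis=0)                 # per-probe mean across chips
    centered = log_raw - mu                   # n_chips x n_probes
    # S = centered.T @ centered / (n_chips - 1) has rank <= n_chips - 1.
    # Instead of factorizing the n_probes x n_probes matrix, push standard normal
    # weights through the centred chips: Cov(z @ centered / sqrt(n - 1)) = S.
    z = rng.standard_normal((n_sim, n_chips))
    return mu + z @ centered / np.sqrt(n_chips - 1)
```

Because the estimated covariance has rank at most n − 1 for n real chips, every simulated chip lies in the affine span of the real chips; this holds for any sampler that uses the plain sample covariance, not only for this shortcut.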

A chip effect induces strong positive correlations. Conversely, sampling from a normal distribution whose variance-covariance matrix contains strong positive correlations generates data that exhibit a chip effect, so no explicit model of the chip effect is needed. Expression differences can then be spiked in artificially according to a given distribution of effect sizes.
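A small NumPy sketch of both ideas, assuming chips are rows and each column stands for one gene (real arrays group several probes into a probe set). The first part models a chip effect, for demonstration only, as one additive log-scale offset per chip and shows that it makes the average correlation between columns strongly positive; the simulation described above needs no such explicit model. The second part, `spike_in`, adds signed effects to one group of chips. Function names and the gamma effect-size distribution are hypothetical; the summary does not state which distribution was used.

```python
import numpy as np

def mean_probe_correlation(chips):
    """Average off-diagonal correlation between columns, computed across chips (rows)."""
    r = np.corrcoef(chips, rowvar=False)
    p = r.shape[0]
    return (r.sum() - p) / (p * (p - 1))

def spike_in(chips, group_b, frac_de, frac_up, effect_sampler, rng):
    """Hypothetical helper: add artificial differential expression to the chips in group_b.

    frac_de: fraction of columns made differentially expressed
    frac_up: fraction of those that are up-regulated (the rest go down)
    effect_sampler(rng, k): returns k non-negative effect sizes on the log scale
    Returns the spiked chips and the true effects (0 for unchanged columns).
    """
    chips = np.array(chips, dtype=float)      # copy so the input stays untouched
    n_cols = chips.shape[1]
    n_de = int(round(frac_de * n_cols))
    n_up = int(round(frac_up * n_de))
    de_cols = rng.choice(n_cols, size=n_de, replace=False)
    signs = np.concatenate([np.ones(n_up), -np.ones(n_de - n_up)])
    true_effect = np.zeros(n_cols)
    true_effect[de_cols] = signs * effect_sampler(rng, n_de)
    chips[group_b] += true_effect             # only group-B chips carry the effects
    return chips, true_effect

rng = np.random.default_rng(0)
n_chips, n_probes = 50, 200
probe_means = rng.normal(8.0, 1.5, n_probes)
no_chip_effect = probe_means + rng.normal(0.0, 0.3, (n_chips, n_probes))
# Demonstration-only chip effect: one offset per chip, shared by all its columns.
with_chip_effect = no_chip_effect + rng.normal(0.0, 0.5, (n_chips, 1))

print(mean_probe_correlation(no_chip_effect))    # near 0: no shared per-chip shift
print(mean_probe_correlation(with_chip_effect))  # clearly positive: shift shared by all columns

# Hypothetical effect-size distribution (not from the paper): gamma, shape 2, scale 0.5.
gamma_effects = lambda g, k: g.gamma(2.0, 0.5, k)
spiked, truth = spike_in(with_chip_effect, np.arange(25, 50), 0.5, 0.5, gamma_effects, rng)
```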

Thirty preprocessing procedures that combine background correction, normalization, perfect match correction and summarization methods available from the Bioconductor project were compared.
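The procedures compared are Bioconductor (R) methods; the following Python sketch only illustrates how choosing one option per stage yields a procedure, and how the number of procedures multiplies with the options. Stage order follows the listing above (background correction, normalization, perfect match correction, summarization). All function names and the small option set, including simplified quantile and median normalizations and mean/median summaries applied to log-scale PM intensities, are hypothetical stand-ins, not the thirty procedures compared.

```python
import itertools
import numpy as np

# Hypothetical, simplified stage options; x is chips x probes on the log scale.
def bg_none(x):
    return x                                  # no background correction

def norm_none(x):
    return x

def norm_median(x):
    """Shift each chip (row) so that all chips share the same median."""
    med = np.median(x, axis=1, keepdims=True)
    return x - med + med.mean()

def norm_quantile(x):
    """Quantile normalization: every chip receives the mean of the sorted chips."""
    reference = np.sort(x, axis=1).mean(axis=0)
    out = np.empty(x.shape, dtype=float)
    # The k-th smallest value of each chip is replaced by reference[k].
    np.put_along_axis(out, np.argsort(x, axis=1),
                      np.broadcast_to(reference, x.shape), axis=1)
    return out

def pm_only(x):
    return x                                  # keep PM intensities, no mismatch correction

def summarize_mean(x, probe_set):
    sets = np.unique(probe_set)
    return np.column_stack([x[:, probe_set == s].mean(axis=1) for s in sets])

def summarize_median(x, probe_set):
    sets = np.unique(probe_set)
    return np.column_stack([np.median(x[:, probe_set == s], axis=1) for s in sets])

STAGES = {
    "background": [bg_none],
    "normalize": [norm_none, norm_median, norm_quantile],
    "pm_correct": [pm_only],
    "summarize": [summarize_mean, summarize_median],
}

def all_procedures():
    """Every combination of one option per stage (here 1 * 3 * 1 * 2 = 6)."""
    return list(itertools.product(*STAGES.values()))

def run_procedure(proc, x, probe_set):
    bg, norm, pm, summ = proc
    return summ(pm(norm(bg(x))), probe_set)   # returns chips x probe sets
```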

Results: In the symmetric setting (50% of genes differentially expressed, half of them up-regulated), background correction reduces bias but inflates the variance of low-intensity probes as well as the mean squared error of the estimates. Every normalization reduces variance and increases sensitivity, with no clear winner. Asymmetry between up- and down-regulation biases the effect-size estimates of non-differentially expressed genes, which markedly inflates the rate of false positive discoveries. Variance stabilizing normalization (VSN) performed best.
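A toy illustration of the asymmetry effect, assuming plain per-chip median centring as the normalization (a stand-in, not one of the procedures compared, and not VSN). Median centring treats any shift of a chip's median as a chip effect. When half of the genes are up-regulated in group B and none are down-regulated, the median of the group-B chips rises, centring removes that rise, and the unchanged genes inherit a spurious negative effect. The simulation sizes, the noise level and the crude `|estimate| > cutoff` call rule (used instead of a proper statistical test) are hypothetical choices.

```python
import numpy as np

def median_normalize(x):
    """Shift each chip (row) so that all chips share the same median."""
    med = np.median(x, axis=1, keepdims=True)
    return x - med + med.mean()

def non_de_error(frac_de, frac_up, effect=1.0, n_genes=20000, n_per_group=4,
                 sd=0.2, cutoff=0.3, seed=0):
    """Hypothetical helper: return (mean estimated effect, fraction called changed)
    for the genes whose true effect is 0."""
    rng = np.random.default_rng(seed)
    base = rng.normal(8.0, 1.5, n_genes)               # gene-specific log-expression
    x = base + rng.normal(0.0, sd, (2 * n_per_group, n_genes))
    n_de = int(frac_de * n_genes)
    n_up = int(frac_up * n_de)
    true = np.zeros(n_genes)
    true[:n_up] = effect                               # up-regulated genes
    true[n_up:n_de] = -effect                          # down-regulated genes
    x[n_per_group:] += true                            # group B carries the effects
    xn = median_normalize(x)
    est = xn[n_per_group:].mean(axis=0) - xn[:n_per_group].mean(axis=0)
    null = est[n_de:]                                  # truly unchanged genes
    return null.mean(), np.mean(np.abs(null) > cutoff)

for frac_up in (0.5, 1.0):
    bias, fp = non_de_error(frac_de=0.5, frac_up=frac_up)
    print(f"up-regulated share {frac_up:.0%}: bias {bias:+.2f}, false positives {fp:.1%}")
```

In the all-up case the group-B median moves up by about half the effect size, because a 50/50 mixture of two equally shaped, shifted distributions has its median halfway between them; that shift is what the unchanged genes absorb.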

Conclusion: A simple parametric bootstrap was used to simulate oligo-nucleotide micro-array raw data. Current normalization methods inflate the false positive rate when many genes show an effect in the same direction.