Methods Inf Med 2010; 49(06): 632-640
DOI: 10.3414/ME09-02-0055
Special Topic – Original Articles
Schattauer GmbH

Evaluating Strategies for Marker Ranking in Genome-wide Association Studies of Complex Traits

A. Scherag (1), J. Hebebrand (2), H.-E. Wichmann (3, 4), K.-H. Jöckel (1)

1   Institute for Medical Informatics, Biometry and Epidemiology, University Hospital of Essen, University Duisburg-Essen, Essen, Germany
2   Department of Child and Adolescent Psychiatry, University of Duisburg-Essen, Essen, Germany
3   Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Epidemiology, Neuherberg, Germany
4   Ludwig-Maximilians University Munich, Institute of Medical Data Management, Biometrics and Epidemiology, Chair of Epidemiology, Munich, Germany

Publication History
Received: 09 December 2009
Accepted: 24 February 2010
Published online: 18 January 2018

Summary

Background: Genome-wide association studies (GWAS) have been highly successful in identifying new susceptibility loci for complex traits. Such studies usually start by genotyping a fixed array of genetic markers in an initial sample. A subset of these markers is then selected for further genotyping in independent samples. Because the a priori probability of a true positive association is very low, the vast majority of marker signals will turn out to be false positives. Several methods for ranking marker data have therefore been proposed; they are evaluated here.
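
To see why most selected signals are expected to be false positives, the short Python calculation below counts the expected true and false positives among markers that pass a selection threshold. All inputs (number of markers, prior fraction of truly associated markers, threshold, power) are hypothetical illustrations, not values from this study.

```python
def expected_signal_counts(m, prior, alpha, power):
    """Expected true and false positive signals among m tested markers.

    prior: hypothetical fraction of markers that are truly associated
    alpha: per-marker significance threshold used for selection
    power: hypothetical power to detect a true association at alpha
    """
    n_true = m * prior
    n_null = m - n_true
    expected_tp = n_true * power   # true associations that pass the threshold
    expected_fp = n_null * alpha   # null markers that pass by chance
    return expected_tp, expected_fp


# Hypothetical numbers, not taken from the study.
tp, fp = expected_signal_counts(m=500_000, prior=1e-5, alpha=1e-3, power=0.5)
share_false = fp / (tp + fp)
print(f"expected true positives:  {tp:.1f}")
print(f"expected false positives: {fp:.1f}")
print(f"share of selected signals that are false: {share_false:.3f}")
```

With these inputs, roughly 2.5 expected true signals compete with about 500 expected chance signals, so more than 99% of the selected markers would be false positives.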

Objectives: We compared the statistical properties of marker ranking by p-values, q-values, the False Positive Report Probability (FPRP) and the Bayesian False-Discovery Probability (BFDP).
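
As a sketch of what the four measures compute, the Python below derives each one from a hypothetical per-marker log odds ratio estimate and its standard error. Assumptions not taken from this paper: Wald-test p-values; Storey-type q-values with a single fixed tuning parameter `lam = 0.5` rather than a smoothed estimate of the null proportion; an FPRP in the style of Wacholder et al. with the observed p-value used as alpha and power computed for a hypothetical odds ratio of 1.5; a BFDP based on Wakefield's approximate Bayes factor with prior variance W = 0.21² (about 95% prior mass on odds ratios between 1/1.5 and 1.5); and a hypothetical prior probability of association of 1e-4. Marker names and numbers are invented.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def storey_qvalues(pvalues, lam=0.5):
    """q-values with pi0 estimated at a single, fixed lam (a simplification)."""
    m = len(pvalues)
    # pi0: share of p-values above lam, rescaled by (1 - lam) and capped at 1.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values never decrease with p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        qvalues[i] = running_min
    return qvalues


def fprp(p_value, prior, log_or_alt, se):
    """FPRP with the observed p-value as alpha (Wacholder-style).

    Power is that of a two-sided Wald test when the true log odds ratio
    equals the hypothetical value log_or_alt and the estimate has this se.
    """
    z_crit = -STD_NORMAL.inv_cdf(p_value / 2)
    shift = log_or_alt / se
    power = STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)
    false_part = p_value * (1.0 - prior)
    return false_part / (false_part + power * prior)


def bfdp(beta_hat, se, prior, w=0.21 ** 2):
    """BFDP from Wakefield's approximate Bayes factor (ABF) in favour of H0.

    w is the prior variance of the true log odds ratio under H1.
    """
    v = se ** 2
    z = beta_hat / se
    abf = math.sqrt((v + w) / v) * math.exp(-0.5 * z * z * w / (v + w))
    posterior_odds_h0 = abf * (1.0 - prior) / prior
    return posterior_odds_h0 / (1.0 + posterior_odds_h0)


# Hypothetical markers: (estimated log odds ratio, standard error).
markers = {
    "snp_a": (0.30, 0.06),
    "snp_b": (0.18, 0.03),
    "snp_c": (0.05, 0.04),
    "snp_d": (-0.02, 0.05),
}
names = list(markers)
prior = 1e-4  # hypothetical prior probability of a true association

pvals = [2 * STD_NORMAL.cdf(-abs(b / s)) for b, s in markers.values()]
scores = {
    "p-value": pvals,
    "q-value": storey_qvalues(pvals),
    "FPRP": [fprp(p, prior, math.log(1.5), s)
             for p, (_, s) in zip(pvals, markers.values())],
    "BFDP": [bfdp(b, s, prior) for b, s in markers.values()],
}
for label, values in scores.items():
    # Smaller values rank first for all four measures.
    ranking = [name for _, name in sorted(zip(values, names))]
    print(f"{label:8s}", ranking)
```

Because the q-values computed this way are a non-decreasing transform of the p-values, ranking by either gives the same order apart from ties. The FPRP and the BFDP additionally depend on each marker's standard error, which is how they can reorder markers relative to the p-values.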

Methods: We performed simulation studies for a genomic region derived from GWAS data sets and calculated descriptive statistics as well as mean squared errors with respect to the true marker ranking. Additionally, we applied all measures to a GWAS of early-onset extreme obesity, superimposing a priori information on candidate genes.
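
The summary does not spell out how the mean squared error of a ranking is defined. One plausible reading, sketched below in Python as an assumption, compares each marker's rank under a measure with its rank under the true (simulated) effect sizes and averages the squared rank differences. The helper names `ranks` and `rank_mse` and all example values are hypothetical.

```python
def ranks(scores):
    """1-based ranks; the smallest score gets rank 1 (ties broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def rank_mse(true_effects, estimated_scores):
    """Mean squared difference between true ranks and estimated ranks.

    true_effects: true absolute effect sizes (larger = should rank higher)
    estimated_scores: ranking measure per marker (smaller = ranks higher),
        e.g. p-values, q-values, FPRP or BFDP
    """
    true_r = ranks([-e for e in true_effects])
    est_r = ranks(estimated_scores)
    return sum((a - b) ** 2 for a, b in zip(true_r, est_r)) / len(true_r)


# Hypothetical simulated region with five markers.
true_effects = [0.30, 0.20, 0.10, 0.05, 0.0]
observed_p = [1e-6, 1e-3, 2e-4, 0.3, 0.04]
print(rank_mse(true_effects, observed_p))  # 0.8
```

Here the estimated ranks are [1, 3, 2, 5, 4] against true ranks [1, 2, 3, 4, 5], so four markers are off by one position and the mean squared error is 4/5 = 0.8; a perfect reconstruction of the true order gives 0.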

Results: Despite the known tendency of traditional p-values to yield more extreme probability statements, we observed that both p-values and the BFDP were more precise in reconstructing the “true” order of the markers in a region. In addition, the BFDP was useful for attenuating unexpected effects at a genome-wide scale.

Conclusions: For the purpose of selecting markers from an initial GWAS and within the limits of this study, we recommend either ranking by p-values or the application of a full Bayesian approach for which the BFDP is a first approximation.

 