DOI: 10.3414/ME11-02-0049
Cost-effective GPU-Grid for Genome-wide Epistasis Calculations
Publication History
Received: 01 December 2011
Accepted: 13 September 2012
Publication Date: 20 January 2018 (online)
Summary
Background: Until recently, genotype studies were limited to the investigation of single SNP effects due to the computational burden incurred when studying pairwise interactions of SNPs. However, some genetic effects as simple as coloring (in plants and animals) cannot be ascribed to a single locus but can only be understood when epistasis is taken into account [1]. It is expected that such effects are also found in complex diseases where many genes contribute to the clinical outcome of affected individuals. Only recently have such problems become computationally feasible.
Objectives: The inherently parallel structure of the problem makes it a perfect candidate for massive parallelization on either grid or cloud architectures. Because we are dealing with confidential patient data, a cloud-based solution was not an option; the data had to be processed in-house, so we aimed to build a local GPU-based grid structure.
Methods: Sequential epistasis calculations were ported to GPU using CUDA at various levels. Parallelization on the CPU was compared to corresponding GPU counterparts with regard to performance and cost.
Results: A cost-effective solution was created by combining custom-built nodes equipped with relatively inexpensive consumer-level graphics cards with highly parallel GPUs in a local grid. The GPU method outperforms current cluster-based systems on a price/performance criterion, as a single GPU shows speed performance comparable to that of up to 200 CPU cores.
Conclusion: The outlined approach will work for problems that easily lend themselves to massive parallelization. Code for various tasks has been made available and ongoing development of tools will further ease the transition from sequential to parallel algorithms.
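The inherently parallel structure referred to under Objectives can be illustrated with a minimal CPU-side sketch, assuming an exhaustive scan over all n(n-1)/2 SNP pairs. Each linear index k maps to exactly one pair (i, j), and no pair depends on any other, which is why one GPU thread per index is a natural decomposition. The function names and the scoring function below are hypothetical stand-ins chosen for illustration; they are not the statistics or the code used in the paper.

```python
import math


def pair_from_index(k, n):
    """Map a linear index k in [0, n*(n-1)/2) to a SNP pair (i, j), i < j."""
    # Row i starts at offset i*(2n - i - 1)/2; invert that with a square root.
    i = int((2 * n - 1 - math.sqrt((2 * n - 1) ** 2 - 8 * k)) / 2)
    # Guard against floating-point rounding at row boundaries.
    while i * (2 * n - i - 1) // 2 > k:
        i -= 1
    while (i + 1) * (2 * n - i - 2) // 2 <= k:
        i += 1
    j = k - i * (2 * n - i - 1) // 2 + i + 1
    return i, j


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def interaction_score(gi, gj, phenotype):
    """Toy score (assumption): |correlation| of the genotype product with the phenotype."""
    product = [a * b for a, b in zip(gi, gj)]
    return abs(pearson(product, phenotype))


def scan_pairs(genotypes, phenotype):
    """Score every SNP pair and return (best_score, i, j)."""
    n = len(genotypes)
    n_pairs = n * (n - 1) // 2
    results = []
    # Every k is independent of the others; in a CUDA port each k
    # could be handled by its own thread.
    for k in range(n_pairs):
        i, j = pair_from_index(k, n)
        score = interaction_score(genotypes[i], genotypes[j], phenotype)
        results.append((score, i, j))
    return max(results)
```

Calling `scan_pairs(genotypes, phenotype)` with a list of per-SNP genotype vectors (coded 0, 1, 2) and one phenotype vector returns the highest-scoring pair. The loop body touches only the data of its own pair, so splitting the index range across threads, GPUs, or grid nodes requires no communication until the final reduction.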
References
- 1 Miko I. Epistasis: Gene interaction and phenotype effects. Nature Education 2008; 1: 1
- 2 Affymetrics [Internet]. Available from: http://www.affymetrics.com.
- 3 Illumina [Internet]. Available from: http://www.illumina.com.
- 4 Scott L, Mohlke K, Bonnycastle L, Willer C, Li Y, Duren W. et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 2007; 316: 1341-1345.
- 5 The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 2007; 449: 851-861. Available from: http://hapmap.ncbi.nlm.nih.gov.
- 6 The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 2010; 467: 1061-1073.
- 7 Cordell HJ. Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans. Hum Mol Genet 2002; 11 (20) 2463-2468.
- 8 Schüpbach T, Xenarios I, Bergmann S, Kapur K. FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 2010; 26 (11) 1468-1469. Available from: http://www.vital-it.ch/software/FastEpistasis.
- 9 Kam-Thong T, Czamara D, Tsuda K, Borgwardt K, Lewis CM, Erhardt-Lehmann A. et al. EPI-BLASTER - Fast exhaustive two-locus epistasis detection strategy using graphical processing units. European Journal of Human Genetics. 2010. Available from: http://www.mpipsykl.mpg.de/epiblaster.
- 10 Kam-Thong T, Pütz B, Karbalai N, Müller-Myhsok B, Borgwardt K. Epistasis detection on quantitative phenotypes by exhaustive enumeration using GPUs. Bioinformatics 2011; 27 (13) i214-i221. Available from: http://www.mpipsykl.mpg.de/epigpuhsic.
- 11 Hu X, Liu Q, Zhang Z, Li Z, Wang S, He L. et al. SHEsisEpi, a GPU-enhanced genome-wide SNP-SNP interaction scanning algorithm, efficiently reveals the risk genetic epistasis in bipolar disorder. Cell Res 2010; 20 (07) 854-857.
- 12 Yung LS, Yang C, Wan X, Yu W. GBOOST: a GPU-based tool for detecting gene-gene interactions in genome-wide case control studies. Bioinformatics 2011; 27 (09) 1309-1310.
- 13 Hemani G, Theocharidis A, Wei W, Haley C. EpiGPU: exhaustive pairwise epistasis scans parallelized on consumer level graphics cards. Bioinformatics 2011; 27 (11) 1462-1465.
- 14 gpgpu.org [Internet]. Available from: http://gpgpu.org/papers.
- 15 MathWorks. Parallel Computing Toolkit. Available from: http://www.mathworks.com/products/parallel-computing/index.html.
- 16 Buckner J, Wilson J, Seligman M, Athey B, Watson S, Meng F. The gputools package enables GPU computing in R. Bioinformatics 2010; 26: 134-135.
- 17 R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2012. ISBN 3-900051-07-0. Available from: http://www.R-project.org/.
- 18 CULA [Internet]. Available from: http://www.culatools.com.
- 19 Cuda [Internet]. NVidia. Available from: http://www.nvidia.com/cuda.
- 20 Stream [Internet]. AMD. Available from: www.amd.com/stream.
- 21 Khronos OpenCL Working Group. The OpenCL Specification. 2011. Available from: http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf.
- 22 PGI Accelerator compilers [Internet]. Portland Group. Available from: www.pgroup.com/resources/accel.htm.
- 23 CUDA compiler [Internet]. Available from: www.pgroup.com/resources/cuda-x86.htm.
- 24 OpenMP [Internet]. Available from: openmp.org.
- 25 MPI-Forum [Internet]. Available from: www.mpi-forum.org.
- 26 OpenMPI [Internet]. Available from: www.open-mpi.org.
- 27 Open ACCelerators [Internet]. Available from: www.openacc.org.
- 28 Buckner J. gputools. R package, free for academic use. Available from: http://brainarray.mbni.med.umich.edu/Brainarray/Rgpgpu/.
- 29 Kam-Thong T. et al. GLIDE - GPU-based linear regression for detection of epistasis. Hum Hered in review. Available from: http://www.mpipsykl.mpg.de/glide.
- 30 Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D. et al. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet 2007; 81 (03) 559-575. Available from: http://pngu.mgh.harvard.edu/purcell/plink/.
- 31 Breiman L. Random Forests. Machine Learning 2001; 45 (01) 5-32.
- 32 Freund Y, Schapire RE. Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning. 1996: 148-156.
- 33 Cortes C, Vapnik VN. Support-vector networks. Machine Learning 1995; 20: 273-297.