Efficiently Identifying Significant Associations in Genome-Wide Association Studies

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2013)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7821)

Abstract

Over the past several years, genome-wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been used to identify regions of the genome that harbor variation affecting gene expression, known as expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits, where only a handful of phenotypes are analyzed per study, eQTL studies measure tens of thousands of gene expression levels and apply the GWAS approach to each one. This requires computing billions of statistical tests and demands substantial computational resources, particularly when applying novel statistical methods such as mixed models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the SNPs. In the first stage, a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and tests additional SNPs only from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, an efficient implementation of our software increases the computational speed relative to state-of-the-art testing approaches by a factor of 75.
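The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure or software: the abstract does not say how proxies or regions are chosen, so the fixed-size blocks, the midpoint proxy, the correlation-based statistic, and the names `assoc_stats`, `two_stage_scan`, `block_size`, `screen_z`, and `final_z` are all assumptions made for illustration.

```python
import numpy as np

def assoc_stats(genotypes, phenotype):
    """Per-SNP statistic r * sqrt(n), where r is the Pearson correlation
    between each genotype column and the phenotype (illustrative choice)."""
    n = genotypes.shape[0]
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    # Monomorphic SNPs (zero variance) get a statistic of 0 instead of NaN.
    r = np.divide(g.T @ y, denom, out=np.zeros(g.shape[1]), where=denom > 0)
    return r * np.sqrt(n)

def two_stage_scan(genotypes, phenotype, block_size, screen_z, final_z):
    """Hypothetical two-stage scan.

    Stage 1 tests one proxy SNP per fixed-size block (assumed region model).
    Stage 2 tests the remaining SNPs only in blocks whose proxy statistic
    reaches the lenient screening threshold screen_z.
    Returns (indices of SNPs with |statistic| >= final_z, tests performed).
    """
    n_snps = genotypes.shape[1]
    starts = np.arange(0, n_snps, block_size)
    proxies = np.minimum(starts + block_size // 2, n_snps - 1)

    stats = np.zeros(n_snps)
    tested = np.zeros(n_snps, dtype=bool)

    # Stage 1: test the proxies only.
    proxy_stats = assoc_stats(genotypes[:, proxies], phenotype)
    stats[proxies] = proxy_stats
    tested[proxies] = True
    n_tests = len(proxies)

    # Stage 2: follow up blocks flagged by their proxy.
    for start, proxy, z in zip(starts, proxies, proxy_stats):
        if abs(z) < screen_z:
            continue
        block = np.arange(start, min(start + block_size, n_snps))
        rest = block[block != proxy]  # the proxy was already tested
        stats[rest] = assoc_stats(genotypes[:, rest], phenotype)
        tested[rest] = True
        n_tests += len(rest)

    hits = np.flatnonzero(tested & (np.abs(stats) >= final_z))
    return hits, n_tests
```

In this sketch the saving comes entirely from skipping blocks whose proxy shows no signal, so a proxy that is weakly correlated with a causal SNP can cause a miss. A lower `screen_z` reduces that risk at the cost of more second-stage tests. The abstract states that the published procedure identifies all significant associations, which a fixed threshold like this one does not guarantee.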

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kostem, E., Eskin, E. (2013). Efficiently Identifying Significant Associations in Genome-Wide Association Studies. In: Deng, M., Jiang, R., Sun, F., Zhang, X. (eds) Research in Computational Molecular Biology. RECOMB 2013. Lecture Notes in Computer Science, vol 7821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37195-0_10

  • DOI: https://doi.org/10.1007/978-3-642-37195-0_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37194-3

  • Online ISBN: 978-3-642-37195-0

  • eBook Packages: Computer Science (R0)
