Efficiently Identifying Significant Associations in Genome-Wide Association Studies

Kostem, Emrah; Eskin, Eleazar

doi:10.1007/978-3-642-37195-0_10

Emrah Kostem²³ &
Eleazar Eskin²³

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 7821))

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology

3344 Accesses
1 Citations

Abstract

Over the past several years, genome wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome which harbor variation affecting gene expression or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits where only a handful of phenotypes are analyzed per study, in (eQTL) studies, tens of thousands of gene expression levels are measured and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed-models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the SNPs. In the first-stage a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions which may contain significant SNPs and only tests additional SNPs from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to state of the art testing approaches by a factor of 75.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

Article Open access 28 November 2016

Genome-wide association studies

Article 26 August 2021

A simple yet efficient method of local false discovery rate estimation designed for genome-wide association data analysis

Article 30 April 2021

References

Baker, M.: Biorepositories: Building better biobanks. Nature 486(7401), 141–146 (2012)
Article Google Scholar
de Bakker, P.I.W., Yelensky, R., Pe’er, I., Gabriel, S.B., Daly, M.J., Altshuler, D.: Efficiency and power in genetic association studies. Nature Genetics 37(11), 1217–1223 (2005)
Article Google Scholar
Bochner, B.R.: Innovations: New technologies to assess genotype-phenotype relationships. Nature Rev. Genet. 4(4), 309–314 (2003)
Article MathSciNet Google Scholar
Brem, R.B., Kruglyak, L.: The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc. Natl. Acad. Sci. U S A 102(5), 1572–1577 (2005)
Article Google Scholar
Brem, R.B., Yvert, G., Clinton, R., Kruglyak, L.: Genetic dissection of transcriptional regulation in budding yeast. Science 296(5568), 752–755 (2002)
Article Google Scholar
Bystrykh, L., Weersing, E., Dontje, B., Sutton, S., Pletcher, M.T., Wiltshire, T., Su, A.I., Vellenga, E., Wang, J., Manly, K.F., Lu, L., Chesler, E.J., Alberts, R., Jansen, R.C., Williams, R.W., Cooke, M.P., de Haan, G.: Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics’. Nat. Genet. 37(3), 225–232 (2005)
Article Google Scholar
Carlson, C.S., Eberle, M.A., Rieder, M.J., Yi, Q., Kruglyak, L., Nickerson, D.A.: Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. The American Journal of Human Genetics 74(1), 106–120 (2004)
Article Google Scholar
Chesler, E.J., Lu, L., Shou, S., Qu, Y., Gu, J., Wang, J., Hsu, H.C., Mountz, J.D., Baldwin, N.E., Langston, M.A., Threadgill, D.W., Manly, K.F., Williams, R.W.: Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat. Genet. 37(3), 233–242 (2005)
Article Google Scholar
Cheung, V.G., Spielman, R.S., Ewens, K.G., Weber, T.M., Morley, M., Burdick, J.T.: Mapping determinants of human gene expression by regional and genome-wide association. Nature 437(7063), 1365–1369 (2005)
Article Google Scholar
Cookson, W., Liang, L., Abecasis, G., Moffatt, M., Lathrop, M.: Mapping complex disease traits with global gene expression. Nature Rev. Genet. 10(3), 184–194 (2009)
Article Google Scholar
Cousin, E., Deleuze, J.F., Genin, E.: Selection of SNP subsets for association studies in candidate genes: comparison of the power of different strategies to detect single disease susceptibility locus effects. BMC Genetics 7 (2006)
Google Scholar
Cousin, E., Genin, E., Mace, S., Ricard, S., Chansac, C., del Zompo, M., Deleuze, J.F.: Association studies in candidate genes: strategies to select SNPs to be tested. Human Heredity 56(4), 151–159 (2003)
Article Google Scholar
Devlin, B., Risch, N.: A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 29(2), 311–322 (1995)
Article Google Scholar
Emilsson, V., Thorleifsson, G., Zhang, B., Leonardson, A.S., Zink, F., Zhu, J., Carlson, S., Helgason, A., Walters, G.B., Gunnarsdottir, S., Mouy, M., Steinthorsdottir, V., Eiriksdottir, G.H., Bjornsdottir, G., Reynisdottir, I., Gudbjartsson, D., Helgadottir, A., Jonasdottir, A., Jonasdottir, A., Styrkarsdottir, U., Gretarsdottir, S., Magnusson, K.P., Stefansson, H., Fossdal, R., Kristjansson, K., Gislason, H.G., Stefansson, T., Leifsson, B.G., Thorsteinsdottir, U., Lamb, J.R., Gulcher, J.R., Reitman, M.L., Kong, I., Schadt, E.E., Stefansson, K.: Genetics of gene expression and its effect on disease. Nature 452(7186), 423–428 (2008)
Article Google Scholar
Halperin, E., Kimmel, G., Shamir, R.: Tag SNP selection in genotype data for maximizing SNP prediction accuracy. Bioinformatics 21(suppl. 1) (2005)
Google Scholar
Han, B., Kang, H.M., Eleazar, E.: Rapid and accurate multiple testing correction and power estimation for millions of correlated markers. PLoS Genet 5(4) (2009)
Google Scholar
Hardy, J., Singleton, A.: Genomewide association studies and human disease. N. Engl. J. Med. 360(17), 1759–1768 (2009)
Article Google Scholar
Hindorff, L.A., Sethupathy, P., Junkins, H.A., Ramos, E.M., Mehta, J.P., Collins, F.S., Manolio, T.A.: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. PNAS 106(23), 9362–9367 (2009)
Article Google Scholar
International HapMap Consortium: A haplotype map of the human genome. Nature 437(7063), 1299–1320 (2005)
Google Scholar
Kang, H.M., Sul, J.H., Service, S.K., Zaitlen, N.A., Kong, S.Y., Freimer, N.B., Sabatti, C., Eskin, E.: Variance component model to account for sample structure in genome-wide association studies. Nature Genet. 42(4), 348 (2010)
Article Google Scholar
Keurentjes, J.J.B., Fu, J., Terpstra, I.R., Garcia, J.M., van den Ackerveken, G., Snoek, L.B., Peeters, A.J.M., Vreugdenhil, D., Koornneef, M., Jansen, R.C.: Regulatory network construction in arabidopsis by using genome-wide gene expression quantitative trait loci. Proc. Natl. Acad. Sci. U S A 104(5), 1708–1713 (2007)
Article Google Scholar
Kostem, E., Lozano, J.A., Eskin, E.: Increasing power of genome-wide association studies by collecting additional single-nucleotide polymorphisms. Genetics 188(2), 449–460 (2011)
Article Google Scholar
Li, Y., Willer, C.J., Ding, J., Scheet, P., Abecasis, G.: Mach: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol. 34(8), 816–834 (2010)
Article Google Scholar
Lin, Z., Altman, R.B.: Finding haplotype tagging SNPs by use of principal components analysis. The American Journal of Human Genetics 75(5), 850–861 (2004)
Article Google Scholar
Lippert, C., Listgarten, J., Liu, Y., Kadie, C.M., Davidson, R.I., Heckerman, D.: Fast linear mixed models for genome-wide association studies. Nature Methods 8(10), 833 (2011)
Article Google Scholar
Majewski, J., Pastinen, T.: The study of eQTL variations by RNA-seq: from snps to phenotypes. Trends Genet. 27(2), 72–79 (2011)
Article Google Scholar
Pardi, F., Lewis, C.M., Whittaker, J.C.: SNP selection for association studies: Maximizing power across SNP choice and study size. Annals of Human Genetics 69(6), 733–746 (2005)
Article Google Scholar
Pritchard, J.K., Przeworski, M.: Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 69(1), 1–14 (2001)
Article Google Scholar
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., Sklar, P., de Bakker, P.I.W., Daly, M.J., Sham, P.C.: Plink: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81(3), 559–575 (2007)
Article Google Scholar
Qin, Z.S., Gopalakrishnan, S., Abecasis, G.R.: An efficient comprehensive search algorithm for tag SNP selection using linkage disequilibrium criteria. Bioinformatics 22(2), 220–225 (2006)
Article Google Scholar
Risch, N., Merikangas, K.: The future of genetic studies of complex human diseases. Science 273(5281), 1516–1517 (1996)
Article Google Scholar
Rockman, M.V., Kruglyak, L.: Genetics of global gene expression. Nature Rev. Genet. 7(11), 862–872 (2006)
Article Google Scholar
Saccone, S.F., Rice, J.P., Saccone, N.L.: Power-based, phase-informed selection of single nucleotide polymorphisms for disease association screens. Genetic Epidemiology 30(6), 459–470 (2006)
Article Google Scholar
Santana, R., Mendiburu, A., Zaitlen, N., Eskin, E., Lozano, J.A.: Multi-marker tagging single nucleotide polymorphism selection using estimation of distribution algorithms. Artificial Intelligence in Medicine 50(3), 193–201 (2010)
Article Google Scholar
Spielman, R.S., Bastone, L.A., Burdick, J.T., Morley, M., Ewens, W.J., Cheung, V.G.: Common genetic variants account for differences in gene expression among ethnic groups. Nat. Genet. 39(2), 226–231 (2007)
Article Google Scholar
Stram, D.O.: Tag SNP selection for association studies. Genetic Epidemiology 27(4), 365–374 (2004)
Article Google Scholar
Stram, D.O.: Software for tag single nucleotide polymorphism selection. Human Genomics 2(2), 144–151 (2005)
Article Google Scholar
Stranger, B.E., Montgomery, S.B., Dimas, A.S., Parts, L., Stegle, O., Ingle, C.E., Sekowska, M., Smith, G.D., Evans, D., Gutierrez-Arcelus, M., Price, A., Raj, T., Nisbett, J., Nica, A.C., Beazley, C., Durbin, R., Deloukas, P., Dermitzakis, E.T.: Patterns of cis regulatory variation in diverse human populations. PLoS Genet. 8(4), e1002639 (2012)
Google Scholar
Stranger, B.E., Nica, A.C., Forrest, M.S., Dimas, A., Bird, C.P., Beazley, C., Ingle, C.E., Dunning, M., Flicek, P., Koller, D., Montgomery, S., Tavaré, S., Deloukas, P., Dermitzakis, E.T.: Population genomics of human gene expression. Nat. Genet. 39(10), 1217–1224 (2007)
Article Google Scholar
The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 467(7319), 1061 (2010)
Google Scholar
The ENCODE Project Consortium: The ENCODE (ENCyclopedia Of DNA Elements) project. Science 306(5696), 636–640 (2004)
Google Scholar
The ENCODE Project Consortium: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447(7146), 799–816 (2007)
Google Scholar
The ENCODE Project Consortium: A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9(4), e1001046 (2011)
Google Scholar
The ENCODE Project Consortium: An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414), 57–74 (2012)
Google Scholar
Wang, Z., Gerstein, M., Snyder, M.: RNA-seq: a revolutionary tool for transcriptomics. Nature Rev. Genet. 10(1), 57–63 (2009)
Article Google Scholar
Zhou, X., Stephens, M.: Genome-wide efficient mixed-model analysis for association studies. Nature Genet. 44(7), 821–824 (2012)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Computer Science Department, University of California, Los Angeles, California, 90095, USA
Emrah Kostem & Eleazar Eskin

Authors

Emrah Kostem
View author publications
You can also search for this author in PubMed Google Scholar
Eleazar Eskin
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Mathematics, Peking University, Beijing, P.R. China
Minghua Deng
Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, 100084, Beijing, P.R. China
Rui Jiang
Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
Fengzhu Sun
Department of Automation, Tsinghua University, P.O. Box, 100084, Beijing, China
Xuegong Zhang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kostem, E., Eskin, E. (2013). Efficiently Identifying Significant Associations in Genome-Wide Association Studies. In: Deng, M., Jiang, R., Sun, F., Zhang, X. (eds) Research in Computational Molecular Biology. RECOMB 2013. Lecture Notes in Computer Science(), vol 7821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37195-0_10

Download citation

DOI: https://doi.org/10.1007/978-3-642-37195-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37194-3
Online ISBN: 978-3-642-37195-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics