Skip to main content

Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure

  • Conference paper
  • First Online:
Research in Computational Molecular Biology (RECOMB 2015)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 9029))

Abstract

A typical GWAS tests correlation between a single phenotype and each genotype one at a time. However, it is often very useful to analyze many phenotypes simultaneously. For example, this may increase the power to detect variants by capturing unmeasured aspects of complex biological networks that a single phenotype might miss. There are several multivariate approaches that try to detect variants related to many phenotypes, but none of them consider population structure and each may result in a significant number of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA, that could both simultaneously analyze many phenotypes as well as correct for population structure. In a simulated study, GAMMA accurately identifies true genetic effects without false positive identifications, while other methods either fail to detect true effects or result in many false positive identifications. We further apply our method to genetic studies of yeast and gut microbiome from mouse and show that GAMMA identifies several variants that are likely to have a true biological mechanism.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., et al.: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675–1680 (1996)

    Article  Google Scholar 

  2. Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., Gelb, M.H., et al.: Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17, 994–999 (1999)

    Article  Google Scholar 

  3. Cervino, A.C., Li, G., Edwards, S., Zhu, J., Laurie, C., et al.: Integrating qtl and high-density snp analyses in mice to identify insig2 as a susceptibility gene for plasma cholesterol levels. Genomics 86, 505–17 (2005)

    Article  Google Scholar 

  4. Hillebrandt, S., Wasmuth, H.E., Weiskirchen, R., Hellerbrand, C., Keppeler, H., et al.: Complement factor 5 is a quantitative trait gene that modifies liver fibrogenesis in mice and humans. Nat. Genet. 37, 835–843 (2005)

    Article  Google Scholar 

  5. Wang, X., Korstanje, R., Higgins, D., Paigen, B.: Haplotype analysis in multiple crosses to identify a qtl gene. Genome. Res. 14, 1767–1772 (2004)

    Article  Google Scholar 

  6. O’Reilly, P.F., Hoggart, C.J., Pomyen, Y., Calboli, F.C.F., Elliott, P., et al.: Multiphen: joint model of multiple phenotypes can increase discovery in gwas. PLoS One 7, e34861 (2012)

    Article  Google Scholar 

  7. Alter, O., Brown, P.O., Botstein, D.: Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97, 10101–10106 (2000)

    Article  Google Scholar 

  8. Quackenbush, J.: Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427 (2001)

    Article  Google Scholar 

  9. Nievergelt, C.M., Libiger, O., Schork, N.J.: Generalized analysis of molecular variance. PLoS Genet. 3, e51 (2007)

    Article  Google Scholar 

  10. Zapala, M.A., Schork, N.J.: Statistical properties of multivariate distance matrix regression for high-dimensional data analysis. Front Genet. 3, 190 (2012)

    Article  Google Scholar 

  11. Wessel, J., Schork, N.J.: Generalized genomic distance-based regression methodology for multilocus association analysis. Am. J. Hum. Genet. 79, 792–806 (2006)

    Article  Google Scholar 

  12. Kittles, R.A., Chen, W., Panguluri, R.K., Ahaghotu, C., Jackson, A., et al.: Cyp3a4-v and prostate cancer in african americans: causal or confounding association because of population stratification? Hum. Genet. 110, 553–560 (2002)

    Article  Google Scholar 

  13. Freedman, M.L., Reich, D., Penney, K.L., McDonald, G.J., Mignault, A.A., et al.: Assessing the impact of population stratification on genetic association studies. Nat. Genet. 36, 388–393 (2004)

    Article  Google Scholar 

  14. Marchini, J., Cardon, L.R., Phillips, M.S., Donnelly, P.: The effects of human population structure on large genetic association studies. Nat. Genet. 36, 512–517 (2004)

    Article  Google Scholar 

  15. Campbell, C.D., Ogburn, E.L., Lunetta, K.L., Lyon, H.N., Freedman, M.L., et al.: Demonstrating stratification in a european american population. Nat. Genet. 37, 868–872 (2005)

    Article  Google Scholar 

  16. Helgason, A., Yngvadttir, B., Hrafnkelsson, B., Gulcher, J., Stefnsson, K.: An icelandic example of the impact of population structure on association studies. Nat. Genet. 37, 90–95 (2005)

    Google Scholar 

  17. Reiner, A.P., Ziv, E., Lind, D.L., Nievergelt, C.M., Schork, N.J., et al.: Population structure, admixture, and aging-related phenotypes in african american adults: the cardiovascular health study. Am. J. Hum. Genet. 76, 463–477 (2005)

    Article  Google Scholar 

  18. Voight, B.F., Pritchard, J.K.: Confounding from cryptic relatedness in case-control association studies. PLoS Genet. 1, e32 (2005)

    Article  Google Scholar 

  19. Berger, M., Stassen, H.H., Khler, K., Krane, V., Mnks, D., et al.: Hidden population substructures in an apparently homogeneous population bias association studies. Eur. J. Hum. Genet. 14, 236–244 (2006)

    Article  Google Scholar 

  20. Seldin, M.F., Shigeta, R., Villoslada, P., Selmi, C., Tuomilehto, J., et al.: European population substructure: clustering of northern and southern populations. PLoS Genet. 2, e143 (2006)

    Article  Google Scholar 

  21. Foll, M., Gaggiotti, O.: Identifying the environmental factors that determine the genetic structure of populations. Genetics 174, 875–91 (2006)

    Article  Google Scholar 

  22. Flint, J., Eskin, E.: Genome-wide association studies in mice. Nat. Rev. Genet. 13, 807–817 (2012)

    Article  Google Scholar 

  23. Zhou, X., Stephens, M.: Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat. Methods 11, 407–409 (2014)

    Article  Google Scholar 

  24. Korte, A., Vilhjlmsson, B.J., Segura, V., Platt, A., Long, Q., et al.: A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat. Genet. 44, 1066–1071 (2012)

    Article  Google Scholar 

  25. Kang, H.M., Ye, C., Eskin, E.: Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Genetics 180, 1909–1925 (2008)

    Article  Google Scholar 

  26. Kang, H.M., Sul, J.H., Service, S.K., Zaitlen, N.A., Kong, S.Y.Y., et al.: Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010)

    Article  Google Scholar 

  27. Lippert, C., Listgarten, J., Liu, Y., Kadie, C.M., Davidson, R.I., et al.: Fast linear mixed models for genome-wide association studies. Nat. Methods 8, 833–835 (2011)

    Article  Google Scholar 

  28. Svishcheva, G.R., Axenovich, T.I., Belonogova, N.M., van Duijn, C.M., Aulchenko, Y.S.: Rapid variance components-based method for whole-genome association analysis. Nat. Genet. 44, 1166–1170 (2012)

    Article  Google Scholar 

  29. Zhou, X., Stephens, M.: Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012)

    Article  Google Scholar 

  30. Segura, V., Vilhjlmsson, B.J., Platt, A., Korte, A., Seren, U., et al.: An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat. Genet. 44, 825–830 (2012)

    Article  Google Scholar 

  31. Joo, J.W.J., Sul, J.H., Han, B., Ye, C., Eskin, E.: Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies. Genome. Biol. 15, r61 (2014)

    Article  Google Scholar 

  32. Bennett, B.J., Farber, C.R., Orozco, L., Kang, H.M., Ghazalpour, A., et al.: A high-resolution association mapping panel for the dissection of complex traits in mice. Genome. Res. 20, 281–290 (2010)

    Article  Google Scholar 

  33. Michaelson, J.J., Loguercio, S., Beyer, A.: Detection and interpretation of expression quantitative trait loci (eqtl). Methods 48, 265–276 (2009)

    Article  Google Scholar 

  34. Foss, E.J., Radulovic, D., Shaffer, S.A., Ruderfer, D.M., Bedalov, A., et al.: Genetic basis of proteome variation in yeast. Nat. Genet. 39, 1369–1375 (2007)

    Article  Google Scholar 

  35. Perlstein, E.O., Ruderfer, D.M., Roberts, D.C., Schreiber, S.L., Kruglyak, L.: Genetic basis of individual differences in the response to small-molecule drugs in yeast. Nat. Genet. 39, 496–502 (2007)

    Article  Google Scholar 

  36. Devlin, B., Roeder, K., Wasserman, L.: Genomic control, a new approach to genetic-based association studies. Theor. Popul. Biol. 60, 155–166 (2001)

    Article  MATH  Google Scholar 

  37. Ley, R.E., Bckhed, F., Turnbaugh, P., Lozupone, C.A., Knight, R.D., et al.: Obesity alters gut microbial ecology. Proc. Natl. Acad. Sci. USA 102, 11070–11075 (2005)

    Article  Google Scholar 

  38. Karlsson, F.H., Tremaroli, V., Nookaew, I., Bergstrm, G., Behre, C.J., et al.: Gut metagenome in european women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013)

    Article  Google Scholar 

  39. Parks, B.W., Nam, E., Org, E., Kostem, E., Norheim, F., et al.: Genetic control of obesity and gut microbiota composition in response to high-fat, high-sucrose diet in mice. Cell Metab. 17, 141–152 (2013)

    Article  Google Scholar 

  40. Gower, J.C.: Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325–338 (1966)

    Article  MATH  MathSciNet  Google Scholar 

  41. McArdle, B.H., Anderson, M.J.: Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82, 290–297 (2001)

    Article  Google Scholar 

  42. Bray, J.R., Curtis, J.T.: An ordination of the upland forest communities of southern wisconsin. Ecological monographs 27, 325–349 (1957)

    Article  Google Scholar 

  43. Brem, R.B., Kruglyak, L.: The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc. Natl. Acad. Sci. USA 102, 1572–1577 (2005)

    Article  Google Scholar 

  44. Bokulich, N.A., Subramanian, S., Faith, J.J., Gevers, D., Gordon, J.I., et al.: Quality-filtering vastly improves diversity estimates from illumina amplicon sequencing. Nat. Methods 10, 57–59 (2013)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Eleazar Eskin .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Joo, J.W.J. et al. (2015). Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure. In: Przytycka, T. (eds) Research in Computational Molecular Biology. RECOMB 2015. Lecture Notes in Computer Science(), vol 9029. Springer, Cham. https://doi.org/10.1007/978-3-319-16706-0_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-16706-0_15

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16705-3

  • Online ISBN: 978-3-319-16706-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics