Abstract
A typical genome-wide association study (GWAS) tests the correlation between a single phenotype and each genotype, one at a time. However, it is often useful to analyze many phenotypes simultaneously. For example, doing so may increase the power to detect variants by capturing unmeasured aspects of complex biological networks that a single phenotype might miss. Several multivariate approaches try to detect variants related to many phenotypes, but none of them considers population structure, and each may produce a significant number of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA, that can both analyze many phenotypes simultaneously and correct for population structure. In a simulation study, GAMMA accurately identifies true genetic effects without false positive identifications, while other methods either fail to detect true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and of the mouse gut microbiome and show that GAMMA identifies several variants that are likely to have a true biological mechanism.
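To make the baseline concrete, the following is a minimal sketch of the "one phenotype, one genotype at a time" scan that the abstract contrasts against. It is not GAMMA and applies no correction for population structure. The function name `univariate_scan`, the simulated data, and the effect size are all assumptions made for illustration only.

```python
import numpy as np

def univariate_scan(genotypes, phenotype):
    """Correlate one phenotype with each SNP column separately.

    Returns the Pearson correlation r and the matching t-statistic
    (n - 2 degrees of freedom) for every SNP. Assumes no SNP is
    monomorphic and no correlation is exactly +/-1.
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = G.shape[0]
    Gc = G - G.mean(axis=0)          # center each SNP column
    yc = y - y.mean()                # center the phenotype
    denom = np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (Gc.T @ yc) / denom          # one correlation per SNP
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return r, t

# Hypothetical toy data: 200 individuals, 50 SNPs coded 0/1/2,
# with SNP index 7 given a true effect on the phenotype.
rng = np.random.default_rng(0)
n, m = 200, 50
G = rng.integers(0, 3, size=(n, m))
y = 1.0 * G[:, 7] + rng.normal(size=n)

r, t = univariate_scan(G, y)
top = int(np.argmax(np.abs(t)))      # SNP with the strongest association
print("strongest SNP index:", top)
```

A scan of this kind handles each phenotype in isolation, so running it over many phenotypes yields many separate tests rather than one joint test, and any relatedness among individuals is ignored.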
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Joo, J.W.J. et al. (2015). Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure. In: Przytycka, T. (eds) Research in Computational Molecular Biology. RECOMB 2015. Lecture Notes in Computer Science, vol 9029. Springer, Cham. https://doi.org/10.1007/978-3-319-16706-0_15
DOI: https://doi.org/10.1007/978-3-319-16706-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16705-3
Online ISBN: 978-3-319-16706-0
eBook Packages: Computer Science, Computer Science (R0)