Abstract
Multi-trait genome-wide association studies (GWAS) use multivariate statistical methods to identify associations between genetic variants and multiple correlated traits simultaneously, and have higher statistical power than independent univariate analyses of traits. Reverse regression, where genotypes of genetic variants are regressed on multiple traits simultaneously, has emerged as a promising approach to perform multi-trait GWAS in high-dimensional settings where the number of traits exceeds the number of samples. We analyzed different machine learning methods (ridge regression, naive Bayes/independent univariate, random forests and support vector machines) for reverse regression in multi-trait GWAS, and evaluated them using genotypes, gene expression data and ground-truth transcriptional regulatory networks from the DREAM5 SysGen Challenge and from a cross between two yeast strains. We found that genotype prediction performance, measured as root mean squared error (RMSE), made it possible to distinguish between genomic regions with high and low transcriptional activity. Moreover, model feature coefficients correlated with the strength of association between variants and individual traits, and were predictive of true trans-acting expression quantitative trait loci (trans-eQTL) target genes, with complementary findings across methods.
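The reverse regression idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline or data: the simulated binary genotype, the trait matrix, the effect size, the penalty value and all helper names (solve, ridge_fit, simulate, center) are assumptions made purely for illustration. The sketch fits a ridge regression of one variant's genotype on more traits than training samples, using the dual form of the ridge solution so that only a samples-by-samples system has to be solved, and then compares the held-out RMSE with a baseline that always predicts the mean genotype.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Dual-form ridge: w = X^T (X X^T + lam I)^{-1} y."""
    n, p = len(X), len(X[0])
    K = [[sum(a * b for a, b in zip(X[i], X[j])) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return [sum(alpha[i] * X[i][k] for i in range(n)) for k in range(p)]


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Hypothetical simulation settings (assumptions, not values from the paper).
random.seed(0)
N_TRAIN, N_TEST, N_TRAITS, N_TARGETS = 40, 40, 60, 10  # more traits than training samples
EFFECT, LAMBDA = 2.0, 25.0


def simulate(n):
    """Binary genotype; the first N_TARGETS traits depend on it, the rest are noise."""
    g = [float(random.randint(0, 1)) for _ in range(n)]
    X = [[(EFFECT * (gi - 0.5) if k < N_TARGETS else 0.0) + random.gauss(0, 1)
          for k in range(N_TRAITS)] for gi in g]
    return g, X


g_train, X_train = simulate(N_TRAIN)
g_test, X_test = simulate(N_TEST)

# Center genotype and traits with training-set means (this acts as the intercept).
g_mean = sum(g_train) / N_TRAIN
col_means = [sum(row[k] for row in X_train) / N_TRAIN for k in range(N_TRAITS)]


def center(X):
    return [[v - m for v, m in zip(row, col_means)] for row in X]


w = ridge_fit(center(X_train), [gi - g_mean for gi in g_train], LAMBDA)
pred = [g_mean + sum(wk * v for wk, v in zip(w, row)) for row in center(X_test)]

rmse_model = rmse(g_test, pred)
rmse_baseline = rmse(g_test, [g_mean] * N_TEST)
mean_abs_target = sum(abs(v) for v in w[:N_TARGETS]) / N_TARGETS
mean_abs_other = sum(abs(v) for v in w[N_TARGETS:]) / (N_TRAITS - N_TARGETS)

print("held-out RMSE (ridge):   ", round(rmse_model, 3))
print("held-out RMSE (baseline):", round(rmse_baseline, 3))
print("mean |coef| target traits:", round(mean_abs_target, 4))
print("mean |coef| other traits: ", round(mean_abs_other, 4))
```

In this toy setting, a variant that affects some traits should be predicted with lower held-out RMSE than the mean-genotype baseline, and the traits it affects should tend to carry larger absolute coefficients than the noise traits. This mirrors, in simplified form, how the abstract uses prediction error and feature coefficients.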
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Malik, M.A., Ludl, A.A., Michoel, T. (2022). High-Dimensional Multi-trait GWAS By Reverse Prediction of Genotypes Using Machine Learning Methods. In: Chicco, D., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2021. Lecture Notes in Computer Science, vol 13483. Springer, Cham. https://doi.org/10.1007/978-3-031-20837-9_7
DOI: https://doi.org/10.1007/978-3-031-20837-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20836-2
Online ISBN: 978-3-031-20837-9
eBook Packages: Computer Science, Computer Science (R0)