
High-Dimensional Multi-trait GWAS By Reverse Prediction of Genotypes Using Machine Learning Methods

  • Conference paper
  • First Online:
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2021)

Abstract

Multi-trait genome-wide association studies (GWAS) use multivariate statistical methods to identify associations between genetic variants and multiple correlated traits simultaneously, and have higher statistical power than independent univariate analyses of traits. Reverse regression, where the genotypes of genetic variants are regressed on multiple traits simultaneously, has emerged as a promising approach to perform multi-trait GWAS in high-dimensional settings where the number of traits exceeds the number of samples. We analyzed different machine learning methods (ridge regression, naive Bayes/independent univariate, random forests and support vector machines) for reverse regression in multi-trait GWAS. To evaluate the methods, we used genotypes, gene expression data and ground-truth transcriptional regulatory networks from the DREAM5 SysGen Challenge and from a cross between two yeast strains. We found that genotype prediction performance, in terms of root mean squared error (RMSE), allowed us to distinguish between genomic regions with high and low transcriptional activity. Moreover, model feature coefficients correlated with the strength of association between variants and individual traits, and were predictive of true trans-acting expression quantitative trait loci (trans-eQTL) target genes, with complementary findings across methods.
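The reverse-regression idea above can be illustrated with a minimal sketch of one of the four methods, ridge regression, on simulated data rather than the DREAM5 or yeast data. It assumes a haploid variant coded 0/1 regressed on p = 500 expression traits measured in n = 120 samples (80 for training, 40 for testing), so that p > n. One simulated variant affects 20 traits (its hypothetical trans targets) and a second affects none, loosely mimicking loci with high and low transcriptional activity. All sizes, effect sizes, the penalty `lam`, and the helper names `ridge_reverse_regression`, `predict` and `rmse` are hypothetical illustration choices, not the paper's implementation.

```python
import numpy as np


def ridge_reverse_regression(Y, g, lam):
    """Ridge regression of one genotype vector g (n,) on a trait matrix Y (n, p).

    Uses the dual form beta = Yc^T (Yc Yc^T + lam * I_n)^{-1} gc, which only
    solves an n x n system and therefore stays cheap when p > n.
    The intercept is left unpenalised by centring Y and g first.
    """
    y_mean = Y.mean(axis=0)
    g_mean = g.mean()
    Yc = Y - y_mean
    gc = g - g_mean
    n = Yc.shape[0]
    alpha = np.linalg.solve(Yc @ Yc.T + lam * np.eye(n), gc)
    beta = Yc.T @ alpha  # one coefficient per trait
    intercept = g_mean - y_mean @ beta
    return intercept, beta


def predict(Y, intercept, beta):
    """Predicted genotype values for the samples (rows) of Y."""
    return intercept + Y @ beta


def rmse(g_true, g_pred):
    """Root mean squared error between observed and predicted genotypes."""
    return float(np.sqrt(np.mean((g_true - g_pred) ** 2)))


# Hypothetical simulation; not the DREAM5 or yeast data used in the paper.
rng = np.random.default_rng(0)
n_train, n_test, p, n_targets = 80, 40, 500, 20  # p > n: high-dimensional setting
n = n_train + n_test
g_active = rng.integers(0, 2, size=n).astype(float)  # variant with trans targets
g_null = rng.integers(0, 2, size=n).astype(float)    # variant with no targets
effects = np.zeros(p)
effects[:n_targets] = 1.0  # the first 20 traits respond to g_active
Y = np.outer(g_active, effects) + rng.normal(size=(n, p))

train, test = slice(0, n_train), slice(n_train, n)
lam = 100.0  # hand-picked; would normally be tuned by cross-validation

results = {}
for name, g in [("active", g_active), ("null", g_null)]:
    b0, beta = ridge_reverse_regression(Y[train], g[train], lam)
    results[name] = (rmse(g[test], predict(Y[test], b0, beta)), beta)

rmse_active, beta_active = results["active"]
rmse_null, _ = results["null"]

# Univariate association strength per trait: Pearson r between genotype and trait
Yc = Y[train] - Y[train].mean(axis=0)
gc = g_active[train] - g_active[train].mean()
r = (Yc.T @ gc) / (np.linalg.norm(Yc, axis=0) * np.linalg.norm(gc))

print(f"test RMSE, variant with targets:    {rmse_active:.3f}")
print(f"test RMSE, variant without targets: {rmse_null:.3f}")
print(f"corr(ridge coefficients, univariate r): {np.corrcoef(beta_active, r)[0, 1]:.3f}")
```

The dual form gives the same coefficients as the usual p x p normal equations but only inverts an n x n matrix, which is what keeps ridge reverse regression cheap when traits outnumber samples. In this toy setting the variant with targets should show a lower test RMSE than the variant without, and its largest coefficients should fall on its target traits, loosely echoing the two kinds of findings reported in the abstract.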


Notes

1. https://www.synapse.org/#!Synapse:syn2820440/wiki/.

2. http://www.yeastract.com/formregmatrix.php.


Author information

Corresponding author

Correspondence to Muhammad Ammar Malik.


Electronic supplementary material

Supplementary material 1 (PDF, 3125 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Malik, M.A., Ludl, AA., Michoel, T. (2022). High-Dimensional Multi-trait GWAS By Reverse Prediction of Genotypes Using Machine Learning Methods. In: Chicco, D., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2021. Lecture Notes in Computer Science, vol 13483. Springer, Cham. https://doi.org/10.1007/978-3-031-20837-9_7


  • DOI: https://doi.org/10.1007/978-3-031-20837-9_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20836-2

  • Online ISBN: 978-3-031-20837-9

  • eBook Packages: Computer Science, Computer Science (R0)
