DOI: 10.1145/3580252.3586982
research-article

Interpreting High Order Epistasis Using Sparse Transformers

Published: 22 January 2024

Abstract

Genome-Wide Association Studies (GWAS) aim to identify relations between Single Nucleotide Polymorphisms (SNPs) and the manifestation of certain diseases, an important challenge in biomedicine and personalized healthcare. However, most genetic diseases are explained by interactions between several SNPs, a phenomenon known as epistasis. Detecting epistasis is computationally demanding because the number of SNP combinations to analyze grows rapidly with the interaction order. Recently, deep learning has emerged as a possible solution for genomic prediction, but the black-box nature of neural networks and their lack of explainability remain unsolved drawbacks. This paper presents a new, flexible framework for interpreting neural networks for any-order epistasis detection. Using sparse transformers, a technique not yet employed for epistasis detection, the framework explores different SNP representations and assigns an attention score to each SNP to quantify its relevance for phenotype prediction. Results on simulated datasets show that the proposed framework outperforms state-of-the-art explainability methods, identifying SNP interactions in diverse epistasis scenarios. The framework is also validated on a real breast cancer dataset, where it identifies second- to fifth-order interactions in the top 40% most relevant SNPs.
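To make the computational burden mentioned in the abstract concrete, the sketch below counts how many SNP combinations an exhaustive search would have to evaluate at each interaction order. It is only an illustration of the combinatorics, not part of the proposed framework; the function name `count_snp_combinations` and the dataset size of 1,000 SNPs are assumptions chosen for the example.

```python
from math import comb


def count_snp_combinations(n_snps: int, order: int) -> int:
    """Return the number of distinct SNP sets of size `order` among `n_snps` SNPs.

    This is the binomial coefficient C(n_snps, order), i.e. the number of
    candidate interactions an exhaustive search of that order would test.
    """
    if n_snps < 0 or order < 0:
        raise ValueError("n_snps and order must be non-negative")
    return comb(n_snps, order)


if __name__ == "__main__":
    n_snps = 1000  # hypothetical dataset size, chosen only for illustration
    for order in range(2, 6):  # second- to fifth-order interactions
        total = count_snp_combinations(n_snps, order)
        print(f"order {order}: {total:,} combinations")
```

Under this assumed size, there are already 499,500 SNP pairs, and the fifth-order count is roughly 8.25 trillion, which is why exhaustive high-order searches become impractical well before genome-scale SNP counts are reached.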



Published In

CHASE '23: Proceedings of the 8th ACM/IEEE International Conference on Connected Health: Applications, Systems and Engineering Technologies
June 2023
232 pages
ISBN: 9798400701023
DOI: 10.1145/3580252
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. genome-wide association studies
  2. epistasis detection
  3. machine learning

Qualifiers

  • Research-article

Funding Sources

  • Fundação para a Ciência e a Tecnologia, Portugal (FCT)
  • Fundação para a Ciência e a Tecnologia, Portugal (FCT) and EuroHPC Joint Undertaking

Conference

CHASE '23
