Machine Learning and Combinatorial Optimization to Detect Gene-gene Interactions in Genome-wide Real Data: Looking Through the Prism of Four Methods and Two Protocols

  • Conference paper
Biomedical Engineering Systems and Technologies (BIOSTEC 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1211)

Abstract

For most genetic diseases, a wide gap exists between the heritability estimated from familial data and the heritability explained by standard genome-wide association studies. One promising line of research to close this gap is epistasis, or gene-gene interaction. However, epistasis detection poses computational challenges. This paper presents three contributions. Our first contribution addresses the lack of feedback on how published epistasis detection methods behave when applied to real-world genetic data. We designed experiments to compare four published approaches encompassing random forests, Bayesian inference, optimization techniques and Markov blanket learning. We also included in the comparison the recently developed approach SMMB-ACO (Stochastic Multiple Markov Blankets with Ant Colony Optimization). We used a published dataset related to Crohn’s disease. We compared the methods on several aspects: running times and memory requirements, numbers of interactions of interest (statistically significant 2-way interactions), p-value distributions, numbers of interaction networks and structure of these networks. Our second contribution assesses whether feature selection, performed upstream of epistasis detection, has an impact on these statistics and distributions. Our third contribution characterizes SMMB-ACO’s behavior on large-scale real data. We report great heterogeneity across methods in all these aspects, and highlight the strengths and weaknesses of each approach. Moreover, we conclude that for the Crohn’s disease dataset, feature selection implemented through a random forest-based technique does not increase the proportion of interactions of interest in the outputs.
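The comparison counts statistically significant 2-way interactions and the interaction networks they form. The following minimal Python sketch shows one way to do that bookkeeping. It assumes that pair-level p-values are already available and applies a Bonferroni correction over the tested pairs. Both are illustrative assumptions, since the abstract states neither the test nor the correction that was used. Significant pairs become edges of an undirected graph, and each connected component is counted as one interaction network. The helper names, SNP identifiers and p-values are hypothetical.

```python
from collections import defaultdict


def significant_pairs(pvalues, alpha=0.05):
    """Return SNP pairs whose p-value passes a Bonferroni-corrected threshold.

    `pvalues` maps a (snp_a, snp_b) tuple to the p-value of its 2-way
    interaction test. Bonferroni correction over all tested pairs is an
    illustrative assumption, not necessarily the paper's protocol.
    """
    if not pvalues:
        return []
    threshold = alpha / len(pvalues)
    return [pair for pair, p in pvalues.items() if p < threshold]


def interaction_networks(pairs):
    """Group significant pairs into networks, i.e. connected components."""
    graph = defaultdict(set)  # undirected adjacency: SNP -> interacting SNPs
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen, networks = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        networks.append(sorted(component))
    return networks


if __name__ == "__main__":
    # Hypothetical p-values for five tested SNP pairs.
    pvalues = {
        ("rs1", "rs2"): 1e-6,
        ("rs2", "rs3"): 2e-4,
        ("rs4", "rs5"): 5e-3,
        ("rs6", "rs7"): 0.2,
        ("rs1", "rs8"): 0.04,
    }
    pairs = significant_pairs(pvalues)  # threshold = 0.05 / 5 = 0.01
    networks = interaction_networks(pairs)
    print(len(pairs), "interactions of interest")  # 3 interactions of interest
    print(len(networks), "networks:", networks)
    # 2 networks: [['rs1', 'rs2', 'rs3'], ['rs4', 'rs5']]
```

Counting components separately from pairs reflects the distinction between numbers of interactions of interest and numbers of interaction networks. Further structural statistics, such as network size or node degree, could then be read off each component.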

HB and CS are joint first authors.

References

  1. Aflakparast, M., Salimi, H., Gerami, A., Dubé, M.-P., Visweswaran, S., et al.: Cuckoo search epistasis: a new method for exploring significant genetic interactions. Heredity 112, 666–674 (2014)

  2. Ayers, K., Cordell, H.: SNP selection in genome-wide and candidate gene studies via penalized logistic regression. Genet. Epidemiol. 34(8), 879–891 (2010)

  3. Boisaubert, H., Sinoquet, C.: Detection of gene-gene interactions: methodological comparison on real-world data and insights on synergy between methods. In: Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019), vol. 3, pp. 30–42. BIOINFORMATICS (2019)

  4. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996). https://doi.org/10.1023/A:1018054314350

  5. Chang, Y.-C., Wu, J.-T., Hong, M.-Y., Tung, Y.-A., Hsieh, P.-H., et al.: GenEpi: gene-based epistasis discovery using machine learning (2018). bioRxiv, https://doi.org/10.1101/421719

  6. Chatelain, C., Durand, G., Thuillier, V., Augé, F.: Performance of epistasis detection methods in semi-simulated GWAS. BMC Bioinform. 19(1), 231 (2018)

  7. Durinck, S., Moreau, Y., Kasprzyk, A., Davis, S., Moor, B.D., et al.: BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439–3440 (2005)

  8. Fergus, P., Montanez, C., Abdulaimma, B., Lisboa, P., Chalmers, C.: Utilising deep learning and genome wide association studies for epistatic-driven preterm birth classification in African-American women (2018). arXiv preprint, arXiv:1801.02977

  9. Furlong, L.: Human diseases through the lens of network biology. Trends Genet. 29, 150–159 (2013)

  10. Gao, H., Granka, J., Feldman, M.: On the classification of epistatic interactions. Genetics 184(3), 827–837 (2010)

  11. Gibert, J.-M., Blanco, J., Dolezal, M., Nolte, V., Peronnet, F., Schlötterer, C.: Strong epistatic and additive effects of linked candidate SNPs for Drosophila pigmentation have implications for analysis of genome-wide association studies results. Genome Biol. 18, 126 (2017)

  12. Gilbert-Diamond, D., Moore, J.: Analysis of gene-gene interactions. Current Protocols in Human Genetics, Chapter 1, Unit 1.14 (2011)

  13. Gola, D., Mahachie John, J., van Steen, K., König, I.: A roadmap to multifactor dimensionality reduction methods. Briefings Bioinform. 17(2), 293–308 (2016)

  14. Graham, D., Xavier, R.: From genetics of inflammatory bowel disease towards mechanistic insights. Trends Immunol. 34, 371–378 (2013)

  15. Han, B., Chen, X.-W.: bNEAT: a Bayesian network method for detecting epistatic interactions in genome-wide association studies. BMC Genomics 12(Suppl. 2), S9 (2011)

  16. Han, B., Chen, X.-W., Talebizadeh, Z.: FEPI-MB: identifying SNPs-disease association using a Markov blanket-based approach. BMC Bioinform. 12(Suppl. 12), S3 (2011)

  17. Han, B., Park, M., Chen, X.-W.: A Markov blanket-based method for detecting causal SNPs in GWAS. BMC Bioinform. 11(Suppl. 3), S5 (2010)

  18. Hohman, T., Bush, W., Jiang, L., Brown-Gentry, K., Torstenson, E., et al.: Discovery of gene-gene interactions across multiple independent datasets of Late Onset Alzheimer Disease from the Alzheimer Disease Genetics Consortium. Neurobiol. Aging 38, 141–150 (2016)

  19. Jiang, X., Neapolitan, R., Barmada, M., Visweswaran, S., Cooper, G.: A fast algorithm for learning epistatic genomic relationships. In: Proceedings of the Annual American Medical Informatics Association Symposium (AMIA 2010), pp. 341–345 (2010)

  20. Jing, P., Shen, H.: MACOED: a multi-objective ant colony optimization algorithm for SNP epistasis detection in genome-wide association studies. Bioinformatics 31(5), 634–641 (2015)

  21. Khor, B., Gardet, A., Xavier, R.J.: Genetics and pathogenesis of inflammatory bowel disease. Nature 474(7351), 307–317 (2011)

  22. Koller, D., Sahami, M.: Toward optimal feature selection. In: Proceedings of the 13th International Conference on Machine Learning (ICML 1996), pp. 284–292. Morgan Kaufmann, San Francisco (1996)

  23. Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., et al.: Circos: an information aesthetic for comparative genomics. Genome Res. 19(9), 1639–1645 (2009)

  24. Li, J., Malley, J., Andrew, A., Karagas, M., Moore, J.: Detecting gene-gene interactions using a permutation-based random forest method. BioData Min. 9, 14 (2016)

  25. Lunetta, K., Hayward, L., Segal, J., Eerdewegh, P.V.: Screening large-scale association study data: exploiting interactions using random forests. BMC Genet. 5, 32 (2004)

  26. McGovern, D., Kugathasan, S., Cho, J.: Genetics of inflammatory bowel diseases. Gastroenterology 149(5), 1163–1176 (2015)

  27. Nicodemus, K., Law, A., Radulescu, E., Luna, A., Kolachana, B., et al.: Biological validation of increased schizophrenia risk with NRG1, ERBB4, and AKT1 epistasis via functional neuroimaging in healthy controls. Arch. Gen. Psychiatry 67(10), 991–1001 (2010)

  28. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco (1988)

  29. Sackton, T., Hartl, D.: Genotypic context and epistasis in individuals and populations. Cell 166(2), 279–287 (2016)

  30. Schwarz, D., König, I., Ziegler, A.: On safari to random jungle: a fast implementation of random forests for high-dimensional data. Bioinformatics 26(14), 1752–1758 (2010)

  31. Shen, Y., Liu, Z., Ott, J.: Support vector machines with L1 penalty for detecting gene-gene interactions. Int. J. Data Min. Bioinform. 6, 463–470 (2012)

  32. Sinoquet, C., Niel, C.: Enhancement of a stochastic Markov blanket framework with ant colony optimization, to uncover epistasis in genetic association studies. In: Proceedings of the 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2018), pp. 673–678 (2018)

  33. Stanislas, V., Dalmasso, C., Ambroise, C.: Eigen-Epistasis for detecting gene-gene interactions. BMC Bioinform. 18, 54 (2017). https://doi.org/10.1186/s12859-017-1488-0

  34. Sun, Y., Shang, J., Liu, J.-X., Li, S., Zheng, C.-H.: epiACO - a method for identifying epistasis based on ant colony optimization algorithm. BioData Min. 10, 23 (2017)

  35. Uppu, S., Krishna, A., Gopalan, R.: Towards deep learning in genome-wide association interaction studies. In: Proceedings of the 20th Pacific Asia Conference on Information Systems (PACIS 2016), p. 20 (2016)

  36. Urbanowicz, R., Meeker, M., LaCava, W., Olson, R., Moore, J.: Relief-based feature selection: introduction and review. J. Biomed. Inform. 85, 189–203 (2018)

  37. Vineis, P., Pearce, N.: Missing heritability in genome-wide association study research. Nat. Rev. Genet. 11, 589 (2010)

  38. Visscher, P., Wray, N., Zhang, Q., Sklar, P., McCarthy, M., et al.: 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101(1), 5–22 (2017)

  39. Wang, Y., Liu, X., Robbins, K., Rekaya, R.: AntEpiSeeker: detecting epistatic interactions for case-control studies using a two-stage ant colony optimization algorithm. BMC Res. Notes 3, 117 (2010)

  40. Wright, M., Ziegler, A.: ranger: a fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77(1), 1–17 (2017)

  41. Zhang, Y.: A novel Bayesian graphical model for genome-wide multi-SNP association mapping. Genet. Epidemiol. 36(1), 36–47 (2012)

  42. Zhang, Y., Liu, J.: Bayesian inference of epistatic interactions in case-control studies. Nat. Genet. 39, 1167–1173 (2007)

  43. Zhu, Z., Tong, X., Zhu, Z., Liang, M., Cui, W., et al.: Development of MDR-GPU for gene-gene interaction analysis and its application to WTCCC GWAS data for type 2 diabetes. PLOS ONE 8(4), e61943 (2013)

  44. Zuk, O., Hechter, E., Sunyaev, S., Lander, E.: The mystery of missing heritability: genetic interactions create phantom heritability. Proc. Nat. Acad. Sci. 109, 1193–1198 (2012)

Acknowledgment

This work was supported by the GRIOTE Research project funded by the Pays de la Loire Region. In this study, we have processed real-world data generated by the Wellcome Trust Case Control Consortium. The experiments reported in this paper were performed at the CCIPL (Centre de Calcul Intensif des Pays de la Loire).

Author information

Correspondence to Christine Sinoquet.

Appendix

Table 7. Parameter adjustment for the five methods. (Table published in [3]).

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Boisaubert, H., Sinoquet, C. (2020). Machine Learning and Combinatorial Optimization to Detect Gene-gene Interactions in Genome-wide Real Data: Looking Through the Prism of Four Methods and Two Protocols. In: Roque, A., et al. Biomedical Engineering Systems and Technologies. BIOSTEC 2019. Communications in Computer and Information Science, vol 1211. Springer, Cham. https://doi.org/10.1007/978-3-030-46970-2_8

  • DOI: https://doi.org/10.1007/978-3-030-46970-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46969-6

  • Online ISBN: 978-3-030-46970-2

  • eBook Packages: Computer Science, Computer Science (R0)
