Abstract
For most genetic diseases, a wide gap exists between the heritability estimated from familial data and the heritability explained through standard genome-wide association studies. One promising line of research to explain this gap is epistasis, or gene-gene interaction. However, epistasis detection poses computational challenges. This paper presents three contributions. Our first contribution addresses the lack of feedback on how published methods dedicated to epistasis behave when applied to real-world genetic data. We designed experiments to compare four published approaches encompassing random forests, Bayesian inference, optimization techniques and Markov blanket learning. The comparison includes the recently developed approach SMMB-ACO (Stochastic Multiple Markov Blankets with Ant Colony Optimization). We used a published dataset related to Crohn’s disease. We compared the methods on the following aspects: running times and memory requirements, numbers of interactions of interest (statistically significant 2-way interactions), p-value distributions, numbers of interaction networks and structure of these networks. Our second contribution assesses whether feature selection, performed upstream of epistasis detection, has an impact on these statistics and distributions. Our third contribution characterizes the behavior of SMMB-ACO on large-scale real data. We report great heterogeneity across methods on all of these aspects, and highlight the strengths and weaknesses of each approach. Moreover, we conclude that, in the case of the Crohn’s disease dataset, feature selection implemented through a random forest-based technique does not increase the proportion of interactions of interest in the outputs.
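To make the notion of a statistically significant 2-way interaction concrete, the sketch below scores one pair of SNPs against case/control status. The abstract does not state which test defines significance, so the Pearson chi-square test used here is an assumption. The function names `interaction_chi2` and `chi2_sf`, the 0/1/2 genotype coding and the toy genotypes are also hypothetical; the genotypes are synthetic and are not taken from the Crohn’s disease dataset.

```python
from collections import Counter
from itertools import product
import math


def interaction_chi2(snp_a, snp_b, phenotype):
    """Pearson chi-square statistic for genotype pair vs. case/control status.

    snp_a, snp_b: genotypes coded 0/1/2, one value per individual (assumed coding).
    phenotype: 0 (control) or 1 (case), one value per individual.
    Returns (statistic, degrees_of_freedom).
    """
    n = len(phenotype)
    pairs = list(zip(snp_a, snp_b))
    observed = Counter(zip(pairs, phenotype))  # counts per (genotype pair, status)
    row_totals = Counter(pairs)                # counts per genotype pair
    col_totals = Counter(phenotype)            # counts per status
    stat = 0.0
    for pair, status in product(row_totals, col_totals):
        expected = row_totals[pair] * col_totals[status] / n
        stat += (observed[(pair, status)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable with dof >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    started from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)).
    """
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Synthetic toy data: an individual is a case exactly when the two genotypes differ.
toy_a = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
toy_b = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
toy_y = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0]

stat, dof = interaction_chi2(toy_a, toy_b, toy_y)
p_value = chi2_sf(stat, dof)
print(f"chi2={stat:.2f}, dof={dof}, p={p_value:.4f}")
```

In the toy data, neither SNP is associated with the phenotype on its own, while the pair is. This is the kind of signal that single-SNP association tests miss. On genome-wide data the number of candidate pairs is very large, so a multiple-testing correction would be needed before calling an interaction significant; the choice of correction is left open here.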
HB and CS are the two joint first co-authors.
Acknowledgment
This work was supported by the GRIOTE Research project funded by the Pays de la Loire Region. In this study, we have processed real-world data generated by the Wellcome Trust Case Control Consortium. The experiments reported in this paper were performed at the CCIPL (Centre de Calcul Intensif des Pays de la Loire).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Boisaubert, H., Sinoquet, C. (2020). Machine Learning and Combinatorial Optimization to Detect Gene-gene Interactions in Genome-wide Real Data: Looking Through the Prism of Four Methods and Two Protocols. In: Roque, A., et al. Biomedical Engineering Systems and Technologies. BIOSTEC 2019. Communications in Computer and Information Science, vol 1211. Springer, Cham. https://doi.org/10.1007/978-3-030-46970-2_8
DOI: https://doi.org/10.1007/978-3-030-46970-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46969-6
Online ISBN: 978-3-030-46970-2
eBook Packages: Computer Science, Computer Science (R0)