Abstract
The identification and characterization of interactions between genes are increasingly explored in genome-wide association studies (GWAS). Several machine learning and data mining approaches have been proposed to identify multi-locus interactions in higher-order genomic data. However, detecting these interactions is challenging because of biomolecular complexity and computational limitations. In this paper, a multifactor dimensionality reduction (MDR) based associative classifier is proposed for detecting interactions between single nucleotide polymorphisms (SNPs) in genetic epidemiological studies. The approach is evaluated on one-locus to six-locus models by varying heritability, minor allele frequency, case-control ratio and sample size. The experimental results demonstrate significant improvements over previous approaches in the accuracy of detecting interacting SNPs responsible for complex diseases. Further, the approach is successfully evaluated on sporadic breast cancer data, where the results show interactions among five polymorphisms in three different estrogen-metabolism genes.
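To make the core idea concrete, the following is a minimal sketch of the generic multifactor dimensionality reduction (MDR) step: multi-locus genotype cells are pooled into high-risk and low-risk groups, and each candidate set of loci is scored by balanced accuracy. It is not the associative classifier proposed in the paper. The function names `mdr_score` and `best_model`, the 0/1 genotype coding, and the toy data are assumptions made for illustration, and the cross-validation normally used with MDR is omitted.

```python
from collections import Counter
from itertools import combinations


def mdr_score(genotypes, labels, loci):
    """Balanced accuracy of the MDR high/low-risk rule for one set of loci."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio

    # Count cases and controls in every multi-locus genotype cell.
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in loci)
        (cases if y == 1 else controls)[cell] += 1

    # A cell is high risk when its case:control ratio exceeds the threshold.
    cells = set(cases) | set(controls)
    high = {c for c in cells if cases[c] > threshold * controls[c]}

    # Predict "case" in high-risk cells and "control" elsewhere.
    true_pos = sum(cases[c] for c in high)
    true_neg = sum(n for c, n in controls.items() if c not in high)
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def best_model(genotypes, labels, order):
    """Exhaustively pick the loci combination with the highest score."""
    n_snps = len(genotypes[0])
    return max(
        combinations(range(n_snps), order),
        key=lambda loci: mdr_score(genotypes, labels, loci),
    )


# Toy data (made up): disease status is the XOR of SNP 0 and SNP 1,
# while SNP 2 is uninformative noise.
genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ b for a, b, _ in genotypes]

print(best_model(genotypes, labels, order=2))
```

On this toy data neither SNP 0 nor SNP 1 is informative on its own, so only a joint two-locus model separates cases from controls. That is the kind of pure interaction effect that single-locus tests miss.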
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Uppu, S., Krishna, A., Gopalan, R.P. (2015). A Multifactor Dimensionality Reduction Based Associative Classification for Detecting SNP Interactions. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9489. Springer, Cham. https://doi.org/10.1007/978-3-319-26532-2_36
DOI: https://doi.org/10.1007/978-3-319-26532-2_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26531-5
Online ISBN: 978-3-319-26532-2
eBook Packages: Computer Science (R0)