Abstract
Genome-wide association studies aim to identify associations between commonly occurring genetic variations in a group of individuals and a phenotype, with deoxyribonucleic acid genotyped in the form of single nucleotide polymorphisms. Although various studies have used human genome data to predict chronic diseases, further investigation is still required. Machine learning algorithms are widely used for prediction and in genome-wide association studies. In this research, Random Forest was used to select the most significant single nucleotide polymorphisms associated with Alzheimer’s disease. A deep learning model for predicting the disease was then developed. Our extensive simulation results indicate that this hybrid model is promising for predicting individuals who suffer from Alzheimer’s disease, achieving an area under the curve of 0.9 and 0.93 using a Convolutional Neural Network and a Multilayer Perceptron, respectively.
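The abstract reports performance as area under the receiver operating characteristic curve (AUC). As a reminder of what that figure measures, the following is a minimal sketch in plain Python and not the authors' evaluation code. It uses the rank-based reading of AUC: the probability that a randomly chosen case receives a higher score than a randomly chosen control. The function name roc_auc and the labels and scores are hypothetical, chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Return AUC as P(score of a random case > score of a random control).

    labels: iterable of 1 (case) or 0 (control)
    scores: iterable of model outputs, higher meaning "more likely a case"
    A tie between a case and a control counts as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")

    wins = 0.0
    # Compare every case score with every control score.
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical data for illustration only (not taken from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(auc)
```

Under this reading, an AUC of 0.93 means that in about 93% of case-control pairs the model scores the case higher than the control, while 0.5 corresponds to chance-level ranking. The pairwise loop is quadratic in the number of samples and is meant for clarity; a sorting-based computation would be preferable on large cohorts.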
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Alatrany, A., Hussain, A., Mustafina, J., Al-Jumeily, D. (2021). A Novel Hybrid Machine Learning Approach Using Deep Learning for the Prediction of Alzheimer Disease Using Genome Data. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12838. Springer, Cham. https://doi.org/10.1007/978-3-030-84532-2_23
DOI: https://doi.org/10.1007/978-3-030-84532-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84531-5
Online ISBN: 978-3-030-84532-2
eBook Packages: Computer Science, Computer Science (R0)