Abstract
Machine learning is a discipline of artificial intelligence that underpins the development of many critical applications. Owing to its high predictive precision, it is widely used to extract useful hidden patterns and valuable insights from complex data structures. Data collected from real-time environments may contain irrelevant information, and the presence of noise in the data degrades model performance. Gene expression carries the genetic information of a species and is therefore an important data source; gene expression patterns reveal significant relationships between genes associated with several diseases. However, irregular molecular interactions and reactions that occur during transcription perturb the measured gene expression, albeit minimally, and this hampers the identification of biological markers of disease. To address this problem, a novel gene selection strategy is proposed to identify candidate gene biomarkers from genomic data. An amalgamation of the signal-to-noise ratio with a logistic sigmoid function, Hilbert–Schmidt Independence Criterion Lasso (HSIC Lasso), and a regularized genetic algorithm is used to find the optimal feature subset. The proposed system is tested on a microarray gene expression dataset of autism spectrum disorder (ASD) obtained from the Gene Expression Omnibus (GEO) repository. FAM104B, CCNDBP1, H1F0 and ZER1 are identified as candidate biomarkers of ASD. The performance of the proposed model is evaluated systematically with widely used machine learning algorithms. The proposed methodology improved the prediction rate of ASD, attaining an accuracy of 97.62% and outperforming the existing methods it was compared with. The system could therefore serve as a useful tool to assist medical practitioners in accurate ASD diagnosis.
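The abstract names the signal-to-noise ratio (SNR) with a logistic sigmoid function as one stage of the gene selection pipeline but does not give its formula. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes Golub's definition SNR = (μ₁ − μ₂)/(σ₁ + σ₂) per gene, applies the sigmoid to |SNR| so that up- and down-regulated genes are scored alike, and keeps genes whose score reaches a cutoff. The function name `snr_sigmoid_filter`, the 0.8 threshold, and the toy genes `gene_A`/`gene_B` are hypothetical. The HSIC Lasso and regularized genetic algorithm stages are not shown.

```python
import math
from statistics import mean, pstdev

EPS = 1e-12  # guards against division by zero when both groups have zero spread


def signal_to_noise(cases, controls):
    """Golub-style SNR for one gene: (mean_case - mean_control) / (sd_case + sd_control).

    Population standard deviations are used here (an assumption; the abstract
    does not say which estimator the authors use).
    """
    return (mean(cases) - mean(controls)) / (pstdev(cases) + pstdev(controls) + EPS)


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def snr_sigmoid_filter(expression, labels, threshold=0.8):
    """Score each gene by sigmoid(|SNR|) and keep those at or above `threshold`.

    expression: dict mapping gene name -> expression values, one per sample
    labels: 0/1 class label per sample (1 = case, 0 = control); both classes must be present
    Returns (gene, score) pairs sorted from highest to lowest score.
    """
    kept = []
    for gene, values in expression.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        score = sigmoid(abs(signal_to_noise(cases, controls)))
        if score >= threshold:
            kept.append((gene, score))
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


# Toy data with hypothetical gene names (not taken from the ASD dataset)
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_A": [5.0, 5.2, 4.8, 1.0, 1.2, 0.8],  # well separated between classes
    "gene_B": [2.0, 3.0, 1.0, 3.0, 1.0, 2.0],  # same values in both classes, SNR = 0
}

print(snr_sigmoid_filter(expression, labels))  # only gene_A passes the 0.8 threshold
```

Because the sigmoid is monotone, ranking genes by sigmoid(|SNR|) gives the same order as ranking by |SNR|; in this sketch the transform only maps scores into [0.5, 1), so a single fixed cutoff can be applied regardless of the raw SNR scale.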
Cite this article
Sekaran, K., Sudha, M. Predicting autism spectrum disorder from associative genetic markers of phenotypic groups using machine learning. J Ambient Intell Human Comput 12, 3257–3270 (2021). https://doi.org/10.1007/s12652-020-02155-z