Predicting autism spectrum disorder from associative genetic markers of phenotypic groups using machine learning

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Machine learning is a discipline of artificial intelligence that underpins a wide range of critical applications. Because of its high precision, it is widely used to extract hidden patterns and valuable insights from complex data structures. Data collected from real-time environments may contain irrelevant information, and the presence of such noise degrades model performance. Gene expression is an important data source that carries the genetic information of a species, and gene expression patterns reveal significant relationships between genes associated with several diseases. However, irregular molecular interactions and reactions that occur during transcription slightly affect gene expression, and this has a detrimental effect on the identification of biological markers of disease. To address this problem, a novel gene selection strategy is proposed to identify candidate gene biomarkers from genomic data. It combines a signal-to-noise ratio with a logistic sigmoid function, the Hilbert–Schmidt Independence Criterion Lasso (HSIC Lasso), and a regularized genetic algorithm to find the optimal features. The proposed system is tested on a microarray gene expression dataset of autism spectrum disorder (ASD) obtained from the Gene Expression Omnibus repository. FAM104B, CCNDBP1, H1F0, and ZER1 are identified as candidate biomarkers of ASD. The performance of the proposed model is evaluated with widely used machine learning algorithms. The proposed methodology improves the prediction rate of ASD, attaining an accuracy of 97.62% and outperforming existing methods. The system could also serve as a tool to assist medical practitioners in accurate ASD diagnosis.
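The abstract names a signal-to-noise ratio combined with a logistic sigmoid function as the first filtering stage but does not give the formula. The sketch below is therefore only an illustration under stated assumptions: it uses the common two-class signal-to-noise ratio, (mean of cases minus mean of controls) divided by the sum of the two class standard deviations, and passes its absolute value through a logistic sigmoid to obtain a bounded relevance score. The function names (`snr_scores`, `sigmoid`, `rank_genes`) and the way the two pieces are combined are hypothetical and are not taken from the article.

```python
import numpy as np


def snr_scores(X, y):
    """Two-class signal-to-noise ratio for every gene (column of X).

    X: array of shape (n_samples, n_genes) with expression values.
    y: binary labels, 1 = case, 0 = control.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    cases, controls = X[y == 1], X[y == 0]
    mean_diff = cases.mean(axis=0) - controls.mean(axis=0)
    # Sample standard deviations (ddof=1); epsilon avoids division by zero.
    sd_sum = cases.std(axis=0, ddof=1) + controls.std(axis=0, ddof=1)
    return mean_diff / (sd_sum + 1e-12)


def sigmoid(z):
    """Logistic sigmoid, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def rank_genes(X, y, top_k):
    """Return indices and scores of the top_k genes.

    Assumption (not from the article): the score is sigmoid(|SNR|),
    which lies in [0.5, 1) and grows with class separation.
    """
    scores = sigmoid(np.abs(snr_scores(X, y)))
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


# Tiny synthetic example: gene 0 separates the classes, gene 1 does not.
X_demo = np.array([[1.0, 0.0],
                   [2.0, 0.1],
                   [9.0, 0.0],
                   [10.0, 0.1]])
y_demo = np.array([0, 0, 1, 1])
top_idx, top_scores = rank_genes(X_demo, y_demo, top_k=2)
print(top_idx, top_scores)
```

In a pipeline like the one the abstract describes, genes that pass such a filter would then be handed to the later stages (HSIC Lasso and the regularized genetic algorithm), which are not sketched here.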

Author information

Corresponding author

Correspondence to M. Sudha.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sekaran, K., Sudha, M. Predicting autism spectrum disorder from associative genetic markers of phenotypic groups using machine learning. J Ambient Intell Human Comput 12, 3257–3270 (2021). https://doi.org/10.1007/s12652-020-02155-z

  • DOI: https://doi.org/10.1007/s12652-020-02155-z
