An Ensemble Machine Learning Method Highlights Possible Parkinson’s Disease Genes and Accessing Performance of Re-sampling Techniques

  • Original Research
  • Published in SN Computer Science

Abstract

Identifying the genes that drive other genes towards disease in neurological disorders such as Parkinson's disease (PD) is an important task in biomedical research. Machine learning techniques have been used extensively in recent years to identify disease-associated genes. However, the data used by these methods were based on protein–protein interactions, gene expression, and gene ontology, and the features constructed for each gene from such data may rest on incomplete prior knowledge. In this study, therefore, the physicochemical properties of amino acids are used as universal knowledge to extract features from protein sequences. Several machine learning models are then used to classify genes associated with PD, and an ensemble method built from the four highest-performing classifiers is designed to improve diagnostic accuracy. A comparative analysis shows that gradient boosting outperforms the other six individual methods, with an accuracy of 77.50% and an area under the curve of 0.774, while the ensemble method achieves an accuracy of 83.75%. The ensemble method is also evaluated against existing disease gene identification methods; the results suggest that it is more accurate and effective for identifying PD genes. Re-sampling techniques for resolving class imbalance are shown to increase classification accuracy by reducing the bias introduced by differences in class size. The proposed model can also be used as a prediction tool for Alzheimer's disease protein sequences.
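
As a rough illustration of the pipeline the abstract outlines (sequence-derived physicochemical features, re-sampling to balance classes, and a vote across several classifiers), the following Python sketch uses only the standard library. It is not the authors' implementation: the Kyte-Doolittle hydropathy scale, the three features, random over-sampling, and hard majority voting are all assumptions chosen for brevity, and the function names are hypothetical.

```python
import random
from collections import Counter

# Kyte-Doolittle hydropathy index per amino acid (assumed choice of scale;
# the abstract does not say which physicochemical properties were used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq):
    """Return [mean hydropathy, hydrophobic fraction, length] for a protein sequence."""
    residues = [r for r in seq.upper() if r in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    scores = [KYTE_DOOLITTLE[r] for r in residues]
    mean_hydropathy = sum(scores) / len(scores)
    hydrophobic_fraction = sum(s > 0 for s in scores) / len(scores)
    return [mean_hydropathy, hydrophobic_fraction, float(len(residues))]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def majority_vote(predictions):
    """Combine per-classifier label lists by hard voting.

    With an even number of voters a tie is possible; Counter.most_common
    then returns the label that was seen first among the tied ones.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

In use, each gene's protein sequence would be mapped to a feature vector with `sequence_features`, the training set balanced with `random_oversample`, and the label lists predicted by the four best classifiers passed to `majority_vote`. Over-sampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.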

Data availability

Data will be made available on request.

Author information

Corresponding author

Correspondence to Priya Arora.

Ethics declarations

Conflict of interest

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Arora, P., Mishra, A. & Malhi, A. An Ensemble Machine Learning Method Highlights Possible Parkinson’s Disease Genes and Accessing Performance of Re-sampling Techniques. SN COMPUT. SCI. 5, 483 (2024). https://doi.org/10.1007/s42979-024-02805-5

  • DOI: https://doi.org/10.1007/s42979-024-02805-5
