A Comparative Analysis of Machine Learning classifiers for Dysphonia-based classification of Parkinson’s Disease

International Journal of Data Science and Analytics

Abstract

Parkinson’s disease is the second most common neurodegenerative disease affecting the nervous system. There is no permanent cure for it, so early diagnosis is important for improving the quality of life of Parkinson’s patients. Distortion of the voice is one of the first symptoms to appear, so classifying patients from voice features can play an important role in early diagnosis. This paper compares several classification techniques to show the potential of each classifier: SVM (linear, RBF, and polynomial kernels), DT, RF, LR, KNN, NB, MLP, AdaBoost, and XGBoost. Three feature selection techniques, mRMR, GA, and PCA, are also explored to reduce the dimensionality of the dataset without much loss of accuracy. The potential of voice features in the classification process is also shown.

Author information

Corresponding author

Correspondence to Jinee Goyal.

Ethics declarations

Conflict of interest

No conflicts of interest are declared related to the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Parameter selection for GA

The Genetic Algorithm (GA) has four input parameters that affect the performance of the system: the number of generations, the population size, the crossover probability, and the mutation probability. There is no direct formula for estimating the optimal values of these parameters, so they are tuned experimentally.
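To make the role of the four parameters concrete, the following is a minimal sketch of a GA over binary feature masks. It is not the authors' implementation: the selection scheme (binary tournament), the one-point crossover, the single-bit mutation, and the `toy_fitness` function with its `INFORMATIVE` set are all assumptions chosen for illustration. In the paper the fitness would come from a classifier such as XGBoost. The default values are the ones the appendix settles on.

```python
import random

def run_ga(fitness, n_features, n_generations=20, pop_size=30,
           cx_prob=1.0, mut_prob=0.2, seed=0):
    """Evolve binary feature masks; return (best_mask, best_fitness)."""
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags, one flag per feature.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(n_generations):
        children = []
        while len(children) < pop_size:
            # Binary tournament selection (an assumed operator).
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            c1, c2 = p1[:], p2[:]
            # One-point crossover, applied with probability cx_prob.
            # With cx_prob = 1.0 every selected pair is recombined.
            if n_features > 1 and rng.random() < cx_prob:
                cut = rng.randrange(1, n_features)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # Mutation: with probability mut_prob, flip one random bit.
            for child in (c1, c2):
                if rng.random() < mut_prob:
                    i = rng.randrange(n_features)
                    child[i] = 1 - child[i]
                children.append(child)
        population = children[:pop_size]
        # Remember the best mask seen in any generation.
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best, fitness(best)

# Hypothetical stand-in for a classifier-based fitness: reward masks that
# keep three "informative" features and penalise every selected feature.
INFORMATIVE = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

best_mask, best_score = run_ga(toy_fitness, n_features=10)
print(best_mask, best_score)
```

Replacing `toy_fitness` with a function that trains and scores a classifier on the selected columns would turn this into a wrapper-style feature selector, at the cost of one model fit per fitness call.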

1.1 Generation size

To select the optimal generation size, experiments were run with 10, 20, 30, 40, 50, and 60 generations using the best-performing classifier, XGBoost. The results are shown in Table 4.

Table 4 Generation size

The maximum accuracy is 90.60% for dataset 1, reached with 20 generations, and 77.23% for dataset 2, reached with 40 generations. To choose between 20 and 40 generations, the other performance measures were checked, and a generation size of 20 was chosen: specificity is higher with 20 generations than with 40, and computation time is lower.
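The two measures used in this comparison can be computed from predicted and true labels as sketched below. The definitions are the standard ones (accuracy is the fraction of correct predictions; specificity is the fraction of negatives correctly identified). The label coding, with 1 for a Parkinson's patient and 0 for a healthy subject, and the example labels themselves are assumptions made for illustration and are not data from the paper.

```python
def accuracy_and_specificity(y_true, y_pred, positive=1):
    """Return (accuracy, specificity) for binary labels."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif t != positive and p == positive:
            fp += 1
        else:
            fn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Specificity is undefined when there are no negative samples.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, specificity

# Illustrative labels only: 6 positives followed by 4 negatives.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
acc, spec = accuracy_and_specificity(y_true, y_pred)
print(acc, spec)  # 0.8 0.75
```

On an imbalanced dataset a configuration can have high accuracy and poor specificity, which is one reason to use specificity as a tie-breaker between configurations with similar accuracy.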

1.2 Population size

Population sizes of 10, 15, 20, 25, and 30 were tested. The results were computed with the classifier having maximum accuracy, XGBoost, and the optimal generation size, 20. The results are shown in Table 5.

Table 5 Population size

The table shows that accuracy increases as the population size increases, while computation time does not increase much. We experimented with population sizes up to 30, but this can be extended depending on the requirements of the system, i.e. whether accuracy is the major concern or a trade-off between accuracy and computation time is required.

1.3 Crossover probability

Crossover probabilities of 0.5, 0.7, and 1 were tested with the XGBoost classifier, as it has the best performance. The optimal generation size of 20 and a population size of 30 were used. The results are shown in Table 6.

Table 6 Crossover probability

The table shows that a crossover probability of 1 gives the maximum accuracy. This means that crossover should be applied when producing the children of every generation.

1.4 Mutation probability

The mutation probability was selected by experimenting with probabilities of 0.1, 0.2, and 0.3. The experiments used the best classifier (XGBoost), the optimal generation size (20), the optimal population size (30), and the optimal crossover probability (1). The results are shown in Table 7.

Table 7 Mutation probability

The table shows that the optimal mutation probability is 0.2, as it gives the maximum accuracy of 91.40% on dataset 1 and 77.31% on dataset 2.
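Taken together, the four experiments amount to a one-parameter-at-a-time search: each parameter is swept over its candidate values while the previously chosen values are held fixed. A minimal sketch of that procedure follows. The candidate grids are the ones listed in this appendix. The `DEFAULTS` used for parameters not yet tuned are an assumption, since the appendix does not state them. The `toy_score` function is a hypothetical stand-in built to peak at the values the appendix reports, so that the control flow is easy to follow; it does not reproduce the paper's measurements.

```python
def tune_sequentially(evaluate, grids, defaults):
    """Tune parameters one at a time, in the order given by `grids`,
    freezing each winner before moving on to the next parameter."""
    chosen = dict(defaults)
    for name, candidates in grids.items():
        scores = {}
        for value in candidates:
            trial = dict(chosen, **{name: value})
            scores[value] = evaluate(**trial)
        # On ties, max() keeps the first candidate listed.
        chosen[name] = max(candidates, key=scores.get)
    return chosen

# Candidate values taken from the appendix text.
GRIDS = {
    "n_generations": [10, 20, 30, 40, 50, 60],
    "pop_size": [10, 15, 20, 25, 30],
    "cx_prob": [0.5, 0.7, 1.0],
    "mut_prob": [0.1, 0.2, 0.3],
}

# Assumed starting values for parameters that have not been tuned yet.
DEFAULTS = {"n_generations": 10, "pop_size": 10, "cx_prob": 0.5, "mut_prob": 0.1}

def toy_score(n_generations, pop_size, cx_prob, mut_prob):
    # Hypothetical score, not a measured accuracy.
    return (-abs(n_generations - 20) / 10 + pop_size / 10
            + cx_prob - 10 * abs(mut_prob - 0.2))

best_params = tune_sequentially(toy_score, GRIDS, DEFAULTS)
print(best_params)
```

This search evaluates 6 + 5 + 3 + 3 = 17 configurations instead of the 270 a full grid would need, but it can miss interactions between parameters. It also uses a single score, whereas the appendix breaks the tie on generation size using specificity and computation time across two datasets.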

Cite this article

Goyal, J., Khandnor, P. & Aseri, T.C. A Comparative Analysis of Machine Learning classifiers for Dysphonia-based classification of Parkinson’s Disease. Int J Data Sci Anal 11, 69–83 (2021). https://doi.org/10.1007/s41060-020-00234-0

  • DOI: https://doi.org/10.1007/s41060-020-00234-0
