Abstract
Parkinson’s disease is the second most common neurodegenerative disease. There is no permanent cure for it, so early diagnosis is important for improving the quality of life of Parkinson’s patients. Distortion of the voice is one of the first symptoms to appear, so classification based on voice features plays an important role in diagnosis. In this paper, various classification techniques are compared to show the potential of each classifier: SVM (linear, RBF, and polynomial kernels), DT, RF, LR, KNN, NB, MLP, AdaBoost, and XGBoost. Three feature selection techniques, mRMR, GA, and PCA, are also explored to reduce the dimensionality of the dataset without a large loss of accuracy. The potential of voice features in the classification process is also shown.
Ethics declarations
Conflict of interest
No conflicts of interest are declared related to the publication of this paper.
Appendix A: Parameter selection for GA
A genetic algorithm (GA) has four input parameters that affect the performance of the system: the number of generations, the population size, the crossover probability, and the mutation probability. There is no direct formula for estimating the optimal values of these parameters, so an experimental approach is followed to tune them.
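To make the role of the four parameters concrete, the following is a minimal sketch of GA-based feature selection in plain Python. It is not the implementation used in this paper. The fitness function `toy_fitness` is a hypothetical stand-in for classifier accuracy, the "informative" feature indices are invented for illustration, and the choices of tournament selection, single-point crossover, and one bit flip per mutated child are assumptions.

```python
import random


def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy on the selected features.

    Rewards keeping the (made-up) informative features 0, 2 and 5 and
    lightly penalises every feature that is kept.
    """
    informative = {0, 2, 5}
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.1 * sum(mask)


def ga_feature_selection(n_features, fitness, n_generations=20, pop_size=30,
                         cx_prob=1.0, mut_prob=0.2, seed=0):
    """Return the best 0/1 feature mask found (requires n_features >= 2)."""
    rng = random.Random(seed)
    # Each individual is a 0/1 mask: 1 keeps the feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)[:]

    def tournament():
        # Assumed selection scheme: the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(n_generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament()[:], tournament()[:]
            # Single-point crossover, applied with probability cx_prob.
            if rng.random() < cx_prob:
                cut = rng.randint(1, n_features - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                # Mutation: flip one random bit with probability mut_prob.
                if rng.random() < mut_prob:
                    j = rng.randrange(n_features)
                    child[j] = 1 - child[j]
                children.append(child)
        population = children[:pop_size]
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best[:]
    return best


best_mask = ga_feature_selection(8, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

In a real experiment, `fitness` would train and validate a classifier on the features selected by the mask, which is what makes larger values of `n_generations` and `pop_size` costly.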
1.1 Generation size
To select the optimal generation size, experiments were run with 10, 20, 30, 40, 50, and 60 generations using the best-performing classifier, XGBoost. The results are shown in Table 4.
The maximum accuracy is 90.60% for dataset 1 with 20 generations and 77.23% for dataset 2 with 40 generations. To choose between 20 and 40 generations, the other performance measures were compared, and a generation size of 20 was chosen: specificity is higher with 20 generations than with 40, and the computation time is also lower.
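The tie-breaking rule described above can be sketched as a small selection function. The specificity and time values below are illustrative placeholders, not figures from Table 4. They are chosen only to match the qualitative statement that 20 generations gives higher specificity and lower computation time than 40. The field names are also assumptions.

```python
def pick_generation_size(candidates):
    """Among candidates that each give the top accuracy on one dataset,
    prefer higher specificity, then lower computation time."""
    best = max(candidates, key=lambda c: (c["specificity"], -c["time_s"]))
    return best["generations"]


# Placeholder values for illustration only (not taken from Table 4).
candidates = [
    {"generations": 20, "specificity": 0.80, "time_s": 100.0},
    {"generations": 40, "specificity": 0.75, "time_s": 200.0},
]

print(pick_generation_size(candidates))
```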
1.2 Population size
The population size experiment was run with sizes of 10, 15, 20, 25, and 30. The results were obtained with the classifier having the maximum accuracy, XGBoost, and the optimal generation size of 20. The results are shown in Table 5.
The table shows that accuracy increases with population size, while computation time increases only slightly. Population sizes up to 30 were tested, but the range can be extended depending on the requirements of the system, i.e. whether accuracy is the major concern or a trade-off between accuracy and computation time is required.
1.3 Crossover probability
The crossover probability experiments were run with probabilities of 0.5, 0.7, and 1 using the XGBoost classifier, as it has the best performance. The optimal generation size of 20 and a population size of 30 were used. The results are shown in Table 6.
The table shows that a crossover probability of 1 yields the maximum accuracy. This means that crossover should be applied when producing the children of every generation.
1.4 Mutation probability
The mutation probability was selected by experimenting with probabilities of 0.1, 0.2, and 0.3. The experiments were run with the best classifier (XGBoost), the optimal generation size (20), the optimal population size (30), and the optimal crossover probability (1). The results are shown in Table 7.
The table shows that the optimal mutation probability is 0.2, as it yields the maximum accuracy of 91.40% on dataset 1 and 77.31% on dataset 2.
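The procedure in this appendix tunes one parameter at a time and fixes each chosen value before moving to the next. A minimal sketch of that sequential search is given below. The grids are the values listed in Sects. 1.1 to 1.4. The starting defaults and the scoring function `toy_evaluate` are assumptions: the appendix does not state which values the untuned parameters took, and `toy_evaluate` is an artificial score built to peak at the reported choices, not a measured accuracy.

```python
def tune_sequentially(evaluate, grids, defaults):
    """Tune parameters one at a time, in the order given by `grids`,
    keeping each chosen value fixed for the later searches."""
    chosen = dict(defaults)
    for name, values in grids.items():
        chosen[name] = max(values, key=lambda v: evaluate({**chosen, name: v}))
    return chosen


# Candidate values as listed in the appendix.
grids = {
    "n_generations": [10, 20, 30, 40, 50, 60],
    "pop_size": [10, 15, 20, 25, 30],
    "cx_prob": [0.5, 0.7, 1.0],
    "mut_prob": [0.1, 0.2, 0.3],
}

# Assumed starting values for parameters that have not been tuned yet.
defaults = {"n_generations": 10, "pop_size": 10, "cx_prob": 0.5, "mut_prob": 0.1}


def toy_evaluate(params):
    """Artificial score (higher is better); a real run would return the
    accuracy of the classifier trained on GA-selected features."""
    return -(abs(params["n_generations"] - 20) / 10
             + abs(params["pop_size"] - 30) / 5
             + abs(params["cx_prob"] - 1.0)
             + abs(params["mut_prob"] - 0.2))


print(tune_sequentially(toy_evaluate, grids, defaults))
```

A one-at-a-time search evaluates 6 + 5 + 3 + 3 = 17 settings instead of the 270 in the full grid, at the cost of possibly missing interactions between parameters.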
Cite this article
Goyal, J., Khandnor, P. & Aseri, T.C. A Comparative Analysis of Machine Learning classifiers for Dysphonia-based classification of Parkinson’s Disease. Int J Data Sci Anal 11, 69–83 (2021). https://doi.org/10.1007/s41060-020-00234-0
DOI: https://doi.org/10.1007/s41060-020-00234-0