Abstract
The aim of this study is to analyse voice and speech recordings for the task of Parkinson’s disease detection. The voice modality corresponds to sustained phonation of /a/, and the speech modality to a short sentence in the Lithuanian language. Diverse information is extracted from the recordings using 22 well-known audio feature sets. A random forest is used as the learner, both for individual feature sets and for decision-level fusion. Essentia descriptors were found to be the best individual feature set, achieving an equal error rate of 16.3 % for voice and 13.3 % for speech. Fusion of feature sets and modalities improved detection, achieving an equal error rate of 10.8 %. Variable importance in fusion revealed the speech modality to be more important than voice.
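The equal error rate (EER) reported above is the operating point at which the false positive rate equals the false negative rate. The abstract does not state how the EER was computed, so the following is only a minimal sketch under assumptions: the function name `equal_error_rate`, the convention that higher scores indicate Parkinson’s disease, and the simple threshold sweep are all illustrative choices, not the authors' procedure.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumptions (not from the paper): higher score = more likely positive,
    labels are 1 (patient) or 0 (healthy control).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    best_gap, eer = None, None
    for t in sorted(set(scores)):
        # A recording is classified as positive when its score is >= t.
        fnr = sum(s < t for s in pos) / len(pos)   # missed patients
        fpr = sum(s >= t for s in neg) / len(neg)  # false alarms
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            # Keep the threshold where the two error rates are closest
            # and report their mean as the EER estimate.
            best_gap, eer = gap, (fnr + fpr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    print(equal_error_rate([0.1, 0.6, 0.4, 0.9], [0, 0, 1, 1]))
```

With few recordings the two error rates rarely cross exactly, which is why the sketch averages them at the closest threshold; interpolating along the detection error trade-off curve would be an alternative.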
Acknowledgments
This research was funded by a grant (No. MIP-075/2015) from the Research Council of Lithuania.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Vaiciukynas, E. et al. (2016). Fusing Various Audio Feature Sets for Detection of Parkinson’s Disease from Sustained Voice and Speech Recordings. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_39
DOI: https://doi.org/10.1007/978-3-319-43958-7_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer Science, Computer Science (R0)