Abstract
Although perceptual evaluation is considered the gold standard for assessing pathological voice quality, the considerable inter- and intra-listener variability associated with perceptual ratings cannot be ignored. This variability is probably due to confounding factors such as listeners' perceptual bias, listeners' experience, and the type of rating scale used. Automatic objective assessment can serve as a useful tool for diagnosing pathological voices, and acoustic analysis can help determine the severity of dysphonia. The present study aimed to develop a complementary automatic voice assessment system using multidimensional acoustic measures based on the well-known GRBAS perceptual rating scale. A 65-dimensional feature set, comprising traditional acoustic measures, Mel-frequency cepstral coefficients (MFCC), glottal-to-noise excitation measures, and nonlinear dynamical analysis, was used to compose a feature matrix. To reduce redundancy among the features, four different feature extraction techniques were applied. Multiclass classification was carried out with an RBF-kernel support vector machine (SVM) and an Extreme Learning Machine (ELM). The classification results were moderately correlated with the GRBAS ratings of severity, with best accuracies of about 77.55% and 80.58% for the SVM and ELM, respectively. This suggests that such multidimensional acoustic analysis can be an appropriate tool for determining the presence and severity of voice disorders.
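To make the classification step more concrete, the following is a minimal sketch of a generic Extreme Learning Machine for four-class grading, written with only the Python standard library. It is not the authors' implementation. The class name `TinyELM`, the helpers `solve_linear`, `make_toy_data`, and `accuracy`, the hidden-layer size, the ridge term, and the synthetic two-dimensional data are all assumptions introduced for illustration. The toy features stand in for the 65 acoustic measures, and the labels 0 to 3 mimic GRBAS severity grades.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b (a: n x n, b: n x k) by Gauss-Jordan elimination."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]  # augmented copy
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]


class TinyELM:
    """Single-hidden-layer ELM: random fixed hidden layer, ridge-solved output."""

    def __init__(self, n_hidden=30, ridge=1e-3, seed=0):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Sigmoid activations of the random hidden units for one sample.
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(ws, x)) + b)))
                for ws, b in zip(self.weights, self.biases)]

    def fit(self, X, y, n_classes):
        d, L = len(X[0]), self.n_hidden
        # Hidden weights and biases are drawn once and never trained.
        self.weights = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.biases = [self.rng.uniform(-1, 1) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        T = [[1.0 if label == c else 0.0 for c in range(n_classes)] for label in y]
        # Output weights from the regularized normal equations:
        # (H^T H + ridge * I) beta = H^T T
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(L)] for i in range(L)]
        B = [[sum(h[i] * t[c] for h, t in zip(H, T)) for c in range(n_classes)]
             for i in range(L)]
        self.beta = solve_linear(A, B)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            h = self._hidden(x)
            scores = [sum(h[i] * self.beta[i][c] for i in range(self.n_hidden))
                      for c in range(len(self.beta[0]))]
            preds.append(scores.index(max(scores)))  # class with the highest score
        return preds


def make_toy_data(seed, n_per_class=40):
    """Synthetic 2-D stand-in for acoustic features; labels mimic grades 0-3."""
    rng = random.Random(seed)
    centers = [(-2.0, -2.0), (-2.0, 2.0), (2.0, -2.0), (2.0, 2.0)]
    X, y = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)])
            y.append(label)
    return X, y


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade matches the reference grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    X_train, y_train = make_toy_data(seed=1)
    X_test, y_test = make_toy_data(seed=2)
    model = TinyELM().fit(X_train, y_train, n_classes=4)
    print("held-out accuracy:", accuracy(y_test, model.predict(X_test)))
```

The sketch keeps the two ideas that characterize an ELM: the hidden layer is random and fixed, and only the output weights are fitted, here by a ridge-regularized least-squares solve. With real data one would normally use a numerical library and cross-validated hyperparameters. The well-separated toy clusters say nothing about the accuracies reported above.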
Acknowledgments
The research was partially supported by grants from the National Natural Science Foundation of China (NSFC 61135003 and NSFC 61401452), the Shenzhen Speech Rehabilitation Technology Laboratory, and the Guangdong Innovative Research Team Program (No. 201001D0104648280).
Cite this article
Wang, Z., Yu, P., Yan, N. et al. Automatic Assessment of Pathological Voice Quality Using Multidimensional Acoustic Analysis Based on the GRBAS Scale. J Sign Process Syst 82, 241–251 (2016). https://doi.org/10.1007/s11265-015-1016-2