Abstract
Laryngeal pathologies significantly affect quality of life, verbal communication, and professional life. Most organic vocal pathologies affect the shape and vibration pattern of the vocal fold(s). Many automatic, computer-based, non-intrusive systems for rapid detection and progression tracking have been introduced in recent years. This paper proposes an integrated wavelet-based voice condition evaluation framework that is independent of human bias and language. The true voice source is extracted using quasi-closed phase (QCP) glottal inverse filtering to capture the altered vocal fold dynamics. The voice source is decomposed using the stationary wavelet transform (SWT), and fundamental-frequency-independent statistical and energy measures are extracted from each spectral sub-band to quantify the voice source. Because multilevel stationary wavelet decomposition yields a high-dimensional feature vector, an information gain-based feature ranking process is used to select the most discriminative features. Speech samples of the sustained vowel /a/ drawn from four distinct databases in German, Spanish, English and Arabic are used to perform intra- and cross-database experiments. The effect of the decomposition level on detection and classification accuracy is examined, and the fifth level of decomposition is found to give the highest recognition rate. The performance metrics achieved by the classifiers suggest that SWT-based energy and statistical features reveal more useful information on pathological voices, and thus the proposed system can be used as a complementary tool for clinical diagnosis of laryngeal pathologies.
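The sub-band feature extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Haar wavelet with circular extension, a toy two-tone signal in place of a QCP-estimated glottal source, and three measures per sub-band (energy, mean, standard deviation). The abstract does not specify the wavelet or the exact statistics, and all function names here are hypothetical.

```python
import math


def haar_swt(signal, levels):
    """Undecimated (stationary) Haar transform with circular extension.

    Returns one (approximation, detail) pair per level; every band keeps
    the input length because no downsampling is applied.
    """
    approx = list(signal)
    n = len(approx)
    bands = []
    for level in range(levels):
        step = 2 ** level  # filter taps are spread apart by 2**level samples
        low = [(approx[i] + approx[(i + step) % n]) / math.sqrt(2) for i in range(n)]
        high = [(approx[i] - approx[(i + step) % n]) / math.sqrt(2) for i in range(n)]
        bands.append((low, high))
        approx = low  # the next level decomposes the current approximation
    return bands


def band_features(band):
    """Energy, mean and standard deviation of one sub-band (assumed measures)."""
    n = len(band)
    mean = sum(band) / n
    variance = sum((v - mean) ** 2 for v in band) / n
    return [sum(v * v for v in band), mean, math.sqrt(variance)]


def swt_feature_vector(signal, levels=5):
    """Concatenate the per-band measures over all levels into one vector."""
    features = []
    for low, high in haar_swt(signal, levels):
        features.extend(band_features(low))
        features.extend(band_features(high))
    return features


# Toy stand-in for a glottal source estimate: two sinusoids sampled at 8 kHz.
fs = 8000
toy_source = [
    math.sin(2 * math.pi * 120 * t / fs) + 0.3 * math.sin(2 * math.pi * 900 * t / fs)
    for t in range(1024)
]
vector = swt_feature_vector(toy_source, levels=5)
print(len(vector))  # 5 levels x 2 bands x 3 measures = 30 values
```

Even in this small case the vector grows with the number of levels, bands and measures, which is why a ranking step such as information gain is applied before classification.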






Change history
16 May 2022
A Correction to this paper has been published: https://doi.org/10.1007/s10772-022-09977-6
Cite this article
Gidaye, G., Nirmal, J., Ezzine, K. et al. Unified wavelet-based framework for evaluation of voice impairment. Int J Speech Technol 25, 527–548 (2022). https://doi.org/10.1007/s10772-022-09969-6
DOI: https://doi.org/10.1007/s10772-022-09969-6