Unified wavelet-based framework for evaluation of voice impairment

International Journal of Speech Technology

A Correction to this article was published on 16 May 2022

This article has been updated

Abstract

Laryngeal pathologies significantly affect quality of life, verbal communication, and professional life. Most organic vocal pathologies alter the shape and vibration pattern of the vocal fold(s). Many automatic, computer-based, non-intrusive systems for rapid detection and progression tracking have been introduced in recent years. This paper proposes an integrated wavelet-based framework for evaluating voice condition that is independent of human bias and language. The true voice source is extracted using quasi-closed phase (QCP) glottal inverse filtering to capture the altered vocal fold dynamics. The voice source is then decomposed using the stationary wavelet transform (SWT), and statistical and energy measures that are independent of the fundamental frequency are extracted from each spectral sub-band to quantify the voice source. Because multilevel stationary wavelet decomposition yields a high-dimensional feature vector, an information gain-based feature ranking process is used to select the most discriminative features. Speech samples of the sustained vowel /a/, drawn from four distinct databases in German, Spanish, English and Arabic, are used in intra- and cross-database experiments. The effect of the decomposition level on detection and classification accuracy is examined, and the fifth level of decomposition is found to give the highest recognition rate. The performance metrics achieved by the classifiers suggest that SWT-based energy and statistical features capture more useful information about pathological voices, and thus the proposed system can be used as a complementary tool for clinical diagnosis of laryngeal pathologies.
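The feature extraction and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the circular boundary handling, the 1/2 filter scaling, the choice of energy, mean and standard deviation as the per-band measures, and the median split used to discretize features before computing information gain are all assumptions made for illustration. The function names (`haar_swt`, `band_features`, `swt_features`, `information_gain`, `rank_features`) are hypothetical, and the QCP glottal inverse filtering step is omitted, so the input is simply any one-dimensional signal.

```python
import math
from collections import Counter


def haar_swt(signal, levels):
    """Undecimated (stationary) Haar decomposition with circular extension.

    Returns (details, approx): one detail band per level plus the final
    approximation band, each as long as the input. The 1/2 scaling is an
    assumption chosen so that the band energies sum to the signal energy.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for level in range(levels):
        shift = 2 ** level  # filter "holes" double at every level
        pairs = [(approx[i], approx[(i + shift) % n]) for i in range(n)]
        details.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return details, approx


def band_features(band):
    """Energy and simple statistics of one sub-band (assumed measure set)."""
    n = len(band)
    mean = sum(band) / n
    variance = sum((x - mean) ** 2 for x in band) / n
    return {
        "energy": sum(x * x for x in band),
        "mean": mean,
        "std": math.sqrt(variance),
    }


def swt_features(signal, levels=5):
    """Features for every detail band followed by the approximation band."""
    details, approx = haar_swt(signal, levels)
    return [band_features(band) for band in details + [approx]]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one feature after a median split (assumption)."""
    threshold = sorted(values)[len(values) // 2]
    gain = entropy(labels)
    for side in (True, False):
        subset = [y for v, y in zip(values, labels) if (v >= threshold) == side]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def rank_features(rows, labels):
    """Feature indices sorted from most to least informative.

    rows: one list of feature values per sample; labels: one class per sample.
    """
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels)
             for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)
```

With `levels=5`, matching the decomposition level the abstract reports as best, `swt_features` returns six sets of measures per signal (five detail bands and one approximation band). Flattening these into one row per recording and passing the rows to `rank_features` gives an ordering from which the top-ranked features could be kept for a classifier.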

Author information

Corresponding author

Correspondence to Girish Gidaye.

Cite this article

Gidaye, G., Nirmal, J., Ezzine, K. et al. Unified wavelet-based framework for evaluation of voice impairment. Int J Speech Technol 25, 527–548 (2022). https://doi.org/10.1007/s10772-022-09969-6

  • DOI: https://doi.org/10.1007/s10772-022-09969-6