Low-complexity disordered speech quality estimation

International Journal of Speech Technology

A Correction to this article was published on 25 March 2020

This article has been updated

Abstract

Tracheoesophageal (TE) speech is produced by patients who have undergone a total laryngectomy, in which the larynx (voice box) is removed and replaced by a tracheoesophageal puncture. This work presents a novel low-complexity algorithm for estimating the severity of disordered TE speech. The algorithm produces two output scores, both computed from 20 ms voiced frames of the speech signal, each of which is analysed with an 18th-order linear prediction (LP) model. The first score uses features derived from higher-order statistics (HOS: mean, variance, skewness and kurtosis) calculated from the LP coefficients, the cepstral coefficients and the LP residual signal. These statistics, along with the pitch value, are averaged over all voiced frames, yielding a total of 14 HOS quality features. The second score uses features derived from the estimated vocal tract model parameters (cross-sectional tube areas); statistics of these vocal tract parameters (VTPs) across all voiced frames serve as speech quality features. Forward stepwise regression and K-fold cross-validation are then used to select the best feature sets to feed to the regression models. The resulting scores correlate highly with subjective scores for several regression techniques, reaching a correlation of up to 0.91 when the VTP-Gaussian model is used.
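The per-frame analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 16 kHz sampling rate, the autocorrelation/Levinson-Durbin method, the sign convention of the tube-area recursion, and all function names are assumptions made for illustration. Voicing detection, pitch estimation, averaging over frames and the regression stage are omitted, so the sketch yields 12 per-frame statistics rather than the 14 features reported in the paper.

```python
import math
import random

ORDER = 18        # LP order stated in the abstract
FRAME_LEN = 320   # 20 ms at an ASSUMED 16 kHz sampling rate

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, k, err) for the error filter A(z) = 1 + sum_j a[j] z^-j."""
    a = [1.0] + [0.0] * order
    k, err = [], r[0]
    for m in range(1, order + 1):
        km = -(r[m] + sum(a[j] * r[m - j] for j in range(1, m))) / err
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + km * prev[m - j]
        a[m] = km
        k.append(km)                 # reflection coefficient of stage m
        err *= 1.0 - km * km         # prediction error energy after stage m
    return a, k, err

def lp_residual(x, a):
    """Residual e[n] = sum_j a[j] * x[n-j], with samples before the frame taken as 0."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(min(n, p) + 1)) for n in range(len(x))]

def lpc_to_cepstrum(a):
    """Cepstral coefficients c[1..p] of 1/A(z) via the standard recursion."""
    p = len(a) - 1
    c = [0.0] * (p + 1)
    for n in range(1, p + 1):
        c[n] = -a[n] - sum(i * c[i] * a[n - i] for i in range(1, n)) / n
    return c[1:]

def hos(v):
    """Mean, variance, skewness and kurtosis (population moments) of a sequence."""
    n = len(v)
    mean = sum(v) / n
    m2, m3, m4 = (sum((s - mean) ** q for s in v) / n for q in (2, 3, 4))
    return [mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2]

def tube_areas(k, first_area=1.0):
    """Relative tube areas from reflection coefficients (one common sign convention)."""
    areas = [first_area]
    for km in k:
        areas.append(areas[-1] * (1.0 - km) / (1.0 + km))
    return areas

def frame_features(frame):
    """12 HOS values (LP coefficients, cepstrum, residual) plus tube areas for one frame."""
    a, k, _ = levinson_durbin(autocorr(frame, ORDER), ORDER)
    feats = hos(a[1:]) + hos(lpc_to_cepstrum(a)) + hos(lp_residual(frame, a))
    return feats, tube_areas(k)

def synthetic_frame(seed=0):
    """Hypothetical test frame: a 120 Hz sinusoid plus seeded Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * 120 * n / 16000) + 0.1 * rng.gauss(0, 1)
            for n in range(FRAME_LEN)]
```

Calling `frame_features(synthetic_frame())` returns the 12 statistics and 19 relative tube areas for one synthetic frame. In a full system these per-frame values would be computed only on voiced frames and summarised across the recording before feature selection.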

Change history

  • 25 March 2020

    The original version of this article unfortunately contained a mistake in the PDF and HTML version. The spelling of the third author’s name, Philip Doyle, has been corrected. Additionally, the affiliation for Vijay Parsa and Philip Doyle is ‘School of Communication Sciences and Disorders’.

Acknowledgements

Funding from the Natural Sciences and Engineering Research Council of Canada is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Yousef S. Ettomi Ali.

About this article

Cite this article

Ali, Y.S.E., Parsa, V., Doyle, P. et al. Low-complexity disordered speech quality estimation. Int J Speech Technol 23, 585–594 (2020). https://doi.org/10.1007/s10772-020-09688-w

  • DOI: https://doi.org/10.1007/s10772-020-09688-w
