Low-complexity disordered speech quality estimation

Ali, Yousef S. Ettomi; Parsa, Vijay; Doyle, Philip; Berkane, Soulaimane

doi:10.1007/s10772-020-09688-w

Published: 20 February 2020

International Journal of Speech Technology, Volume 23, pages 585–594 (2020)

Yousef S. Ettomi Ali¹ (ORCID: orcid.org/0000-0003-2793-4949),
Vijay Parsa¹,²,
Philip Doyle² &
Soulaimane Berkane³

A Correction to this article was published on 25 March 2020

This article has been updated

Abstract

Tracheoesophageal (TE) speech is generated by patients who have undergone a total laryngectomy, in which the larynx (voice box) is removed and replaced by a tracheoesophageal puncture. This work presents a novel low-complexity algorithm to estimate the severity of disordered TE speech. The proposed algorithm produces two output scores, both computed from 20 ms voiced frames of the speech signal. An 18th-order linear prediction (LP) analysis is performed on each voiced frame. The first output score uses higher-order statistics (HOS: mean, variance, skewness and kurtosis) calculated from the LP coefficients, the cepstral coefficients and the LP residual signal. These statistics, along with the pitch value, are averaged over all voiced frames, yielding a total of 14 HOS quality features. The second output score uses features derived from the estimated vocal tract model parameters (cross-sectional tube areas): statistics of these vocal tract parameters (VTPs) across all voiced speech frames serve as speech quality features. Forward stepwise regression and K-fold cross-validation are then used to select the best feature sets to feed to the regression models. The resulting scores correlate highly with subjective scores for several regression techniques, with a correlation of up to 0.91 when the VTP-Gaussian model is used.
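The per-frame part of the first score can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the sampling rate, windowing, LP solver or exact feature list, so the 16 kHz rate (320 samples per 20 ms frame), the Hamming window, the autocorrelation method with the Levinson-Durbin recursion, and the function names `lp_analysis`, `lpc_to_cepstrum`, `hos` and `frame_features` are all assumptions made for illustration. Pitch estimation, voicing detection and the vocal tract area features are left out.

```python
import numpy as np


def lp_analysis(frame, order=18):
    """LP analysis by the autocorrelation method (Levinson-Durbin).

    Returns (a, k): a is the prediction-error filter with a[0] == 1,
    k holds the reflection coefficients. The window is an assumption.
    """
    w = frame * np.hamming(len(frame))
    # Autocorrelation lags 0..order.
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k[i - 1] = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k[i - 1] * prev[i - j]
        a[i] = k[i - 1]
        err *= 1.0 - k[i - 1] ** 2
    return a, k


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model 1/A(z), from the LP coefficients."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(m * c[m] * a[n - m] for m in range(1, n) if n - m <= p)
        a_n = a[n] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]


def hos(x):
    """Mean, variance, skewness and kurtosis of a 1-D sequence."""
    x = np.asarray(x, dtype=float)
    mean, var = x.mean(), x.var()
    z = (x - mean) / np.sqrt(var)  # assumes a non-constant sequence
    return np.array([mean, var, np.mean(z ** 3), np.mean(z ** 4)])


def frame_features(frame, order=18):
    """Statistics of LP coefficients, cepstrum and LP residual for one frame."""
    a, _ = lp_analysis(frame, order)
    w = frame * np.hamming(len(frame))
    residual = np.convolve(w, a)[: len(w)]  # inverse-filter the frame
    ceps = lpc_to_cepstrum(a, order)
    return np.concatenate([hos(a[1:]), hos(ceps), hos(residual)])
```

In this sketch, a recording would be scored by calling `frame_features` on each voiced 20 ms frame and averaging the resulting vectors over all voiced frames before passing them to a regression model.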

Change history

25 March 2020
The original version of this article unfortunately contained a mistake in the PDF and HTML version. The spelling of the third author’s name, Philip Doyle, has been corrected. Additionally, the affiliation for Vijay Parsa and Philip Doyle is ‘School of Communication Sciences and Disorders’.

Acknowledgements

Funding from the Natural Sciences and Engineering Research Council of Canada is gratefully acknowledged.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Western Ontario, London, ON, Canada
Yousef S. Ettomi Ali & Vijay Parsa
School of Communication Sciences and Disorders, University of Western Ontario, London, ON, Canada
Vijay Parsa & Philip Doyle
Department of Computer Sciences and Engineering, University of Quebec in Outaouais, Gatineau, QC, Canada
Soulaimane Berkane

Corresponding author

Correspondence to Yousef S. Ettomi Ali.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The spelling of the third author’s name, Philip Doyle, was incorrect. Additionally, the affiliation for Vijay Parsa and Philip Doyle should read ‘School of Communication Sciences and Disorders’.

About this article

Cite this article

Ali, Y.S.E., Parsa, V., Doyle, P. et al. Low-complexity disordered speech quality estimation. Int J Speech Technol 23, 585–594 (2020). https://doi.org/10.1007/s10772-020-09688-w

Received: 11 June 2019
Accepted: 11 February 2020
Published: 20 February 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s10772-020-09688-w
