Non-intrusive speech quality assessment using several combinations of auditory features

Dubey, Rajesh Kumar; Kumar, Arun

doi:10.1007/s10772-012-9162-4

Non-intrusive speech quality assessment using several combinations of auditory features

Published: 27 June 2012

Volume 16, pages 89–101, (2013)
Cite this article

International Journal of Speech Technology Aims and scope Submit manuscript

Rajesh Kumar Dubey¹ &
Arun Kumar²

518 Accesses
25 Citations
Explore all metrics

Abstract

Quality estimation of speech is essential for monitoring and maintenance of the quality of service at different nodes of modern telecommunication networks. It is also required in the selection of codecs in speech communication systems. There is no requirement of the original clean speech signal as a reference in non-intrusive speech quality evaluation, and thus it is of importance in evaluating the quality of speech at any node of the communication network. In this paper, non-intrusive speech quality assessment of narrowband speech is done by Gaussian Mixture Model (GMM) training using several combinations of auditory perception and speech production features, which include principal components of Lyon’s auditory model features, MFCC, LSF and their first and second differences. Results are obtained and compared for several combinations of auditory features for three sets of databases. The results are also compared with ITU-T Recommendation P.563 for non-intrusive speech quality assessment. It is found that many combinations of these feature sets outperform the ITU-T P.563 Recommendation under the test conditions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Non-intrusive speech quality prediction based on the blind estimation of clean speech and the i-vector framework

Article 06 October 2020

Recent Advances in Speech Quality Assessment and Their Implementation

Modulation Spectral Features for Intrusive Measurement of Reverberant Speech Quality

Notes

References

ITU-T Recommendation P.800 (1996). Methods for subjective determination of transmission quality. International Telecommunication Union-Telecommunication Standardization Sector, Geneva.
Fegyo, T., Szarvas, M., Tatai, P., & Gordos, G. (2000). Objective speech quality estimation for analog mobile channels: problems and solutions. International Journal of Speech Technology, 3, 277–287.
Article MATH Google Scholar
ITU-T Recommendation P.862 (1996). Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks. International Telecommunication Union-Telecommunication Standardization Sector, Geneva.
ITU-T Recommendation P.861 (1996). Objective quality measurement of telephone band (300–3400 Hz) speech codecs. International Telecommunication Union-Telecommunication Standardization Sector, Geneva.
ITU-T Recommendation P.563 (2004). Single ended method for objective speech quality assessment in narrow-band telephony applications. International Telecommunication Union- Telecommunication Standardization Sector, Geneva.
Liang, J., & Kubichek, R. (1994). Output based objective speech quality. In IEEE 44th vehicular technology conf. (Vol. 3(8–10), pp. 1719–1723).
Google Scholar
Au, O. C., & Lam, K. (1998). A novel output based objective speech quality measure for wireless communication. In Proc. 4th international conf. on signal processing (Vol. 1, pp. 666–669).
Google Scholar
Gray, P., Hollier, M., & Massara, R. (2000). Non-intrusive speech quality assessment using vocal-tract models. IEE Proceedings. Vision, Image and Signal Processing, 147(6), 493–501.
Article Google Scholar
Malfait, L., Berger, J., & Kastner, M. (2006). P.563-The ITU-T standard for single-ended speech quality assessment. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1924–1934.
Article Google Scholar
Kim, D. S. (2005). ANIQUE: an auditory model for single ended speech quality estimation. IEEE Transactions on Audio, Speech, and Language Processing, 13(5), 821–831.
Article Google Scholar
Grancharov, V., Zhao, D. Y., Lindblom, J., & Kleijn, W. B. (2006). Low-complexity, non-intrusive speech quality assessment. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1948–1956.
Article Google Scholar
Falk, T., Xu, Q., & Chan, W. Y. (2005). Non-intrusive GMM based speech quality measurement. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 1, pp. 125–128).
Google Scholar
Chen, G., & Parsa, V. (2006). Bayesian model based non-intrusive speech quality evaluation. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 1, pp. 385–388).
Google Scholar
Falk, T., & Chan, W. Y. (2006a). Enhanced non-intrusive speech quality measurement using degradation models. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 1, pp. 837–840).
Google Scholar
Falk, T., & Chan, W. Y. (2006b). Single-ended speech quality measurement using machine learning methods. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1935–1947.
Article Google Scholar
Chen, G., & Parsa, V. (2005). Non-intrusive speech quality evaluation using an adaptive neuro-fuzzy inference system. IEEE Signal Processing Letters, 12(5), 403–406.
Article Google Scholar
Slaney, M. (1988). Lyon’s cochlear model. Advanced technology group, apple technical report no. 13, Apple Computer Inc.
Lyon, R. F. (1982). A computational model of filtering, detection, and compression in the cochlea. In Proc. IEEE international conf. on acoustics, speech and signal processing (pp. 1282–1285).
Google Scholar
Jing, Z., & Johnson, M. H. (2002). Auditory modeling inspired methods of feature extraction for robust automatic speech recognition. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 4, pp. 4176–4179).
Google Scholar
ITU-T Recommendation P. Supplement-23 (1998). ITU-T Coded-Speech database (1998). International telecommunication union-telecommunication standardization sector, Geneva.
Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.
Article Google Scholar
Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communications, 49, 588–601.
Article Google Scholar
Narwaria, M., Lin, W., McLoughlin, I. V., Emmanuel, S., & Tien, C. L. (2010). Non-intrusive speech quality assessment with support vector regression. In Advances in multimedia modeling, 16th International multimedia modeling conf. (Vol. 5916, pp. 325–335). Berlin: Springer.
Chapter Google Scholar
Hasan, M. R., Jamil, M., Rabbani, M. G., & Rahman, M. S. (2004). Speaker identification using mel frequency cepstral coefficients. In 3rd international conf. on electrical & computer engineering (pp. 565–568). Dhaka: ICECE.
Google Scholar
Bozkurt, E., Erzin, E., Erdem, C. E., & Erdem, A. T. (2010). Use of line spectral frequencies for emotion recognition from speech. In IEEE international conf. on pattern recognition, Turkey (pp. 3708–3711).
Chapter Google Scholar
Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 87, 1738–1752.
Article Google Scholar
Audhkhasi, K., & Kumar, A. (2010). Two scale auditory features based non-intrusive speech quality evaluation. IETE Journal of Research, 56(2), 111–118.
Article Google Scholar
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B. Methodological, 39(1), 1–38.
MathSciNet MATH Google Scholar
Karmakar, A., Kumar, A., & Patney, R. K. (2006). A multiresolution model of auditory excitation pattern and its application to objective evaluation of perceived speech quality. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1912–1923.
Article Google Scholar
Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Englewood: Prentice-Hall.
Google Scholar

Download references

Acknowledgements

The authors would like to thank Mr. Yi Hu and Dr. Philipos C. Loizou of Department of Electrical Engineering, The University of Texas at Dallas, Richardson, TX75083-0688, USA for providing the NOIZEUS-2240 database of 2240 speech utterances passed through different speech processing algorithms at different noisy conditions to create the different degradations. The authors would also like to thank Prof. S.C. Saxena, Vice Chancellor (Acting), Jaypee Institute of Information Technology, Noida, India for providing the suitable environment to the corresponding author to complete the work.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology (JIIT), Noida, India
Rajesh Kumar Dubey
CARE, Indian Institute of Technology (IIT), Delhi, India
Arun Kumar

Authors

Rajesh Kumar Dubey
View author publications
You can also search for this author in PubMed Google Scholar
Arun Kumar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rajesh Kumar Dubey.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Dubey, R.K., Kumar, A. Non-intrusive speech quality assessment using several combinations of auditory features. Int J Speech Technol 16, 89–101 (2013). https://doi.org/10.1007/s10772-012-9162-4

Download citation

Received: 21 April 2012
Accepted: 11 June 2012
Published: 27 June 2012
Issue Date: March 2013
DOI: https://doi.org/10.1007/s10772-012-9162-4

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Non-intrusive speech quality assessment using several combinations of auditory features

Abstract

Access this article

Similar content being viewed by others

Non-intrusive speech quality prediction based on the blind estimation of clean speech and the i-vector framework

Recent Advances in Speech Quality Assessment and Their Implementation

Modulation Spectral Features for Intrusive Measurement of Reverberant Speech Quality

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Non-intrusive speech quality assessment using several combinations of auditory features

Abstract

Access this article

Similar content being viewed by others

Non-intrusive speech quality prediction based on the blind estimation of clean speech and the i-vector framework

Recent Advances in Speech Quality Assessment and Their Implementation

Modulation Spectral Features for Intrusive Measurement of Reverberant Speech Quality

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation