Skip to main content
Log in

Non-intrusive speech quality assessment using several combinations of auditory features

  • Published:
International Journal of Speech Technology Aims and scope Submit manuscript

Abstract

Quality estimation of speech is essential for monitoring and maintenance of the quality of service at different nodes of modern telecommunication networks. It is also required in the selection of codecs in speech communication systems. There is no requirement of the original clean speech signal as a reference in non-intrusive speech quality evaluation, and thus it is of importance in evaluating the quality of speech at any node of the communication network. In this paper, non-intrusive speech quality assessment of narrowband speech is done by Gaussian Mixture Model (GMM) training using several combinations of auditory perception and speech production features, which include principal components of Lyon’s auditory model features, MFCC, LSF and their first and second differences. Results are obtained and compared for several combinations of auditory features for three sets of databases. The results are also compared with ITU-T Recommendation P.563 for non-intrusive speech quality assessment. It is found that many combinations of these feature sets outperform the ITU-T P.563 Recommendation under the test conditions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

Notes

  1. http://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat.

  2. http://www.utdallas.edu/~loizou/speech/noizeus.

References

  • ITU-T Recommendation P.800 (1996). Methods for subjective determination of transmission quality. International Telecommunication Union-Telecommunication Standardization Sector, Geneva.

  • Fegyo, T., Szarvas, M., Tatai, P., & Gordos, G. (2000). Objective speech quality estimation for analog mobile channels: problems and solutions. International Journal of Speech Technology, 3, 277–287.

    Article  MATH  Google Scholar 

  • ITU-T Recommendation P.862 (1996). Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks. International Telecommunication Union-Telecommunication Standardization Sector, Geneva.

  • ITU-T Recommendation P.861 (1996). Objective quality measurement of telephone band (300–3400 Hz) speech codecs. International Telecommunication Union-Telecommunication Standardization Sector, Geneva.

  • ITU-T Recommendation P.563 (2004). Single ended method for objective speech quality assessment in narrow-band telephony applications. International Telecommunication Union- Telecommunication Standardization Sector, Geneva.

  • Liang, J., & Kubichek, R. (1994). Output based objective speech quality. In IEEE 44th vehicular technology conf. (Vol. 3(8–10), pp. 1719–1723).

    Google Scholar 

  • Au, O. C., & Lam, K. (1998). A novel output based objective speech quality measure for wireless communication. In Proc. 4th international conf. on signal processing (Vol. 1, pp. 666–669).

    Google Scholar 

  • Gray, P., Hollier, M., & Massara, R. (2000). Non-intrusive speech quality assessment using vocal-tract models. IEE Proceedings. Vision, Image and Signal Processing, 147(6), 493–501.

    Article  Google Scholar 

  • Malfait, L., Berger, J., & Kastner, M. (2006). P.563-The ITU-T standard for single-ended speech quality assessment. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1924–1934.

    Article  Google Scholar 

  • Kim, D. S. (2005). ANIQUE: an auditory model for single ended speech quality estimation. IEEE Transactions on Audio, Speech, and Language Processing, 13(5), 821–831.

    Article  Google Scholar 

  • Grancharov, V., Zhao, D. Y., Lindblom, J., & Kleijn, W. B. (2006). Low-complexity, non-intrusive speech quality assessment. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1948–1956.

    Article  Google Scholar 

  • Falk, T., Xu, Q., & Chan, W. Y. (2005). Non-intrusive GMM based speech quality measurement. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 1, pp. 125–128).

    Google Scholar 

  • Chen, G., & Parsa, V. (2006). Bayesian model based non-intrusive speech quality evaluation. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 1, pp. 385–388).

    Google Scholar 

  • Falk, T., & Chan, W. Y. (2006a). Enhanced non-intrusive speech quality measurement using degradation models. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 1, pp. 837–840).

    Google Scholar 

  • Falk, T., & Chan, W. Y. (2006b). Single-ended speech quality measurement using machine learning methods. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1935–1947.

    Article  Google Scholar 

  • Chen, G., & Parsa, V. (2005). Non-intrusive speech quality evaluation using an adaptive neuro-fuzzy inference system. IEEE Signal Processing Letters, 12(5), 403–406.

    Article  Google Scholar 

  • Slaney, M. (1988). Lyon’s cochlear model. Advanced technology group, apple technical report no. 13, Apple Computer Inc.

  • Lyon, R. F. (1982). A computational model of filtering, detection, and compression in the cochlea. In Proc. IEEE international conf. on acoustics, speech and signal processing (pp. 1282–1285).

    Google Scholar 

  • Jing, Z., & Johnson, M. H. (2002). Auditory modeling inspired methods of feature extraction for robust automatic speech recognition. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 4, pp. 4176–4179).

    Google Scholar 

  • ITU-T Recommendation P. Supplement-23 (1998). ITU-T Coded-Speech database (1998). International telecommunication union-telecommunication standardization sector, Geneva.

  • Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.

    Article  Google Scholar 

  • Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communications, 49, 588–601.

    Article  Google Scholar 

  • Narwaria, M., Lin, W., McLoughlin, I. V., Emmanuel, S., & Tien, C. L. (2010). Non-intrusive speech quality assessment with support vector regression. In Advances in multimedia modeling, 16th International multimedia modeling conf. (Vol. 5916, pp. 325–335). Berlin: Springer.

    Chapter  Google Scholar 

  • Hasan, M. R., Jamil, M., Rabbani, M. G., & Rahman, M. S. (2004). Speaker identification using mel frequency cepstral coefficients. In 3rd international conf. on electrical & computer engineering (pp. 565–568). Dhaka: ICECE.

    Google Scholar 

  • Bozkurt, E., Erzin, E., Erdem, C. E., & Erdem, A. T. (2010). Use of line spectral frequencies for emotion recognition from speech. In IEEE international conf. on pattern recognition, Turkey (pp. 3708–3711).

    Chapter  Google Scholar 

  • Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 87, 1738–1752.

    Article  Google Scholar 

  • Audhkhasi, K., & Kumar, A. (2010). Two scale auditory features based non-intrusive speech quality evaluation. IETE Journal of Research, 56(2), 111–118.

    Article  Google Scholar 

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B. Methodological, 39(1), 1–38.

    MathSciNet  MATH  Google Scholar 

  • Karmakar, A., Kumar, A., & Patney, R. K. (2006). A multiresolution model of auditory excitation pattern and its application to objective evaluation of perceived speech quality. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1912–1923.

    Article  Google Scholar 

  • Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Englewood: Prentice-Hall.

    Google Scholar 

Download references

Acknowledgements

The authors would like to thank Mr. Yi Hu and Dr. Philipos C. Loizou of Department of Electrical Engineering, The University of Texas at Dallas, Richardson, TX75083-0688, USA for providing the NOIZEUS-2240 database of 2240 speech utterances passed through different speech processing algorithms at different noisy conditions to create the different degradations. The authors would also like to thank Prof. S.C. Saxena, Vice Chancellor (Acting), Jaypee Institute of Information Technology, Noida, India for providing the suitable environment to the corresponding author to complete the work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rajesh Kumar Dubey.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Dubey, R.K., Kumar, A. Non-intrusive speech quality assessment using several combinations of auditory features. Int J Speech Technol 16, 89–101 (2013). https://doi.org/10.1007/s10772-012-9162-4

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10772-012-9162-4

Keywords

Navigation