Abstract
We propose a new non-intrusive speech quality assessment algorithm based on Support Vector Regression (SVR) and Mel Frequency Cepstral Coefficients (MFCCs). The basic idea is to map the MFCCs to the desired quality score using SVR. The sensitivity of the MFCCs to external noise is exploited to gauge changes in the speech signal and thereby evaluate its perceptual quality. SVR brings the advantages of machine learning, in particular the ability to learn complex data patterns, to obtain an effective and generalized mapping of features to a perceptual score, in contrast with the feature pooling process often used in existing speech quality estimators. Experimental results on a total of 1792 speech files with associated subjective scores indicate that the proposed approach outperforms the standard P.563 algorithm for non-intrusive assessment of speech quality.
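The MFCC-to-score mapping described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the NumPy MFCC front end and its parameters (8 kHz sampling, 256-point frames, 20 mel filters, 13 coefficients), the per-file summary (mean and standard deviation of each coefficient over frames), the RBF kernel settings, and above all the data, which are synthetic tones in white noise labelled with placeholder scores rather than real speech with subjective ratings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):   # rising edge of the triangle
            fb[i, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling edge of the triangle
            fb[i, k] = (hi - k) / (hi - mid)
    return fb


def mfcc(signal, sr=8000, n_fft=256, hop=128, n_filters=20, n_ceps=13):
    """Frame-level MFCCs: power spectrum -> mel energies -> log -> DCT-II."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  np.arange(n_filters) + 0.5) / n_filters)
    return log_mel @ dct.T          # shape: (n_frames, n_ceps)


def file_features(signal):
    """Assumed per-file summary: mean and std of each MFCC over frames."""
    c = mfcc(signal)
    return np.concatenate([c.mean(axis=0), c.std(axis=0)])


rng = np.random.default_rng(0)


def synthetic_file(snr_db, sr=8000, dur=1.0):
    """Placeholder signal: four harmonics plus white noise at a given SNR."""
    t = np.arange(int(sr * dur)) / sr
    clean = sum(np.sin(2 * np.pi * f * t) / k
                for k, f in enumerate((200, 400, 600, 800), start=1))
    noise = rng.standard_normal(t.size)
    noise *= np.sqrt(np.mean(clean ** 2)
                     / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return clean + noise


# Synthetic stand-in for a rated corpus: the "scores" are a linear function
# of SNR on a 1-5 scale, NOT real subjective ratings.
snrs = rng.uniform(0, 30, size=60)
X = np.array([file_features(synthetic_file(s)) for s in snrs])
y = 1.0 + 4.0 * snrs / 30.0

# Standardize the features, then fit an RBF-kernel SVR on the first 45 files.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[:45], y[:45])

# Predict scores for the 15 held-out files and correlate with the labels.
pred = model.predict(X[45:])
corr = np.corrcoef(pred, y[45:])[0, 1]
print(f"held-out Pearson correlation: {corr:.3f}")
```

With a real corpus, `X` would hold features computed from the degraded speech files only (no clean reference is needed, which is what makes the approach non-intrusive) and `y` the associated subjective scores; the correlation on held-out files is one common way to compare such an estimator against a baseline.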
References
Au, O., Lam, K.: A novel output-based objective speech quality measure for wireless communication. In: Proc. 4th Int. Conf. Signal Process., vol. 1, pp. 666–669 (1998)
Falk, T., Xu, Q., Chan, W.Y.: Non-intrusive GMM-based speech quality measurement. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 1, pp. 125–128 (2005)
Chen, G., Parsa, V.: Bayesian model based non-intrusive speech quality evaluation. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 1, pp. 385–388 (2005)
Kim, D.: ANIQUE: An auditory model for single-ended speech quality estimation. IEEE Trans. Speech Audio Process. 13(5), 821–831 (2005)
Grancharov, V., Zhao, D.Y., Lindblom, J., Kleijn, W.B.: Low-complexity, nonintrusive speech quality assessment. IEEE Trans. Audio, Speech, Lang. Process. 14(6), 1948–1956 (2006)
Hu, Y., Loizou, P.C.: Evaluation of Objective Quality Measures for Speech Enhancement. IEEE Trans. Speech Audio Process. 16(1), 229–230 (2008)
Falk, T., Chan, W.Y.: Single-ended speech quality measurement using machine learning methods. IEEE Trans. Audio, Speech, Lang. Process. 14(6), 1935–1947 (2006)
Zhu, Q., Alwan, A.: The Effect of Additive Noise on Speech Amplitude Spectra: A Quantitative Analysis. IEEE Signal Processing Letters 9(9), 275–277 (2002)
Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press, Cambridge (2002)
Hu, Y., Loizou, P.C.: Subjective comparison and evaluation of speech enhancement algorithms. Speech Commun. 49, 588–601 (2007)
ITU-T.: Subjective test methodology for evaluating speech communication systems that include noise suppression algorithms. In: ITU-T Rec. P.835, Geneva, Switzerland (2003)
ITU-T.: Single-ended method for objective speech quality assessment in narrow-band telephony applications. In: ITU-T Rec. P.563, Geneva, Switzerland (2004)
Rix, A.: Perceptual speech quality assessment - A Review. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2004, vol. 3, pp. 1056–1059 (2004)
Psytechnics Limited: NiQA - Product Description. Tech. Rep. (January 2003), http://www.psytechnics.com/pages/products/niqa.php
SwissQual Inc.: NiNA - SwissQual’s non-intrusive algorithm for estimating the subjective quality of live speech. Tech. Rep. (June 2001), http://www.swissqual.com/HTML/ninapage.htm
Murty, K., Yegnanarayana, B.: Combining evidence from residual phase and MFCC features for speaker recognition. IEEE Signal Processing Letters 13(1), 52–55 (2006)
ITU-T.: Perceptual evaluation of speech quality. In: ITU-T Rec. P.862 (February 2001)
Müller, M.: Information Retrieval for Music and Motion. Springer, New York (2007)
Shao, X., Milner, B.: Clean speech reconstruction from noisy Mel frequency cepstral coefficients using a sinusoidal model. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. I, pp. 704–707 (2003)
Boucheron, L., Philip, L.: On the inversion of Mel frequency cepstral coefficients for Speech Enhancement Applications. In: Proc. ICSES, pp. 485–488 (2008)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Narwaria, M., Lin, W., McLoughlin, I.V., Emmanuel, S., Tien, C.L. (2010). Non-intrusive Speech Quality Assessment with Support Vector Regression. In: Boll, S., Tian, Q., Zhang, L., Zhang, Z., Chen, YP.P. (eds) Advances in Multimedia Modeling. MMM 2010. Lecture Notes in Computer Science, vol 5916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11301-7_34
DOI: https://doi.org/10.1007/978-3-642-11301-7_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11300-0
Online ISBN: 978-3-642-11301-7
eBook Packages: Computer Science, Computer Science (R0)