Abstract
Novel speech features calculated from third-order statistics of subband-filtered speech signals are introduced and studied for robust speech recognition. These features have the potential to capture nonlinear information not represented by cepstral coefficients. Also, because the features presented in this paper are based on third-order moments, they may be more immune to Gaussian noise than cepstral coefficients, since the third-order central moments of a Gaussian distribution are zero. Experiments on the AURORA2 database using these features in combination with Mel-frequency cepstral coefficients (MFCCs) are presented. Some improvement over the MFCC-only baseline is shown when clean speech is used for training, although the same improvement is not seen when multi-condition training data is used.
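The abstract does not say exactly which third-order statistic is extracted or how the subbands are formed, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes a windowed-sinc FIR band-pass filter bank with made-up band edges and, as the per-subband feature, the third-order central moment normalized by variance to the power 1.5 (the sample skewness), chosen here so the value does not depend on signal gain. The helper names (`bandpass_fir`, `normalized_third_moment`, `subband_third_moment_features`) are illustrative. The usage lines also show the Gaussian argument numerically: Gaussian noise gives a value near zero, while a skewed (exponential) signal gives a value near 2.

```python
import numpy as np


def bandpass_fir(low_hz, high_hz, fs, num_taps=101):
    """Windowed-sinc FIR band-pass filter (Hamming window). Assumed filter design."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    lo, hi = low_hz / fs, high_hz / fs  # cutoffs normalized to the sampling rate
    # Ideal band-pass response = difference of two ideal low-pass responses.
    h = 2 * hi * np.sinc(2 * hi * n) - 2 * lo * np.sinc(2 * lo * n)
    return h * np.hamming(num_taps)


def normalized_third_moment(x):
    """Third-order central moment divided by variance**1.5 (sample skewness)."""
    d = x - np.mean(x)
    var = np.mean(d ** 2)
    if var == 0.0:
        return 0.0  # constant signal: define the feature as zero
    return float(np.mean(d ** 3) / var ** 1.5)


def subband_third_moment_features(frame, fs, band_edges):
    """One normalized third-order moment per subband of a single frame (hypothetical)."""
    feats = []
    for low, high in zip(band_edges[:-1], band_edges[1:]):
        y = np.convolve(frame, bandpass_fir(low, high, fs), mode="same")
        feats.append(normalized_third_moment(y))
    return np.array(feats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000  # AURORA2 audio is sampled at 8 kHz
    edges = [100, 500, 1000, 2000, 3900]  # hypothetical band edges in Hz
    print(normalized_third_moment(rng.standard_normal(100_000)))  # near 0
    print(normalized_third_moment(rng.exponential(size=100_000)))  # near 2
    frame = rng.standard_normal(400)  # hypothetical 50 ms frame
    print(subband_third_moment_features(frame, fs, edges))
```

In a recognizer, such per-frame values would be appended to the MFCC vector; the filter design, band edges, and normalization above are all assumptions for illustration.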
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Indrebo, K.M., Povinelli, R.J., Johnson, M.T. (2006). Third-Order Moments of Filtered Speech Signals for Robust Speech Recognition. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_24
DOI: https://doi.org/10.1007/11613107_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31257-4
Online ISBN: 978-3-540-32586-4
eBook Packages: Computer Science, Computer Science (R0)