Third-Order Moments of Filtered Speech Signals for Robust Speech Recognition

Indrebo, Kevin M.; Povinelli, Richard J.; Johnson, Michael T.

doi:10.1007/11613107_24


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3817)

Included in the following conference series:

International Conference on Nonlinear Analyses and Algorithms for Speech Processing


Abstract

Novel speech features calculated from third-order statistics of subband-filtered speech signals are introduced and studied for robust speech recognition. These features have the potential to capture nonlinear information not represented by cepstral coefficients. In addition, because the features presented in this paper are based on third-order moments, they may be more immune to Gaussian noise than cepstral coefficients, since Gaussian distributions have zero third-order moments. Experiments on the AURORA2 database that combine these features with Mel-frequency cepstral coefficients (MFCCs) are presented. Some improvement over the MFCC-only baseline is shown when clean speech is used for training, though the same improvement is not seen when multi-condition training data is used.
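The noise-immunity argument in the abstract can be illustrated with a minimal sketch. It is not the authors' feature extraction: the subband filterbank is omitted, the "signal" is a synthetic skewed (exponential) sequence rather than speech, and the names `third_central_moment` and `frame_moments` are hypothetical. It only shows that the sample third-order central moment of Gaussian noise is close to zero, so adding independent Gaussian noise to a skewed signal leaves that moment roughly unchanged.

```python
import random


def third_central_moment(samples):
    """Return the sample third-order central moment, mean((x - mean)^3)."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 3 for x in samples) / n


def frame_moments(signal, frame_len, hop):
    """Third-order central moment of each full frame of `signal`."""
    return [
        third_central_moment(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


# Synthetic data (an assumption for illustration, not speech).
rng = random.Random(0)
n = 100_000
# Exponential samples with rate 1: the theoretical third central moment is 2.
skewed = [rng.expovariate(1.0) for _ in range(n)]
# Zero-mean Gaussian noise: the theoretical third central moment is 0.
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
noisy = [s + g for s, g in zip(skewed, noise)]

m3_noise = third_central_moment(noise)
m3_clean = third_central_moment(skewed)
m3_noisy = third_central_moment(noisy)
```

Because third-order central moments of independent signals add, `m3_noisy` should stay near `m3_clean` while `m3_noise` stays near zero. In a recognizer, `frame_moments` would presumably be applied to each subband-filtered signal, though the paper's exact framing and filterbank are not given in this abstract.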



Author information

Authors and Affiliations

Dept. of Electrical and Computer Engineering, Marquette University, Milwaukee, Wisconsin, USA
Kevin M. Indrebo, Richard J. Povinelli & Michael T. Johnson


Editor information

Editors and Affiliations

Escola Universitària Politècnica de Mataró, UPC, Spain
Marcos Faundez-Zanuy
Escola Universitària Politècnica de Mataró, Spain
Léonard Janer & Antonio Satue-Villar
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, (SA), Italy
Anna Esposito
The Auton Lab, Carnegie Mellon University, Pittsburgh, PA, USA
Josep Roure
Escola Universitària Politècnica de Mataró (UPC), Barcelona, Spain
Virginia Espinosa-Duro

About this paper

Cite this paper

Indrebo, K.M., Povinelli, R.J., Johnson, M.T. (2006). Third-Order Moments of Filtered Speech Signals for Robust Speech Recognition. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_24

DOI: https://doi.org/10.1007/11613107_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31257-4
Online ISBN: 978-3-540-32586-4
eBook Packages: Computer Science (R0)
