Third-Order Moments of Filtered Speech Signals for Robust Speech Recognition

  • Conference paper
Nonlinear Analyses and Algorithms for Speech Processing (NOLISP 2005)

Abstract

Novel speech features calculated from third-order statistics of subband-filtered speech signals are introduced and studied for robust speech recognition. These features have the potential to capture nonlinear information not represented by cepstral coefficients. Because the features are based on third-order moments, they may also be more immune to Gaussian noise than cepstral coefficients, since a Gaussian distribution has a zero third-order central moment. Experiments on the AURORA2 database studying these features in combination with Mel-frequency cepstral coefficients (MFCCs) are presented. Some improvement over the MFCC-only baseline is shown when clean speech is used for training, though the same improvement is not seen when multi-condition training data is used.
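
The noise-immunity argument in the abstract can be illustrated with a small numerical sketch. This is not the authors' feature extraction: the subband filter bank is omitted, the signals are synthetic, and the names `third_central_moment` and `frame_moments`, along with the frame length and hop, are hypothetical choices made only for illustration. The sketch assumes the moment is estimated as the sample average of (x - mean)^3, and it relies on the fact that third-order central moments of independent signals add, so additive Gaussian noise contributes approximately nothing.

```python
import random


def third_central_moment(samples):
    """Sample estimate of the third-order central moment E[(x - mean)^3]."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 3 for x in samples) / n


def frame_moments(signal, frame_len, hop):
    """Third-order central moment of each full frame (hypothetical framing)."""
    return [
        third_central_moment(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Synthetic stand-ins (assumptions): Gaussian noise and a skewed "signal".
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
skewed = [rng.expovariate(1.0) for _ in range(n)]  # theoretical third central moment is 2
noisy = [s + g for s, g in zip(skewed, gaussian)]

m3_gauss = third_central_moment(gaussian)   # expected to be close to 0
m3_skewed = third_central_moment(skewed)    # expected to be close to 2
m3_noisy = third_central_moment(noisy)      # expected to stay close to m3_skewed

print(f"Gaussian noise:        {m3_gauss:.3f}")
print(f"Skewed signal:         {m3_skewed:.3f}")
print(f"Skewed + Gaussian:     {m3_noisy:.3f}")
```

In a recognizer, a per-frame version such as `frame_moments(band_signal, 200, 80)` would presumably be applied to each subband output, but the actual filter bank, framing, and normalization used in the paper are not given on this page.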

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Indrebo, K.M., Povinelli, R.J., Johnson, M.T. (2006). Third-Order Moments of Filtered Speech Signals for Robust Speech Recognition. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_24

  • DOI: https://doi.org/10.1007/11613107_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31257-4

  • Online ISBN: 978-3-540-32586-4

  • eBook Packages: Computer Science, Computer Science (R0)
