Hybrid HMM/BLSTM-RNN for Robust Speech Recognition

  • Conference paper
Text, Speech and Dialogue (TSD 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6231)

Abstract

The question of how to integrate information from different sources in speech decoding is still only partially solved (layered architecture versus integrated search). We investigate the optimal integration of information from Artificial Neural Nets in a speech decoding scheme based on a Dynamic Bayesian Network (DBN) for noise-robust automatic speech recognition (ASR). An HMM implemented by the DBN cooperates with a novel Recurrent Neural Network, a bidirectional Long Short-Term Memory network (BLSTM-RNN), which exploits long-range context information to predict a phoneme for each MFCC frame. Such a hybrid system has been shown to improve noise robustness when the identity of the most likely phoneme is used as a direct observation. In this paper, we instead present the complete BLSTM-RNN output to the DBN as Virtual Evidence. This allows the hybrid system to use information about all phoneme candidates, which was not possible in previous experiments. Our approach improved word accuracy on the Aurora 2 Corpus by 8%.
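The difference between passing only the most likely phoneme and passing the complete network output can be illustrated with a toy example. The sketch below is not the authors' DBN implementation: it is a minimal forward filter over a two-state HMM, written under the assumption that each state corresponds to one phoneme and that the network's per-frame output vector is used directly as a per-state weight. The function names `forward_filter` and `hard_evidence`, the transition matrix, and the posterior values are all hypothetical and chosen only for illustration.

```python
def forward_filter(init, trans, evidence):
    """Return filtered state distributions p(state_t | evidence up to t).

    init[s]        : prior probability of starting in state s
    trans[r][s]    : probability of moving from state r to state s
    evidence[t][s] : non-negative weight that frame t assigns to state s
    """
    n = len(init)
    filtered = []
    prev = None
    for t, weights in enumerate(evidence):
        if prev is None:
            predicted = list(init)
        else:
            # Propagate the previous belief through the transition model.
            predicted = [sum(prev[r] * trans[r][s] for r in range(n))
                         for s in range(n)]
        # Weight each state by the evidence for this frame, then normalise.
        unnorm = [predicted[s] * weights[s] for s in range(n)]
        total = sum(unnorm)
        if total == 0.0:
            raise ValueError(f"evidence at frame {t} rules out every state")
        prev = [u / total for u in unnorm]
        filtered.append(prev)
    return filtered


def hard_evidence(posteriors):
    """Keep only the identity of the most likely phoneme per frame (one-hot)."""
    one_hot = []
    for frame in posteriors:
        best = max(range(len(frame)), key=frame.__getitem__)
        one_hot.append([1.0 if s == best else 0.0 for s in range(len(frame))])
    return one_hot


# Hypothetical toy setup: two phoneme states with "sticky" transitions.
init = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.1, 0.9]]

# Hypothetical network outputs for three frames; the middle frame is
# ambiguous, as might happen under noise.
posteriors = [[0.90, 0.10],
              [0.45, 0.55],
              [0.80, 0.20]]

soft = forward_filter(init, trans, posteriors)                 # full output
hard = forward_filter(init, trans, hard_evidence(posteriors))  # argmax only

for t, (s, h) in enumerate(zip(soft, hard)):
    print(t, [round(p, 3) for p in s], h)
```

In this toy run the ambiguous middle frame forces the hard-evidence decoder into the second state, whereas the soft-evidence decoder still favours the first state because the transition model can outweigh a weak preference. Whether the network outputs should first be divided by phoneme priors, as in classical hybrid HMM/ANN systems, is not stated in the abstract and is left out here.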

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sun, Y., ten Bosch, L., Boves, L. (2010). Hybrid HMM/BLSTM-RNN for Robust Speech Recognition. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2010. Lecture Notes in Computer Science, vol 6231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15760-8_51

  • DOI: https://doi.org/10.1007/978-3-642-15760-8_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15759-2

  • Online ISBN: 978-3-642-15760-8

  • eBook Packages: Computer Science, Computer Science (R0)
