Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit

  • Conference paper
Pattern Recognition (DAGM 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3175)

Abstract

This paper describes audio-visual speech recognition experiments on a multi-speaker, large-vocabulary corpus using the Janus speech recognition toolkit. We describe a complete audio-visual speech recognition system and evaluate it on this corpus. In our experiments, using visual cues as additional input to the speech recognizer improved recognition on both clean and noisy speech.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kratt, J., Metze, F., Stiefelhagen, R., Waibel, A. (2004). Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit. In: Rasmussen, C.E., Bülthoff, H.H., Schölkopf, B., Giese, M.A. (eds) Pattern Recognition. DAGM 2004. Lecture Notes in Computer Science, vol 3175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28649-3_60

  • DOI: https://doi.org/10.1007/978-3-540-28649-3_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22945-2

  • Online ISBN: 978-3-540-28649-3

  • eBook Packages: Springer Book Archive
