The Role of Speech in Multimodal Human-Computer Interaction

(Towards Reliable Rejection of Non-keyword Input)

  • Conference paper
Text, Speech and Dialogue (TSD 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Abstract

A natural audio-visual interface between a human user and a machine requires that the machine understand the user’s audio-visual commands. This does not necessarily require full speech and image recognition. It does require, just as interaction with any working animal does, that the machine be capable of reacting to certain particular sounds and/or gestures while ignoring the rest. Towards this end, we are working on sound identification and classification approaches that ignore most of the acoustic input and react only to a particular sound (the keyword).
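The idea of reacting to one keyword while rejecting everything else can be illustrated with a minimal sketch. This is not the authors' method: it assumes a hypothetical upstream classifier that already emits a per-frame keyword posterior in [0, 1], and the function name `spot_keyword`, the moving-average smoothing, and the `window` and `threshold` values are all illustrative assumptions.

```python
from typing import List, Sequence


def spot_keyword(
    posteriors: Sequence[float],
    window: int = 3,        # assumed smoothing length, in frames
    threshold: float = 0.7, # assumed decision threshold
) -> List[int]:
    """Return the frame indices at which a keyword detection fires.

    Each frame's score is the mean posterior over the last `window`
    frames. A detection fires once when the score first reaches the
    threshold; all other input is rejected (produces no output).
    """
    detections: List[int] = []
    above = False  # True while the smoothed score stays above threshold
    for i in range(len(posteriors)):
        start = max(0, i - window + 1)
        chunk = posteriors[start:i + 1]
        score = sum(chunk) / len(chunk)
        if score >= threshold and not above:
            detections.append(i)  # rising edge: report a single detection
        above = score >= threshold
    return detections


# Hypothetical posterior stream: mostly non-keyword input, one keyword burst.
stream = [0.1, 0.2, 0.1, 0.9, 0.95, 0.9, 0.2, 0.1, 0.1]
print(spot_keyword(stream))
```

Smoothing before thresholding is one simple way to keep a single high-scoring frame from triggering a reaction; raising `threshold` trades missed keywords for fewer false alarms on non-keyword input.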

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hermansky, H., Fousek, P., Lehtonen, M. (2005). The Role of Speech in Multimodal Human-Computer Interaction. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_2

  • DOI: https://doi.org/10.1007/11551874_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28789-6

  • Online ISBN: 978-3-540-31817-0

  • eBook Packages: Computer Science, Computer Science (R0)
