The Role of Speech in Multimodal Human-Computer Interaction

Hermansky, Hynek; Fousek, Petr; Lehtonen, Mikko

doi:10.1007/11551874_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

A natural audio-visual interface between a human user and a machine requires the machine to understand the user's audio-visual commands. This does not necessarily require full speech and image recognition. It does require, just as interaction with any working animal does, that the machine be capable of reacting to certain particular sounds and/or gestures while ignoring the rest. To this end, we are working on sound identification and classification approaches that ignore most of the acoustic input and react only to a particular sound (a keyword).
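The idea of reacting to one keyword while ignoring everything else can be illustrated with a minimal sketch. It is not the authors' method, which the abstract does not describe. It assumes that some upstream classifier (hypothetical here) already produces one keyword score per frame in the range 0 to 1, and it reports only the frame intervals where the smoothed score stays above a threshold long enough. The names `smooth`, `spot_keyword`, `threshold`, `min_frames`, and `window` are illustrative assumptions.

```python
from typing import List, Tuple


def smooth(scores: List[float], window: int) -> List[float]:
    """Centered moving average; edges average only the frames available."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def spot_keyword(
    scores: List[float],
    threshold: float = 0.5,
    min_frames: int = 3,
    window: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame intervals, end exclusive, where the
    smoothed keyword score stays at or above `threshold` for at least
    `min_frames` consecutive frames. All other input is ignored."""
    smoothed = smooth(scores, window)
    detections = []
    start = None  # first frame of the current above-threshold run, if any
    for i, s in enumerate(smoothed):
        if s >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                detections.append((start, i))
            start = None
    # Close a run that extends to the end of the input.
    if start is not None and len(smoothed) - start >= min_frames:
        detections.append((start, len(smoothed)))
    return detections


if __name__ == "__main__":
    # Hypothetical per-frame scores: silence, a keyword-like burst, silence.
    frame_scores = [0.0] * 5 + [0.9] * 6 + [0.0] * 5
    print(spot_keyword(frame_scores))
```

The minimum-duration check is one simple way to keep an isolated high-scoring frame from triggering a reaction; a real system would tune the threshold and duration on held-out data.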

Author information

Authors and Affiliations

IDIAP Research Institute, Martigny, Switzerland
Hynek Hermansky, Petr Fousek & Mikko Lehtonen

Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek, Pavel Mautner & Tomáš Pavelka

About this paper

Cite this paper

Hermansky, H., Fousek, P., Lehtonen, M. (2005). The Role of Speech in Multimodal Human-Computer Interaction. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_2

DOI: https://doi.org/10.1007/11551874_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science, Computer Science (R0)
