Abstract
In this paper, we present a novel multimodal system designed for smooth multi-party human-machine interaction. HCI with multiple users is challenging because simultaneous actions and reactions must remain consistent. The proposed system consists of a digital signage display (a large screen) equipped with multiple sensing devices: a 19-channel microphone array, six HD video cameras (three mounted on top of the display and three on the bottom), and two depth sensors. The display can show various contents, similar to a poster presentation, or multiple windows (e.g., web browsers and photos). Multiple users positioned in front of the panel can interact freely using voice or gesture while looking at the displayed contents, without wearing any special device (such as motion-capture sensors or head-mounted displays). Acoustic and visual information processing is performed jointly using state-of-the-art techniques to obtain each user's speech and gaze direction, so that the displayed contents can be adapted to the users' interests.
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Tung, T., Gomez, R., Kawahara, T., Matsuyama, T. (2013). Multi-party Human-Machine Interaction Using a Smart Multimodal Digital Signage. In: Kurosu, M. (ed.) Human-Computer Interaction. Interaction Modalities and Techniques. HCI 2013. Lecture Notes in Computer Science, vol 8007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39330-3_43
Print ISBN: 978-3-642-39329-7
Online ISBN: 978-3-642-39330-3