Abstract
In this paper, we present a novel multimodal system designed for smooth multi-party human-machine interaction. HCI with multiple users is challenging because simultaneous actions and reactions must remain consistent. The proposed system consists of a digital signage display (a large screen) equipped with multiple sensing devices: a 19-channel microphone array, six HD video cameras (three mounted on top of the display and three on the bottom), and two depth sensors. The display can show various contents, similar to a poster presentation, or multiple windows (e.g., web browsers and photos). Multiple users positioned in front of the panel can interact freely using voice or gesture while looking at the displayed contents, without wearing any special device (such as motion-capture sensors or head-mounted displays). Acoustic and visual information processing is performed jointly using state-of-the-art techniques to obtain each user's speech and gaze direction, so that the displayed contents can be adapted to the users' interests.
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Tung, T., Gomez, R., Kawahara, T., Matsuyama, T. (2013). Multi-party Human-Machine Interaction Using a Smart Multimodal Digital Signage. In: Kurosu, M. (ed.) Human-Computer Interaction. Interaction Modalities and Techniques. HCI 2013. Lecture Notes in Computer Science, vol 8007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39330-3_43
Print ISBN: 978-3-642-39329-7
Online ISBN: 978-3-642-39330-3