Abstract
Among the cognitive abilities a robot companion must be endowed with, the perception of humans and the understanding of their speech are both fundamental in the context of multimodal human-robot interaction. To provide a mobile robot with visual perception of its user and with the means to handle verbal and multimodal communication, we have developed and integrated two components. In this paper we focus on an interactively distributed multiple-object tracker dedicated to locating the two hands and the head in 3D. Its relevance is highlighted by on-line and off-line evaluations on data acquired by the robot. We then describe the implementation and preliminary experiments on a household robot companion, including speech recognition and understanding as well as basic fusion with gesture. These experiments illustrate how vision can assist speech by resolving location references and object/person identities in verbal statements, so that natural deictic commands given by humans can be interpreted. Extensions of our work are finally discussed.
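To make the speech-gesture fusion mentioned above concrete, the sketch below shows how a deictic reference in a verbal command could be resolved from the 3D head and hand positions delivered by a visual tracker. This is a minimal illustration written for this summary, not the authors' implementation: the function names, the head-to-hand pointing model, the candidate-object list, and the 15-degree angular tolerance are all assumptions.

import numpy as np

def pointing_ray(head_xyz, hand_xyz):
    # Approximate the pointing direction by the head-to-hand line, anchored at the hand.
    head = np.asarray(head_xyz, dtype=float)
    hand = np.asarray(hand_xyz, dtype=float)
    direction = hand - head
    return hand, direction / np.linalg.norm(direction)

def resolve_deictic_target(head_xyz, hand_xyz, candidates, max_angle_deg=15.0):
    # Return the candidate object best aligned with the pointing ray,
    # or None when nothing falls within the angular tolerance.
    origin, direction = pointing_ray(head_xyz, hand_xyz)
    best_id, best_angle = None, float("inf")
    for obj_id, position in candidates.items():
        to_obj = np.asarray(position, dtype=float) - origin
        to_obj /= np.linalg.norm(to_obj)
        angle = np.degrees(np.arccos(np.clip(direction @ to_obj, -1.0, 1.0)))
        if angle < best_angle:
            best_id, best_angle = obj_id, angle
    return best_id if best_angle <= max_angle_deg else None

# Hypothetical usage: speech understanding detects a deictic command
# ("take that one"), the tracker supplies 3D head/hand estimates, and
# fusion selects the object the user is pointing at.
objects = {"bottle": (0.9, 0.3, 0.4), "cup": (0.4, -0.6, 0.9)}
target = resolve_deictic_target(head_xyz=(0.0, 0.0, 1.6),
                                hand_xyz=(0.3, 0.1, 1.2),
                                candidates=objects)
print("Interpreted command: take the", target)  # -> bottle

The point of the example is only the fusion step: the visual tracker supplies the 3D estimates and the speech component supplies the parsed command; how each of these is obtained is the subject of the paper itself.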
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Burger, B., Ferrané, I., Lerasle, F. (2008). Multimodal Interaction Abilities for a Robot Companion. In: Gasteratos, A., Vincze, M., Tsotsos, J.K. (eds) Computer Vision Systems. ICVS 2008. Lecture Notes in Computer Science, vol 5008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79547-6_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-79546-9
Online ISBN: 978-3-540-79547-6
eBook Packages: Computer Science, Computer Science (R0)