Abstract
A multimodal system for acquiring new objects, updating already known ones, and searching for them is presented. The system learns objects and associates them with speech received from a speech recogniser in a natural and convenient fashion. The learning and retrieval process draws on multiple attributes calculated from an image recorded by a standard video camera, on deictic gestures, and on information from a dialog-based conversation. Histogram intersection and subgraph matching on segmented color regions are used as attributes.
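To make the first of these attributes concrete, the following is a minimal sketch of histogram intersection in the normalised form commonly attributed to Swain and Ballard's color indexing: the sum of bin-wise minima divided by the model histogram's total count. It is not the system's actual implementation. The function names, the RGB quantisation into 4 bins per channel, and the toy pixel lists are assumptions chosen only for illustration.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Quantise 8-bit RGB pixels into a sparse histogram (bin -> count).

    The bin layout is an illustrative assumption, not taken from the paper.
    """
    step = 256 // bins_per_channel
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)


def histogram_intersection(image_hist, model_hist):
    """Sum of bin-wise minima, normalised by the model's pixel count."""
    total = sum(model_hist.values())
    if total == 0:
        raise ValueError("model histogram is empty")
    overlap = sum(min(image_hist.get(b, 0), n) for b, n in model_hist.items())
    return overlap / total


# Hypothetical toy data: a mostly red model object and a red/green scene region.
model = color_histogram([(250, 10, 10)] * 3 + [(10, 10, 250)])
scene = color_histogram([(240, 20, 5)] * 2 + [(10, 250, 10)] * 2)
score = histogram_intersection(scene, model)
print(score)
```

A score close to 1 means that nearly all of the model's color mass is also present in the image region, and a score close to 0 means that the two share almost no colors. Because the measure is normalised by the model only, extra background colors in the scene do not lower the score.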
This work is supported within the Graduate Program “Task Oriented Communication” by the German Research Foundation (DFG).
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lömker, F., Sagerer, G. (2002). A Multimodal System for Object Learning. In: Van Gool, L. (eds) Pattern Recognition. DAGM 2002. Lecture Notes in Computer Science, vol 2449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45783-6_59
DOI: https://doi.org/10.1007/3-540-45783-6_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44209-7
Online ISBN: 978-3-540-45783-1
eBook Packages: Springer Book Archive