Multi-modal Learning

Skočaj, Danijel; Kristan, Matej; Vrečko, Alen; Leonardis, Aleš; Fritz, Mario; Stark, Michael; Schiele, Bernt; Hongeng, Somboon; Wyatt, Jeremy L.

doi:10.1007/978-3-642-11694-0_7

Danijel Skočaj⁸,
Matej Kristan⁸,
Alen Vrečko⁸,
Aleš Leonardis⁸,
Mario Fritz⁹,
Michael Stark⁹,
Bernt Schiele⁹,
Somboon Hongeng¹⁰ &
…
Jeremy L. Wyatt¹⁰

Part of the book series: Cognitive Systems Monographs ((COSMOS,volume 8))

1508 Accesses

Introduction

The main topic of this chapter is learning, more specifically, multimodal learning.

In biological systems, learning occurs in various forms and at various developmental stages facilitating adaptation to the ever changing environment. Learning is also one of the most fundamental capabilities of an artificial cognitive system, thus significant efforts have been dedicated in CoSy to researching a variety of issues related to it.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

A Timeline from Neural Network to Python Through ANN and AI: An Introductory Tutorial

Emergent spatio-temporal multimodal learning using a developmental network

Article 05 November 2018

Editorial: preface to the special issue on embodied cognition and technology for learning

Article 14 July 2021

References

Fidler, S., Skočaj, D., Leonardis, A.: Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(3), 337–350 (2006), http://cognitivesystems.org/CoSyBook/chap7.asp#fidlerPAMI06
Article Google Scholar
Harnad, S.: The symbol grounding problem. Physica D: Nonlinear Phenomena 42, 335–346 (1990)
Article Google Scholar
Ardizzone, E., Chella, A., Frixione, M., Gaglio, S.: Integrating subsymbolic and symbolic processing in artificial vision. Journal of Intelligent Systems 1(4), 273–308 (1992)
Google Scholar
Chella, A., Frixione, M., Gaglio, S.: A cognitive architecture for artificial vision. Artificial Intelligence 89(1–2), 73–111 (1997)
Article MATH Google Scholar
Roy, D.K., Pentland, A.P.: Learning words from sights and sounds: a computational model. Cognitive Science 26(1), 113–146 (2002)
Article Google Scholar
Roy, D.K.: Learning visually-grounded words and syntax for a scene description task. Computer Speech and Language 16(3), 353–385 (2002)
Article Google Scholar
Steels, L., Vogt, P.: Grounding adaptive language games in robotic agents. In: Proceedings of the Fourth European Conference on Artificial Life, ECAL 1997, Complex Adaptive Systems, pp. 474–482 (1997)
Google Scholar
Vogt, P.: The physical symbol grounding problem. Cognitive Systems Research 3(3), 429–457 (2002)
Article Google Scholar
Bauckhage, C., Fink, G., Fritsch, J., Kummert, F., Lömker, F., Sagerer, G., Wachsmuth, S.: An integrated system for cooperative man-machine interaction. In: IEEE International Symposium on Computational Intelligence in Robotics and Automation, pp. 328–333 (2001)
Google Scholar
Kirstein, S., Wersing, H., Körner, E.: Rapid online learning of objects in a biologically motivated recognition architecture. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds.) DAGM 2005. LNCS, vol. 3663, pp. 301–308. Springer, Heidelberg (2005)
Chapter Google Scholar
Steels, L., Kaplan, F.: AIBO’s first words. the social learning of language and meaning. Evolution of Communication 4(1), 3–32 (2001)
Article Google Scholar
Arsenio, A.: Developmental learning on a humanoid robot. In: IEEE International Joint Conference on Neural Networks, pp. 3167–3172 (2004)
Google Scholar
Pollard, D.E.: A user’s guide to measure theoretic probability. Cambridge University Press, Cambridge (2002)
MATH Google Scholar
Kristan, M., Skočaj, D., Leonardis, A.: Online kernel density estimation for interactive learning (submitted for publication), http://cognitivesystems.org/CoSyBook/chap7.asp#KristanIMAVIS2008
Wand, M.P., Jones, M.C.: Kernel Smoothing. Chapman & Hall/CRC (1995)
Google Scholar
Scott, D.W., Szewczyk, W.F.: From kernels to mixtures. Technometrics 43(3), 323–335 (2001)
Article MathSciNet Google Scholar
Goldberger, J., Roweis, S.: Hierarchical clustering of a mixture model. In: Neural Inf. Proc. Systems, pp. 505–512 (2005)
Google Scholar
Zhang, K., Kwok, J.T.: Simplifying mixture models through function approximation. In: Neural Inf. Proc. Systems (2006)
Google Scholar
Mc Lachlan, G.J., Krishan, T.: The EM algorithm and extensions. Wiley, Chichester (1997)
Google Scholar
Figueiredo, M.A.F., Jain, A.K.: Unsupervised learning of finite mixture models. IEEE Trans. Patter. Anal. Mach. Intell. 24(3), 381–396 (2002)
Article Google Scholar
Živkovič, Z., van der Heijden, F.: Recursive unsupervised learning of finite mixture models. IEEE Trans. Patter. Anal. Mach. Intell. 26(5), 651–656 (2004)
Article Google Scholar
Corduneanu, A., Bishop, C.M.: Variational Bayesian model selection for mixture distributions. In: Artificial Intelligence and Statistics, pp. 27–34. Morgan Kaufmann, Los Altos (2001)
Google Scholar
McGrory, C.A., Titterington, D.M.: Variational approximations in Bayesian model selection for finite mixture distributions. Comput. Stat. Data Analysis 51(11), 5352–5367 (2007)
Article MathSciNet MATH Google Scholar
Song, M., Wang, H.: Highly efficient incremental estimation of gaussian mixture models for online data stream clustering. In: SPIE: Intelligent Computing: Theory and Applications, pp. 174–183 (2005)
Google Scholar
Arandjelović, O., Cipolla, R.: Incremental learning of temporally-coherent gaussian mixture models. In: British Machine Vision Conference, pp. 759–768 (2005)
Google Scholar
Szewczyk, W.F.: Time-evolving adaptive mixtures, Tech. rep., National Security Agency (2005)
Google Scholar
Declercq, A., Piater, J.H.: Online learning of gaussian mixture models - a two-level approach. In: Intl.l Conf. Comp. Vis., Imaging and Comp. Graph. Theory and Applications, pp. 605–611 (2008)
Google Scholar
Han, B., Comaniciu, D., Zhu, Y., Davis, L.S.: Sequential kernel density approximation and its application to real-time visual tracking. IEEE Trans. Patter. Anal. Mach. Intell. 30(7), 1186–1197 (2008)
Article Google Scholar
Kristan, M., Skočaj, D., Leonardis, A.: Incremental learning with Gaussian mixture models. In: Computer Vision Winter Workshop CVWW 2008, Moravske toplice, Slovenia, pp. 25–32 (2008), http://cognitivesystems.org/CoSyBook/chap7.asp#kristanCVWW08
Girolami, M., He, C.: Probability density estimation from optimally condensed data samples. IEEE Trans. Patter. Anal. Mach. Intell. 25(10), 1253–1264 (2003)
Article Google Scholar
Jones, M.C., Marron, J.S., Sheather, S.J.: A brief survey of bandwidth selection for density estimation. J. Amer. Stat. Assoc. 91(433), 401–407 (1996)
Article MathSciNet MATH Google Scholar
Skočaj, D., Berginc, G., Ridge, B., Štimec, A., Jogan, M., Vanek, O., Leonardis, A., Hutter, M., Hewes, N.: A system for continuous learning of visual concepts. In: International Conference on Computer Vision Systems ICVS 2007, Bielefeld, Germany (2007), http://cognitivesystems.org/CoSyBook/chap7.asp#skocajICVS07
Skočaj, D., Ridge, B., Berginc, G., Leonardis, A.: A framework for continuous learning of simple visual concepts. In: Computer Vision Winter Workshop 2007, St. Lambrecht, Austria, pp. 99–105 (2007), http://cognitivesystems.org/CoSyBook/chap7.asp#skocajCVWW07
Skočaj, D., Kristan, M., Leonardis, A.: Continuous learning of simple visual concepts using incremental kernel density estimation. In: International Conference on Computer Vision Theory and Applications, Funchal, Madeira, Portugal, pp. 598–604 (2008), http://cognitivesystems.org/CoSyBook/chap7.asp#skocajVISAPP08
Lowe, D.: Object recognition from local scale invariant features. In: ICCV 1999 (1999)
Google Scholar
Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. In: CVPR 2003 (2003)
Google Scholar
Mikolajczyk, K., Leibe, B., Schiele, B.: Local features for object class recognition. In: ICCV 2005, Beijing, China (2005)
Google Scholar
Csurka, G., Dance, C., Fan, L., Willarnowski, J., Bray, C.: Visual categorization with bags of keypoints. In: SLCV (2004)
Google Scholar
Leibe, B., Seemann, E., Schiele, B.: Pedestrian detection in crowded scenes. In: CVPR 2005, San Diego, CA, USA (2005)
Google Scholar
Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., Freeman, W.T.: Discovering objects and their locations in images. In: ICCV 2005, Beijing, China (2005)
Google Scholar
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: CVPR 2006, pp. 2169–2178 (2006)
Google Scholar
Agarwal, A., Triggs, B.: Hyperfeatures - multilevel local coding for visual recognition. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 30–43. Springer, Heidelberg (2006)
Chapter Google Scholar
Fritz, M., Schiele, B.: Towards unsupervised discovery of visual categories. In: Franke, K., Müller, K.-R., Nickolay, B., Schäfer, R. (eds.) DAGM 2006. LNCS, vol. 4174, pp. 232–241. Springer, Heidelberg (2006)
Chapter Google Scholar
Grauman, K., Darrell, T.: Unsupervised learning of categories from sets of partially matching image features. In: CVPR 2006, pp. 19–25. IEEE Computer Society, Washington (2006)
Google Scholar
Baldridge, J., Kruijff, G.-J.M.: Multi-modal combinatory categorial grammar. In: EACL 2003, Morristown, NJ, USA (2003)
Google Scholar
Baldridge, J., Kruijff, G.-J.M.: Coupling ccg and hybrid logic dependency semantics. In: ACL 2002, Morristown, NJ, USA (2001)
Google Scholar
Roy, D.: Learning words and syntax for a scene description task. Computer Speech and Language 16(3)
Google Scholar
Kruijff, G.-J.M., Kelleher, J.D., Berginc, G., Leonardis, A.: Structural descriptions in Human-Assisted robot visual learning. In: Proceedings of 1st Annual Conference on Human-Robot Interaction (2006)
Google Scholar
Kruijff, G.-J.M., Kelleher, J.D., Hawes, N.: Information fusion for visual reference resolution in dynamic situated dialogue. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Weber, M. (eds.) PIT 2006. LNCS (LNAI), vol. 4021, pp. 117–128. Springer, Heidelberg (2006)
Chapter Google Scholar
Kelleher, J., Kruijff, G.-J., Costello, F.: Proximity in context: an empirically grounded computational model of proximity for processing topological spatial expression. In: Coling-ACL 2006, Sydney Australia (2006)
Google Scholar
Brand, M., Oliver, N., Pentland, A.: Coupled hidden markov models for complex action recognition. In: IEEE Proceedings of Computer Vision and Pattern Recognition, Puerto Rico, USA (1997)
Google Scholar
Wren, C., Pentland, A.: Dynamic modeling of human motion. In: Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan (1998)
Google Scholar
Hongeng, S., Wyatt, J.: Learning causality and intention in human actions. In: Proceedings of IEEE-RAS International Conference on Humanoid Robots, Genoa, France (2006), http://cognitivesystems.org/CoSyBook/chap7.asp#hong06
Sutton, R.S., Barto, A.G.: Reinforcement learning : An introduction. MIT Press, Cambridge (1998)
Google Scholar
Domingos, P., Richardson, M.: Markov logic: A unifying framework for statistical relational learning. In: Proceedings of the ICML 2004 Workshop on Statistical Relational Learning and its Connection to Other Fields, Banff, Canada (2004)
Google Scholar
Hongeng, S., Wyatt, J.: Learning Causality and Intentional Actions. In: Rome, E., Hertzberg, J., Dorffner, G. (eds.) Towards Affordance-Based Robot Control. LNCS (LNAI), vol. 4760, pp. 27–46. Springer, Heidelberg (2008), http://cognitivesystems.org/CoSyBook/chap7.asp#hong08a
Chapter Google Scholar
Hongeng, S., Wyatt, J.: Learning goal-based motion sequences of object manipulation, Tech. Rep. CSR-08-02, School of Computer Science, University of Birmingham (2008)
Google Scholar
Glymour, C.: Learning causes : Psychological explanations of causal explanation. Minds and Machines 8, 39–60 (1998)
Article Google Scholar
Gergely, G., Csibra, G.: Teleological reasoning in infancy: the naive theory of rational action. Trends in Cognitive Sciences 7(7), 287–292 (2003)
Article Google Scholar
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC 2007) Results (2007)
Google Scholar
Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset, Tech. Rep. 7694, California Institute of Technology (2007)
Google Scholar
Rosch, E., Mervis, C.B., Gray, W.D., Johnson, D.M., Braem, P.B.: Basic objects in natural categories. Cognitive Psychology
Google Scholar
Gibson, J.J.: The theory of affordance, in: Percieving, Acting, and Knowing. Lawrence Erlbaum Associates, Hillsdale (1977)
Google Scholar
Winston, P.H., Katz, B., Binford, T.O., Lowry, M.R.: Learning physical descriptions from functional definitions, examples, and precedents. In: AAAI 1983 (1983)
Google Scholar
Stark, L., Bowyer, K.: Achieving generalized object recognition through reasoning about association of function to structure. PAMI 13(10), 1097–1104 (1991)
Google Scholar
Stark, L., Hoover, A., Goldgof, D., Bowyer, K.: Function-based recognition from incomplete knowledge of shape. In: WQV 1993, pp. 11–22 (1993)
Google Scholar
Rivlin, E., Dickinson, S.J., Rosenfeld, A.: Recognition by functional parts. Computer Vision and Image Understanding: CVIU 62(2), 164–176 (1995)
Article MATH Google Scholar
Bogoni, L., Bajcsy, R.: Interactive recognition and representation of functionality. CVIU 62(2), 194–214 (1995)
MATH Google Scholar
Saxena, A., Driemeyer, J., Ng, A.Y.: Robotic grasping of novel objects using vision. IJRR
Google Scholar
Stark, M., Lies, P., Zillich, M., Wyatt, J., Schiele, B.: Functional object class detection based on learned affordance cues. In: 6th International Conference on Computer Vision Systems, ICVS (2008), http://cognitivesystems.org/CoSyBook/chap7.asp#stark08icvs
Sun, J., Zhang, W.W., Tang, X., Shum, H.Y.: Background cut. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part II. LNCS, vol. 3952, pp. 628–641. Springer, Heidelberg (2006)
Chapter Google Scholar
Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: ICML 2001 (2001)
Google Scholar
Jones, M.J., Rehg, J.M.: Statistical color models with application to skin detection. In: CVPR, pp. 1274–1280. IEEE Computer Society, Los Alamitos (1999)
Google Scholar
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)
Article Google Scholar
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Gool, L.J.V.: A comparison of affine region detectors. In: IJCV 2005 (2005)
Google Scholar
Ferrari, V., Fevrier, L., Jurie, F., Schmid, C.: Groups of adjacent contour segments for object detection, Rapport De Recherche Inria
Google Scholar
Ferrari, V., Tuytelaars, T., Gool, L.J.V.: Object detection by contour segment networks. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 14–28. Springer, Heidelberg (2006)
Chapter Google Scholar
Stark, M., Schiele, B.: How good are local features for classes of geometric objects. In: ICCV (2007), http://cognitivesystems.org/CoSyBook/chap7.asp#stark07iccv
Zillich, M.: Incremental Indexing for Parameter-Free Perceptual Grouping. In: 31st Workshop of the Austrian Association for Pattern Recognition (2007)
Google Scholar
Leibe, B., Leonardis, A., Schiele, B.: An implicit shape model for combined object categorization and segmentation. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds.) Toward Category-Level Object Recognition. LNCS, vol. 4170, pp. 508–524. Springer, Heidelberg (2006), http://cognitivesystems.org/CoSyBook/chap7.asp#Leibe06b
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

VICOS Lab, University of Ljubljana, Ljubljana, Slovenia
Danijel Skočaj, Matej Kristan, Alen Vrečko & Aleš Leonardis
Technische Universität Darmstadt, Darmstadt, Germany
Mario Fritz, Michael Stark & Bernt Schiele
Intelligent Robotics Laboratory, School of Computer Science, University of Birmingham, Birmingham, UK
Somboon Hongeng & Jeremy L. Wyatt

Authors

Danijel Skočaj
View author publications
You can also search for this author in PubMed Google Scholar
Matej Kristan
View author publications
You can also search for this author in PubMed Google Scholar
Alen Vrečko
View author publications
You can also search for this author in PubMed Google Scholar
Aleš Leonardis
View author publications
You can also search for this author in PubMed Google Scholar
Mario Fritz
View author publications
You can also search for this author in PubMed Google Scholar
Michael Stark
View author publications
You can also search for this author in PubMed Google Scholar
Bernt Schiele
View author publications
You can also search for this author in PubMed Google Scholar
Somboon Hongeng
View author publications
You can also search for this author in PubMed Google Scholar
Jeremy L. Wyatt
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Georgia Tech, RIM@GT, 30332, Atlanta, GA, USA
Henrik Iskov Christensen
DFKI GmbH, Stuhlsatzenhausweg 3, 66123, Saarbrücken, Germany
Geert-Jan M. Kruijff
School of Computer Science, University of Birmingham, B15 2TT, Birmingham, UK
Jeremy L. Wyatt

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Skočaj, D. et al. (2010). Multi-modal Learning. In: Christensen, H.I., Kruijff, GJ.M., Wyatt, J.L. (eds) Cognitive Systems. Cognitive Systems Monographs, vol 8. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11694-0_7

Download citation

DOI: https://doi.org/10.1007/978-3-642-11694-0_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11693-3
Online ISBN: 978-3-642-11694-0
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics