Model-free incremental learning of the semantics of manipulation actions
Introduction
One of the main problems in cognitive robotics is how to recognize and learn from human demonstrations of new concepts, for example learning a relatively complex manipulation sequence like cutting a cucumber. Association-based or reinforcement learning methods are usually too slow to achieve this efficiently. They are therefore most often used in combination with supervised learning. Especially the Learning from Demonstration (LfD) paradigm seems promising for cognitive learning [1], [2], [3], [4], [5] because humans employ it very successfully. The problem that remains in all these approaches is how to represent complex actions or chains of actions in a generic and generalizable way that allows inferring the essential “meaning” (semantics) of an action irrespective of its individual instantiation.
In our earlier studies we introduced the “Semantic Event Chain” (SEC) as a possible descriptor for manipulation actions [6], [7]. The SEC framework analyzes the sequence of changes of the spatial relations between the objects that are being manipulated by a human or a robot. Consequently, SECs are invariant to the particular objects used, the precise object poses observed, the actual trajectories followed, or the resulting interaction forces between objects. All these aspects may change while the same SEC is still observed; the SEC thus captures the “essence of the action”, as demonstrated in several action classification tests performed by us [6], [7], [8], [9].
In this paper, we address the problem of on-line, incremental learning of the semantics of manipulation actions observed from human demonstrations. We use the concept of SECs as the main processing tool to encode manipulations in a generic and compact way. Manipulations are continuous in the temporal domain, but event chains discretize them by sampling only decisive key time points. These time points represent topological changes between the objects and the hand in the scene, which are highly descriptive for a given manipulation. Our main intent here is to design a cognitive agent that can infer and learn frequently observed spatiotemporal changes embedded in SECs in an unsupervised manner whenever a new manipulation instance occurs. The learning phase is bootstrapped only with the semantic similarities between SECs, i.e. the encoded spatiotemporal patterns, without using any prior knowledge about actions or objects. Since we use computer vision methods to derive event chains, our approach to incremental learning of semantics is strongly grounded in the signal domain. To the best of our knowledge, this is the first attempt to apply reasoning at the semantic level, while being fully grounded at the signal level, to learn manipulations with an unsupervised method. Note that here, on purpose, we do not include any object or other prior information, in order to show the power of our method to extract action and object information fully automatically and in an unsupervised way. Clearly, in practice, it will often make sense to include whatever additional knowledge is available to further ease action understanding.
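The discretization step described above can be illustrated with a minimal sketch. Here a manipulation is assumed to be given as a per-frame table of pairwise spatial relations between scene segments; the relation labels (`"N"` for no contact, `"T"` for touching) and the object names are illustrative placeholders, not the paper's exact encoding. The sketch keeps only the key frames at which some relation changes:

```python
# Toy sketch of a SEC-style encoding (illustrative, not the paper's scheme).
# Each frame maps an (object, object) pair to a spatial relation label:
# "N" = no contact, "T" = touching.

def extract_event_chain(frames):
    """Keep only the decisive key frames where some spatial relation changes."""
    chain = []
    for relations in frames:
        if not chain or relations != chain[-1]:
            chain.append(relations)
    return chain

# A hypothetical "cutting" observation: the hand grasps the cucumber,
# then the knife makes contact with it.
frames = [
    {("hand", "cucumber"): "N", ("knife", "cucumber"): "N"},
    {("hand", "cucumber"): "N", ("knife", "cucumber"): "N"},  # no change: skipped
    {("hand", "cucumber"): "T", ("knife", "cucumber"): "N"},  # hand grasps
    {("hand", "cucumber"): "T", ("knife", "cucumber"): "T"},  # knife touches
]

chain = extract_event_chain(frames)
print(len(chain))  # prints 3: only frames with a topological change are kept
```

Because only relation *changes* are kept, the same chain results regardless of how long each phase lasts or which exact trajectory the hand follows, which is the source of the invariances discussed above.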
The paper is organized as follows. We start by introducing the state of the art. We then provide a detailed description of each processing step: the extraction of SEC representations and the learning of model-SECs for each observed manipulation. In the next section, we discuss experimental results from the proposed framework, including validation and testing of the learned models. We finally conclude with a discussion.
Section snippets
State of the art
Learning from Demonstration (LfD) has been successfully applied both at the control level [1], [2], [10] and at the symbolic level [3], [4], [5]. Although various types of actions can be encoded at the control level, e.g. as trajectories, this is not general enough to imitate complicated actions under different circumstances. At the symbolic level, on the other hand, sequences of predefined abstract action units are used to learn complex actions, but this might lead to problems for execution as
Method
In this method section we present the core algorithmic components; complex details are given only in the Appendix. This should make reading easier, while everything needed to implement this algorithm is still present.
Results
In this section, we first show experimental results from our proposed incremental learning framework. We then continue with the enrichment of each learned SEC model with object information. Finally, we describe the validation and testing of the learned models.
Discussion
The main contribution of our paper is a novel method for incrementally learning the semantics of manipulation actions by observation. The proposed learning framework is bootstrapped with the semantic relations (SECs) between observed manipulations, without using any prior knowledge about actions or objects, while being fully grounded at the sensory level (image segments). To the best of our knowledge this is one of the first attempts in cognitive robotics to infer descriptive semantic models of observed
Acknowledgment
The research leading to these results has received funding from the European Community’s Seventh Framework Programme FP7/2007–2013 (Specific Programme Cooperation, Theme 3, Information and Communication Technologies) under grant agreement no. 270273, Xperience.
Eren Erdal Aksoy received his M.Sc. degree in Mechatronics from the University of Siegen, Germany, in 2008 and his Ph.D. degree in computer science from the University of Göttingen, Germany, in 2012. He is currently employed as a research assistant at the Bernstein Center at the University of Göttingen. His research interests include semantic analysis of manipulation actions, computer vision, and imitation learning.
References (38)
- Is imitation learning the route to humanoid robots?, Trends Cogn. Sci. (1999)
- et al., Discriminative and adaptive imitation in uni-manual and bi-manual tasks, Robot. Auton. Syst. (2006)
- et al., Visual object-action recognition: Inferring object affordances from human demonstration, Comput. Vis. Image Underst. (2011)
- et al., Incremental learning of tasks from user demonstrations, past experiences, and vocal comments, IEEE Trans. Syst. Man Cybern. B (2007)
- et al., Robot learning from demonstration: a task-level planning approach, Int. J. Adv. Robot. Syst. (2008)
- R. Cubek, W. Ertel, Learning and Execution of High-Level Concepts with Conceptual Spaces and PDDL, in: 3rd Workshop on...
- E.E. Aksoy, A. Abramov, F. Wörgötter, B. Dellen, Categorizing object-action relations from semantic scene graphs, in:...
- et al., Learning the semantics of object-action relations by observation, Int. J. Robot. Res. (2011)
- E.E. Aksoy, B. Dellen, M. Tamosiunaite, F. Wörgötter, Execution of a dual-object (pushing) action with semantic event...
- E.E. Aksoy, M. Tamosiunaite, R. Vuga, A. Ude, C. Geib, M. Steedman, F. Wörgötter, Structural bootstrapping at the...
- Incremental semantically grounded learning from demonstration, Robot. Sci. Syst.
- Temporal scene analysis: Conceptual descriptions of object movements
- Toward an assembly plan from observation, part I: Task recognition with polyhedral objects, IEEE Trans. Robot. Automat.
Minija Tamosiunaite received her Ph.D. in Informatics from Vytautas Magnus University, Lithuania, in 1997. She currently works as a senior researcher at the Bernstein Center for Computational Neuroscience, Inst. Physics 3, University of Göttingen. Her research interests include machine learning, biological signal analysis, and the application of learning methods in robotics.
Florentin Wörgötter studied biology and mathematics at the University of Düsseldorf, Germany. He received his Ph.D. for work on the visual cortex from the University of Essen, Germany, in 1988. From 1988 to 1990, he was engaged in computational studies at the California Institute of Technology, Pasadena. Between 1990 and 2000, he was a researcher at the University of Bochum, Germany, where he investigated the experimental and computational neuroscience of the visual system. From 2000 to 2005, he was a Professor of computational neuroscience in the Psychology Department, University of Stirling, U.K., where his interests turned strongly towards “learning in neurons”. Since July 2005, he has been the Head of the Computational Neuroscience Department at the Bernstein Center for Computational Neuroscience, Inst. Physics 3, University of Göttingen, Germany. His current research interests include information processing in closed-loop perception–action systems, sensory processing, motor control, and learning/plasticity, which are tested in different robotic implementations. His group has also developed RunBot, a fast and adaptive biped-walking robot.