
Robotics and Autonomous Systems

Volume 71, September 2015, Pages 118-133

Model-free incremental learning of the semantics of manipulation actions

https://doi.org/10.1016/j.robot.2014.11.003

Highlights

  • We address the problem of on-line learning of the semantics of manipulation actions.

  • This is the first attempt to apply reasoning at the semantic level for learning.

  • Our framework is fully grounded at the signal level.

  • We introduce a new benchmark with 8 manipulation actions comprising 120 samples in total.

  • We evaluate the learned semantic models on 20 long, chained manipulation sequences.

Abstract

Understanding and learning the semantics of complex manipulation actions are intriguing and non-trivial issues for the development of autonomous robots. In this paper, we present a novel method for the on-line, incremental learning of the semantics of manipulation actions by observation. Recently, we introduced the Semantic Event Chains (SECs) as a new generic representation for manipulations, which can be computed directly from a stream of images and is based on the changes in the relationships between the objects involved in a manipulation. Here we show that the SEC concept can be used to bootstrap the learning of the semantics of manipulation actions without using any prior knowledge about actions or objects. We create a new manipulation action benchmark with 8 different manipulation tasks comprising in total 120 samples to learn an archetypal SEC model for each manipulation action. We then evaluate the learned SEC models with 20 long and complex chained manipulation sequences comprising in total 103 manipulation samples. Thereby, we put the event chains to a decisive test, asking how powerful action classification is when using this framework. We reach up to 100% and 87% average precision and recall values in the validation phase and 99% and 92% in the testing phase. This supports the notion that SECs are a useful tool for classifying manipulation actions in a fully automatic way.

Introduction

One of the main problems in cognitive robotics is how to recognize and learn human demonstrations of new concepts, for example learning a relatively complex manipulation sequence like cutting a cucumber. Association-based or reinforcement learning methods are usually too slow to achieve this in an efficient way. They are therefore most often used in combination with supervised learning. Especially the Learning from Demonstration (LfD) paradigm seems promising for cognitive learning [1], [2], [3], [4], [5] because humans employ it very successfully. The problem that remains in all these approaches is how to represent complex actions, or chains of actions, in a generic and generalizable way that allows inferring the essential “meaning” (semantics) of an action irrespective of its individual instantiation.

In our earlier studies we introduced the “Semantic Event Chain” (SEC) as a possible descriptor for manipulation actions [6], [7]. The SEC framework analyzes the sequence of changes of the spatial relations between the objects that are being manipulated by a human or a robot. Consequently, SECs are invariant to the particular objects used, the precise object poses observed, the actual trajectories followed, and the resulting interaction forces between objects. All these aspects may change and still the same SEC is observed, capturing the “essence of the action”, as demonstrated in several action classification tests performed by us [6], [7], [8], [9].
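To make the representation concrete, the following minimal Python sketch (our illustration, not the authors' implementation) stores a SEC as one relation sequence per object pair over the decisive key frames and compares two SECs with a toy row-wise measure. The relation alphabet, the example "cutting" chain, and the similarity function are simplified assumptions, not the sub-string based comparison used in the SEC papers.

# Illustrative sketch only: a SEC stored as one row per object pair, holding
# the spatial relation observed at each decisive key frame. The relation
# alphabet ('N' = not touching, 'T' = touching) is a simplification.
sec_cutting = {
    ("hand", "knife"):     ["N", "T", "T", "T", "N"],
    ("knife", "cucumber"): ["N", "N", "T", "N", "N"],
    ("hand", "cucumber"):  ["N", "N", "N", "N", "N"],
}

def row_similarity(row_a, row_b):
    # Fraction of key frames on which two relation sequences agree
    # (a toy measure, not the sub-string matching used in the SEC papers).
    n = max(len(row_a), len(row_b))
    return sum(a == b for a, b in zip(row_a, row_b)) / n

def sec_similarity(sec_a, sec_b):
    # Average best-match row similarity. Object labels are ignored, so the
    # score depends only on the relational pattern, not on which objects,
    # poses, or trajectories were used.
    scores = [max(row_similarity(ra, rb) for rb in sec_b.values())
              for ra in sec_a.values()]
    return sum(scores) / len(scores)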

In this paper, we address the problem of on-line, incremental learning of the semantics of manipulation actions observed from human demonstrations. We use the concept of SECs as the main processing tool to encode manipulations in a generic and compact way. Manipulations are continuous in the temporal domain, but with event chains we discretize them by sampling only decisive key time points. Those time points represent topological changes between the objects and the hand in the scene, which are highly descriptive for a given manipulation. Our main intent here is to design a cognitive agent that can infer and learn frequently observed spatiotemporal changes embedded in SECs in an unsupervised manner whenever a new manipulation instance occurs. The learning phase is bootstrapped only with the semantic similarities between SECs, i.e. the encoded spatiotemporal patterns, without using any prior knowledge about actions or objects. Since we use computer vision methods to derive event chains, our approach for incremental learning of semantics is highly grounded in the signal domain. To the best of our knowledge, this is the first attempt to apply reasoning at the semantic level, while being fully grounded at the signal level, to learn manipulations with an unsupervised method. Note that here we deliberately do not include any object or other prior information, in order to show the power of our method to extract action and object information fully automatically and in an unsupervised way. Clearly, in practice, it will often make sense to include whatever additional knowledge is available to further ease action understanding.
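The two ingredients described above, discretizing a manipulation at the time points where object relations change and updating action models purely from SEC similarity, can be sketched as follows. The threshold value, the frame encoding, and the "count only" merge rule are illustrative assumptions rather than the paper's exact procedure.

def extract_key_frames(relation_stream):
    # Discretize a continuous stream of pairwise relations by keeping only
    # the time points at which the topological pattern changes; these
    # decisive events form the event chain.
    key_frames, previous = [], None
    for frame in relation_stream:   # frame: tuple of pairwise relations
        if frame != previous:
            key_frames.append(frame)
            previous = frame
    return key_frames

def incremental_update(models, new_sec, similarity, threshold=0.75):
    # Unsupervised update: if the new event chain is similar enough to an
    # existing model, it counts as another observation of that action;
    # otherwise it seeds a new model. No action or object labels are used.
    best_name, best_score = None, 0.0
    for name, model in models.items():
        score = similarity(model["sec"], new_sec)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        models[best_name]["count"] += 1   # a full method would also refine the model SEC
    else:
        models[f"model_{len(models)}"] = {"sec": new_sec, "count": 1}
    return models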

The paper is organized as follows. We start by reviewing the state of the art. Next, we provide a detailed description of each processing step: extraction of SEC representations and learning of model SECs for each observed manipulation. We then discuss experimental results from the proposed framework, including the validation and testing of the learned models. We conclude with a discussion.

Section snippets

State of the art

Learning from Demonstration (LfD) has been successfully applied both at the control level [1], [2], [10] and at the symbolic level [3], [4], [5]. Although various types of actions can be encoded at the control level, e.g. at the trajectory level, this is not general enough to imitate complicated actions under different circumstances. On the other hand, at the symbolic level, sequences of predefined abstract action units are used to learn complex actions, but this might lead to problems for execution as

Method

In this method section we present the core algorithmic components, whereas more complex details are given only in the Appendix. This should make reading easier, while everything needed to implement this algorithm, if desired, is still present.

Results

In this section, we first show experimental results from our proposed incremental learning framework. We then continue with the enrichment of each learned SEC model with object information. Finally, the validation and testing processes of the learned models are described.
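Classification quality in the validation and testing phases is reported as average precision and recall per manipulation class (see the Abstract). A minimal sketch of how such per-class values can be computed from (true, predicted) label pairs is given below; the labels in the example call are made up for illustration and do not reproduce the paper's data.

from collections import defaultdict

def precision_recall(samples):
    # Per-class precision and recall from (true_label, predicted_label)
    # pairs, e.g. the manipulation samples detected in long chained sequences.
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for true, pred in samples:
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1
            fn[true] += 1
    scores = {}
    for c in set(tp) | set(fp) | set(fn):
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        scores[c] = (p, r)
    return scores

print(precision_recall([("cutting", "cutting"), ("pushing", "cutting"),
                        ("pushing", "pushing"), ("stirring", "stirring")]))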

Discussion

The main contribution of our paper is a novel method for incrementally learning the semantics of manipulation actions by observation. The proposed learning framework is bootstrapped with the semantic relations (SECs) between observed manipulations, without using any prior knowledge about actions or objects, while being fully grounded at the sensory level (image segments). To the best of our knowledge, this is one of the first attempts in cognitive robotics to infer descriptive semantic models of observed

Acknowledgment

The research leading to these results has received funding from the European Community’s Seventh Framework Programme FP7/2007–2013 (Specific Programme Cooperation, Theme 3, Information and Communication Technologies) under grant agreement no. 270273, Xperience.


References (38)

  • S. Schaal

    Is imitation learning the route to humanoid robots?

    Trends Cogn. Sci.

    (1999)
  • A. Billard et al.

    Discriminative and adaptive imitation in uni-manual and bi-manual tasks

    Robot. Auton. Syst.

    (2006)
  • H. Kjellström et al.

    Visual object-action recognition: Inferring object affordances from human demonstration

    Comput. Vis. Image Underst.

    (2011)
  • M. Pardowitz et al.

    Incremental learning of tasks from user demonstrations, past experiences, and vocal comments

    IEEE Trans. Syst. Man Cybern. B

    (2007)
  • S. Ekvall et al.

    Robot learning from demonstration: a task-level planning approach

    Int. J. Adv. Robot. Syst.

    (2008)
  • R. Cubek, W. Ertel, Learning and Execution of High-Level Concepts with Conceptual Spaces and PDDL, in: 3rd Workshop on...
  • E.E. Aksoy, A. Abramov, F. Wörgötter, B. Dellen, Categorizing object-action relations from semantic scene graphs, in:...
  • E.E. Aksoy et al.

    Learning the semantics of object-action relations by observation

    Int. J. Robot. Res.

    (2011)
  • E.E. Aksoy, B. Dellen, M. Tamosiunaite, F. Wörgötter, Execution of a dual-object (pushing) action with semantic event...
  • E.E. Aksoy, M. Tamosiunaite, R. Vuga, A. Ude, C. Geib, M. Steedman, F. Wörgötter, Structural bootstrapping at the...
  • S. Niekum et al.

    Incremental semantically grounded learning from demonstration

    Robot. Sci. Syst.

    (2013)
  • N. Badler

    Temporal scene analysis: Conceptual descriptions of object movements

    (1975)
  • K. Ikeuchi et al.

    Toward an assembly plan from observation, part I: Task recognition with polyhedral objects

    IEEE Trans. Robot. Automat.

    (1994)
  • M. Sridhar, A.G. Cohn, D. Hogg, Learning functional object-categories from a relational spatio-temporal representation,...
  • Y. Yang, C. Fermüller, Y. Aloimonos, Detection of manipulation action consequences (mac), in: International Conference...
  • D. Summers-Stay, C. Teo, Y. Yang, C. Fermüller, Y. Aloimonos, Using a minimal action grammar for activity understanding...
  • K. Nagahama, K. Yamazaki, K. Okada, M. Inaba, Manipulation of multiple objects in close proximity based on visual...
  • K. Ramirez-Amaro, E.-S. Kim, J. Kim, B.-T. Zhang, M. Beetz, G. Cheng, Enhancing Human Action Recognition through...
  • Y. Yang, C. Fermüller, Y. Aloimonos, A cognitive system for human manipulation action understanding, in: Proceedings of...

    Eren Erdal Aksoy received his M.Sc. degree in Mechatronics from the University of Siegen, Germany, in 2008 and his Ph.D. degree in computer science from the University of Göttingen, Germany, in 2012. He is currently employed as a research assistant at the Bernstein Center at the University of Göttingen. His research interests include semantic analysis of manipulation actions, computer vision, and imitation learning.

    Minija Tamosiunaite has received a Ph.D. in Informatics in Vytautas Magnus University, Lithuania, in 1997. Currently she works as a senior researcher at the Bernstein Center for Computational Neuroscience, Inst. Physics 3, University of Göttingen. Her research interests include machine learning, biological signal analysis, and application of learning methods in robotics.

    Florentin Wörgötter has studied biology and mathematics at the University of Düsseldorf, Germany. He received his Ph.D. for work on the visual cortex from the University of Essen, Germany, in 1988. From 1988 to 1990, he was engaged in computational studies with the California Institute of Technology, Pasadena. Between 1990 and 2000, he was a Researcher at the University of Bochum, Germany where he was investigating the experimental and computational neuroscience of the visual system. From 2000 to 2005, he was a Professor of computational neuroscience with the Psychology Department, University of Stirling, U.K., where his interests strongly turned towards “Learning in Neurons”. Since July 2005, he has been the Head of the Computational Neuroscience Department at the Bernstein Center for Computational Neuroscience, Inst. Physics 3, University of Göttingen, Germany. His current research interests include information processing in closed-loop perception–action systems, sensory processing, motor control, and learning/plasticity, which are tested in different robotic implementations. His group has also developed the RunBot, which is a fast and adaptive biped-walking robot.
