A computational model of intention reading in imitation

https://doi.org/10.1016/j.robot.2006.01.006

Abstract

Imitation in artificial systems involves a number of important aspects, such as extracting the relevant features of the demonstrated behaviour, inversely mapping observations onto the imitator's own body, and executing motor commands. In this article we focus on how an artificial system can infer what the demonstrator intended to do. The model that we propose draws inspiration from developmental psychology and has three crucial features. The first is that the imitating agent needs repeated trials, thus stepping away from the one-shot learning-by-demonstration paradigm. The second is that the imitating agent needs a learning method in which it keeps track of intentions to reach goals. The third is that the model does not require an external measure of equivalence; instead, the demonstrator decides whether an attempted imitation was equivalent to the demonstration. We present a computational model and its simulation results, which underpin our theory of goal-directed imitation in an artificial system.

Introduction

Imitation in agents and robots involves a number of aspects which have been elegantly cast into five questions by Dautenhahn and Nehaniv [1]: who, when, what and how to imitate, and how to judge whether an imitation was successful. The who question is about whom to imitate: what makes someone or something a good model to imitate? Many studies, including this one, circumvent this issue by assigning a fixed role to one agent as the demonstrator and to another agent as the imitator. The when to imitate issue is about how agents know when imitation is appropriate. This is not always obvious, and the start and end of a demonstration are often heavily contextualised: imitation might be just for play, or it might take place in a teacher–student context. Again, we are not concerned with this and presuppose that there is a teacher–student relation between the agents. The how to imitate question addresses the problem of inverse kinematics: how to reach a certain state given your body configuration and the constraints imposed by your actuators and by the environment (the correspondence problem [2]). Humans acquire the knowledge for this task in large part through motor babbling, a mechanism which has recently been demonstrated for robots as well [3], [4], [5], [6]. Learning inverse kinematics is a major issue which we will not address here: in our model we do not consider how a robot would exactly move objects; instead, a goal is reached by using a simple planning algorithm, leaving the intricacies of robot control for further research.

The what to imitate question is the focus of our model. How can an agent know what it should imitate? If you want to imitate someone rubbing his nose, do you rub your own nose, or the nose of the demonstrator? Or maybe you just mimic the rubbing motion, without touching any noses. Or you copy everything as faithfully as possible, up to standing in the same place as the demonstrator. Humans typically resolve this kind of situation by inferring the intention of the demonstrator and imitating the intention only: you simply rub your own nose, using your right hand if you are right-handed, without paying any attention to how the demonstrator moved his arm and hand to reach his nose. We propose a computational model for learning the goal, or intent, of a demonstration. Due to the nature of the model, which draws inspiration from psychological models of imitation, the learning process requires repeated imitative interactions. It is therefore not so much a model of one-shot learning by demonstration, but rather a model suited for robots that follow a developmental process to acquire skills and knowledge.

In our model we propose one way in which the correct interpretation of demonstrated behaviour can self-organise through repeated interactions. We call these repeated interactions imitation games. An imitation game involves two agents, one being the demonstrator, the other the imitator. The demonstrator shows a certain behaviour involving actions and effects, which the imitator has to imitate. The quality of the imitation is judged by the demonstrator, and is used to adapt internal beliefs and representations in the imitator.
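To make the interaction protocol concrete, the following is a minimal sketch of one imitation game and of a series of such games, written in Python under assumed interfaces: the demonstrator and imitator objects and their method names (demonstrate, observe, attempt, judge, update) are illustrative choices and do not correspond to the authors' implementation.

    # Minimal sketch of a single imitation game between two agents.
    # The method names used here are illustrative assumptions, not the authors' API.

    def play_imitation_game(demonstrator, imitator, world):
        demonstration = demonstrator.demonstrate(world)       # actions and their observed effects
        observation = imitator.observe(demonstration)         # imitator's percept of the demonstration
        attempt = imitator.attempt(observation, world)        # imitator reproduces its inferred goal
        success = demonstrator.judge(demonstration, attempt)  # demonstrator decides on equivalence
        imitator.update(observation, attempt, success)        # imitator adapts its goal repertoire
        return success

    def run_games(demonstrator, imitator, make_world, n_games=1000):
        """Repeated games let the correct interpretation self-organise in the imitator."""
        outcomes = [play_imitation_game(demonstrator, imitator, make_world())
                    for _ in range(n_games)]
        return sum(outcomes) / len(outcomes)   # fraction of successful games

The essential design choice visible in the sketch is that the only feedback reaching the imitator is the demonstrator's binary judgement; no external equivalence measure is consulted.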

In previous work on imitating gestures (gestures being actions that have no physical effect on the world, such as waving) we have shown that limitations on perception and execution in a robotic set-up have a profound influence on the number and type of gestures that can be socially transmitted between robots [7], [8]. The noise in perception and production determines how well gestures are observed and reproduced: the number of gestures that agents could consistently learn from each other was inversely proportional to the noise in their perception and production. This shows that embodiment is a crucial factor in imitation (see also [2]).

When addressing imitation, one also has to consider how the quality of an imitation is judged. In most approaches this is done by defining an explicit measure of the match between the demonstrated and the imitated behaviour. Billard et al. [9], for example, define a cost function which is then minimised so that the imitated behaviour matches the demonstrated behaviour as closely as possible. Alissandrakis et al. [10] also use a hard-coded metric for measuring the similarity of perceived and executed actions, states and their effects on the world.
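As a purely illustrative example of such an explicit measure, and not the specific cost functions used in [9] or [10], an imitation could be scored by summing, step by step, the mismatch between the demonstrated and the reproduced world states:

    # Illustrative hard-coded similarity measure between a demonstrated and an
    # imitated behaviour, each represented as a sequence of discrete world states.
    # This is a generic example, not the metric used in the cited work.

    def state_distance(s1, s2):
        """Count the positions at which two discrete world states differ."""
        return sum(a != b for a, b in zip(s1, s2))

    def imitation_cost(demonstrated_states, imitated_states):
        """Sum of per-step distances; lower means a closer imitation."""
        return sum(state_distance(d, i)
                   for d, i in zip(demonstrated_states, imitated_states))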

The intent of a displayed behaviour is to a large extent determined by contextual information. This contextual information is absent from the simple experimental set-up presented here, and is not used by the model as such. Adding semantics for interpreting the goal is a formidable task, and some approaches exist in the field of event recognition (e.g. [11]).

The model presented in this work also relates to plan recognition, where an agent infers a goal by observing a sequence of actions leading towards that goal. Plan recognition is basically treated as a classification problem, in which a sequence of actions (or input states) is classified into one or more goals, often using a standard classification algorithm (for an overview see [12]). Certain aspects of plan recognition are present in our model, such as the inference of the goal from a sequence of observations, but it differs in two important respects. Firstly, in our model the goal of a sequence of actions is visible once the demonstration ends, whereas in plan recognition and collaborative systems the goal often needs to be inferred before the demonstration ends (your word processor needs to infer that you are writing a letter and offer help before you have finished writing it). Secondly, our model extends and adapts its list of candidate goals through repeated imitative interactions, whereas in plan recognition the goals are often fixed and the classification algorithm is trained on a set of hand-tailored examples.

Recently, there has been a growing interest in imitation in the context of social interactions, specifically to facilitate the teaching of robots; see for example [13], [14]. Systems have been built in which the goal, rather than the precise actions, is imitated; see for instance [9], [13], in which agents learn new goals while simultaneously learning how to achieve them.

The focus of the model is on finding the intended goal of a demonstration through iterative interactions; how this goal is reproduced is not important. This dovetails with psychological observations on imitation [15], [16], in which children were found to pay attention to the goal of a demonstration (in those experiments: touching the left or right ear, or touching one of two dots on a table) and not so much to how this goal is achieved (children use either the left or the right arm to touch their ears or the dots, whichever suits them best, irrespective of which arm the demonstrator used). Likewise, our model infers the goal of a demonstration without imitating the steps taken to reach that goal. The actions, or motor responses, that the agent executes depend on the environmental context and the embodiment of the imitator. Billard et al. [9] also present a model which is supported by these psychological observations.

Section snippets

A computational model

In our computer simulations, we consider a single demonstrator and a single imitator. We will show how the imitator can learn to imitate the goal of the actions of the demonstrator. Both agents have a robot arm and reside in a blocks world which each agent can manipulate by means of a limited set of very simple actions. This set-up is depicted in Fig. 1.

The goals the agent can learn in this blocks world express its desire to have the blocks world arranged such that it matches certain spatial relations between the blocks.
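One way to read such a goal is as a conjunction of relational predicates that the current block configuration must satisfy. The sketch below illustrates this with a hypothetical directly-above relation and a dictionary-based world state; both representation choices are assumptions made for illustration only.

    # Sketch of goal representation as a conjunction of spatial relations.
    # The predicate definition and the dict-based world state are assumptions.

    def above(world, x, y):
        """True if block x sits directly above block y (row increases upwards)."""
        (cx, cy), (dx, dy) = world[x], world[y]
        return cx == dx and cy == dy + 1

    # A goal is a conjunction (here: a list) of predicate instances.
    goal = [("above", "A", "B"), ("above", "B", "C")]

    def goal_satisfied(world, goal):
        predicates = {"above": above}
        return all(predicates[name](world, x, y) for name, x, y in goal)

    # Example world: block positions given as (column, row) coordinates.
    world = {"A": (2, 3), "B": (2, 2), "C": (2, 1)}
    assert goal_satisfied(world, goal)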

Measures

The performance of the imitator in both imitating and learning a repertoire of goals is evaluated using three measures. Besides the imitative success (the fraction of successful games) and the repertoire size, we also measure the similarity between the repertoires of the demonstrator and the imitator. The category distance (CD) is the distance between the repertoires of two agents. This measure is not symmetrical, however, so the average category distance (CD¯) is computed as the mean of the category distance taken in both directions.
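The sketch below shows how such measures could be computed, under the assumptions that a repertoire is a set of goals, that a goal-to-goal distance function is available, and that the category distance averages, for each goal of one agent, the distance to the nearest goal of the other agent; the exact definitions used in the paper are not reproduced here.

    # Illustrative computation of the three evaluation measures. The goal-to-goal
    # distance and the set-based repertoire representation are assumptions.

    def imitative_success(outcomes):
        """Fraction of successful imitation games."""
        return sum(outcomes) / len(outcomes)

    def category_distance(repertoire_a, repertoire_b, goal_distance):
        """Asymmetric distance: average distance from each goal of A to its nearest goal in B."""
        return sum(min(goal_distance(g, h) for h in repertoire_b)
                   for g in repertoire_a) / len(repertoire_a)

    def average_category_distance(rep_a, rep_b, goal_distance):
        """Symmetrised version: mean of the category distance taken in both directions."""
        return 0.5 * (category_distance(rep_a, rep_b, goal_distance) +
                      category_distance(rep_b, rep_a, goal_distance))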

Experiments

The environment of the agents is a two-dimensional 5-by-5 simulated blocks world containing the blocks BS = {A, B, C}. Agents can manipulate these blocks: a block can be moved a single cell in any of the four directions, up(x), down(x), left(x), right(x), where x denotes a block in BS. The plans built by the agents consist of sequences of such actions. In the experiments reported here, the goals in the repertoires of the agents are represented as conjunctions of spatial predicates such as above?(x,y).
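A minimal sketch of this set-up, with the 5-by-5 grid, the four single-cell move actions and a simple breadth-first planner of the kind alluded to in the introduction, is given below; the state representation and the planner are assumptions made for illustration and not the authors' implementation.

    # Sketch of the simulated 5-by-5 blocks world and a breadth-first planner.
    # State representation, collision handling and the planner are illustrative assumptions.
    from collections import deque

    BLOCKS = ("A", "B", "C")
    MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def apply_move(state, block, move):
        """Move one block a single cell; return None if the move is invalid."""
        x, y = state[block]
        dx, dy = MOVES[move]
        nx, ny = x + dx, y + dy
        if not (0 <= nx < 5 and 0 <= ny < 5):   # stay on the 5x5 grid
            return None
        if (nx, ny) in state.values():          # target cell already occupied
            return None
        new_state = dict(state)
        new_state[block] = (nx, ny)
        return new_state

    def plan(state, goal_test, max_depth=10):
        """Breadth-first search for a sequence of actions reaching the goal."""
        frontier = deque([(state, [])])
        seen = {tuple(sorted(state.items()))}
        while frontier:
            current, actions = frontier.popleft()
            if goal_test(current):
                return actions
            if len(actions) >= max_depth:
                continue
            for block in BLOCKS:
                for move in MOVES:
                    nxt = apply_move(current, block, move)
                    key = nxt and tuple(sorted(nxt.items()))
                    if nxt and key not in seen:
                        seen.add(key)
                        frontier.append((nxt, actions + [(move, block)]))
        return None

Given a goal test such as the one sketched in the previous section, plan(state, lambda s: goal_satisfied(s, goal)) would return a sequence of (move, block) actions reaching the goal, or None if no plan exists within the depth bound.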

Discussion

The issues involved in imitation, although we can identify them clearly, are very much intertwined, and studying one is often impossible without addressing the others. In order to get a clear picture of the what aspect of imitation and in order to simplify the model, we fixed the when, who and how aspects of imitation. We have described and tested a mechanism for the imitation of discrete goals in a population of simulated agents, in which goals are represented as spatial relations over blocks.

Acknowledgements

BJ is sponsored by a grant from the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT). Thanks to Musaab Garghouti for the illustration in Fig. 1.


References (21)

  • A. Billard et al., Discovering optimal imitation strategies, Robotics and Autonomous Systems (2004)
  • D.A. Baldwin et al., Discerning intentions in dynamic human action, Trends in Cognitive Sciences (2001)
  • K. Dautenhahn et al., The agent-based perspective on imitation
  • C.L. Nehaniv et al., The correspondence problem
  • S. Vijayakumar et al., Statistical learning for humanoid robots, Autonomous Robots (2002)
  • S. Schaal et al., Computational approaches to motor learning by imitation, Philosophical Transactions of the Royal Society of London: Series B (2003)
  • P. Andry et al., From sensori-motor development to low-level imitation, Adaptive Behavior (2004)
  • A. Dearden, Y. Demiris, Learning forward models for robots, in: Proceedings of the Nineteenth International Joint...
  • B. Jansen, An imitation game for emerging action categories
  • B. Jansen et al., Imitation in embodied agents results in self-organization of behavior



Bart Jansen is a Ph.D. student at the Vrije Universiteit Brussel (VUB), Belgium, and is funded by a grant from the IWT (Instituut voor de aanmoediging van Innovatie door Wetenschap en Technologie in Vlaanderen). He works as a researcher at the Artificial Intelligence Laboratory headed by Luc Steels. His current research interests include cognitive robotics, robot imitation and concept formation.

Tony Belpaeme is a senior lecturer in intelligent systems at the University of Plymouth. He obtained a Ph.D. in 2002 from the Vrije Universiteit Brussel (VUB), Belgium, where he then worked as a postdoctoral researcher at the Artificial Intelligence Laboratory headed by Luc Steels. He also held the position of guest professor at the same university. His current research focuses on cognitive robotics, the evolution of language and its application to intelligent systems and concept formation.
