Applying Ant Colony Optimization algorithms for high-level behavior learning and reproduction from demonstrations

https://doi.org/10.1016/j.robot.2014.12.001

Highlights

  • A behavior learning method based on Ant Colony Optimization (ACO) is proposed.

  • We combined ACO, Semantic Networks and Spreading Activation mechanisms.

  • The method is able to learn high-level aspects of behaviors from demonstrations.

  • The method answers the questions of “What to imitate” and “When to imitate”.

  • The method generalizes concepts while learning high-level aspects of behaviors.

Abstract

In domains where robots carry out tasks on behalf of humans, the ability to learn new behaviors easily and quickly plays an important role. Two major challenges in Learning from Demonstration (LfD) are to identify which information in a demonstrated behavior requires the robot’s attention, and to generalize the learned behavior so that the robot is able to perform it in novel situations.

The main goal of this paper is to incorporate Ant Colony Optimization (ACO) algorithms into LfD in an approach that focuses on understanding the tutor’s intentions and on learning the conditions for exhibiting a behavior. The proposed method combines ACO algorithms with semantic networks and a spreading activation mechanism to reason about and generalize the knowledge obtained through demonstrations. The approach also provides structures for reproducing behaviors under new circumstances. Finally, the applicability of the system is evaluated in an object shape classification scenario.

Introduction

In recent years, robot task learning has received remarkable attention and has motivated the robotics community to take a deeper interest in techniques based on human skill learning from observation [1]. In robotics, such an approach fits in the framework of Learning from Demonstration (LfD).

In the current paper, we address the questions “What to imitate” and “When to imitate” from a high-level perspective, while employing methods to learn and reproduce motor actions from demonstrations. Our focus is therefore on learning and reproducing high-level aspects of demonstrated behaviors. For this purpose, we utilize a core Semantic Network (SN) as a model that represents behaviors by nodes and links them to a set of other nodes corresponding to concepts and objects in the real world. The network contains the concepts, objects and properties required for learning and reproducing behaviors, and must be provided prior to the learning and reproduction process. The learned behaviors are then used as object affordances, as well as to prepare the ground for behavior arbitration. Our learning methods also provide techniques to learn the conditions that lead to a behavior and thus answer the question of “When to imitate”. These conditions can be environmental states, objects to use, and concepts related to the demonstrated behavior. Depending on the amount of knowledge available in the core SN, the robot may therefore perceive enormous amounts of information during learning. In many cases, when the desired behavior is very complex or is demonstrated in an ambiguous manner, the robot requires a bias in order to focus on the right aspects of the demonstration [2]. By letting a controller at a higher abstraction level guide the robot, especially during the learning phase, the robot’s attention can be directed to the aspects of the demonstration that are significant for learning behaviors, thereby answering the question of “What to imitate”. This controller is part of an architecture proposed and developed in our previous work [3], [4], and is employed in the current paper.
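
To make this representation concrete, the following minimal Python sketch models a core SN as nodes connected by weighted links. The class, the node names and the weights are illustrative assumptions, not the network or implementation used in the paper.

```python
# Minimal sketch of a core Semantic Network (SN): behaviors, concepts and objects
# are nodes; weighted links encode their associations. All names and weights are
# hypothetical and only illustrate the structure described in the text.

class SemanticNetwork:
    def __init__(self):
        self.links = {}  # node -> {neighbor: link weight}

    def add_link(self, a, b, weight=1.0):
        self.links.setdefault(a, {})[b] = weight
        self.links.setdefault(b, {})[a] = weight  # associations treated as bidirectional here

    def neighbors(self, node):
        return self.links.get(node, {})

sn = SemanticNetwork()
# A behavior node linked to the concepts and objects it depends on (hypothetical).
sn.add_link("CollectCylindricalObject", "Cylinder", 0.9)
sn.add_link("CollectCylindricalObject", "Basket", 0.7)
sn.add_link("Cylinder", "Shape", 0.8)
sn.add_link("GreenCan", "Cylinder", 0.6)  # a perceived object and its shape concept
```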

Our current work introduces, as an original contribution, the use of SNs for biasing the robot and the use of ACO algorithms to learn new behaviors and to define the degree of generalization needed to exhibit the learned behaviors in situations that are new to the robot. Our previously developed learning method [4] is based on a one-way ANOVA test and has some observed limitations caused by the imposed statistical constraints, which make the method less successful in noisy environments. To address this issue, our new method views behavior learning as an optimization problem and applies ACO algorithms to determine the elements of the demonstrations that are most related to the behavior. Nodes and links in the core SN represent the elements of a demonstration, and the goal is to find the shortest path between each element’s node and a behavior node using the pheromone-laying mechanism of ACO.
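
As a rough intuition for this shortest-path framing, the sketch below ranks hypothetical demonstration elements by their weighted shortest-path distance to a behavior node. It uses a deterministic Dijkstra search as a stand-in for the pheromone-based procedure; the graph, node names and weights are assumptions made only for illustration.

```python
import heapq

# Hypothetical core SN as an undirected weighted graph; smaller weights mean closer association.
graph = {
    "CollectCylindricalObject": {"Cylinder": 1.0, "Basket": 1.5},
    "Cylinder": {"CollectCylindricalObject": 1.0, "GreenCan": 1.0, "Shape": 2.0},
    "Basket": {"CollectCylindricalObject": 1.5},
    "GreenCan": {"Cylinder": 1.0, "Green": 1.0},
    "Green": {"GreenCan": 1.0, "Color": 2.0},
    "Shape": {"Cylinder": 2.0},
    "Color": {"Green": 2.0},
}

def shortest_distance(graph, source, target):
    """Dijkstra's algorithm; a deterministic stand-in for the ant-based path search."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return float("inf")

# Elements perceived during a demonstration, ranked by closeness to the behavior node.
elements = ["GreenCan", "Cylinder", "Color", "Basket"]
ranked = sorted(elements, key=lambda e: shortest_distance(graph, e, "CollectCylindricalObject"))
print(ranked)  # elements with shorter paths are treated as more relevant to the behavior
```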

The rest of the article is organized as follows: Section 2 provides the background of the work, Section 3 presents the principal elements of SNs for modeling the world and representing behaviors, Section 4 provides adapted formulations of ACO algorithms for the purpose of learning behaviors, Section 5 elaborates on the learning and reproduction of high-level representations of behaviors using ACO and SNs, Section 6 presents the experimental setup and results, Section 7 discusses the approach, and finally Section 8 draws conclusions.

Section snippets

Learning from demonstration

LfD is a promising way to naturally and intuitively teach robots new behaviors (skills) by demonstrating how to perform them [5]. Applying LfD does not require expert knowledge of robotics or domain dynamics, so it can easily be applied by non-roboticist users for both trivial and non-trivial behaviors. To apply LfD, a number of questions have to be answered, as brought to attention by Schaal [6] and Demiris and Hayes [7]. These central questions are known as the “Big Five” and

Formulation of spreading activation

The core SN is not informative per se; it requires a method to query the network and retrieve information. This method is called Spreading Activation. The hierarchical network model is the basis for long-term memory, which contains interconnected nodes of information. The connections implement associations between the nodes and control how information is retrieved.

When a node is activated by perceptual input, its activation value is set to 1.0, which is then propagated to its connections
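
A minimal sketch of such a propagation step is shown below; the decay factor, the activation threshold and the example links are illustrative assumptions rather than the parameters used in the paper.

```python
# Spreading activation sketch: a perceived node is clamped to 1.0 and its activation
# is propagated over weighted links, attenuated at every hop. Decay, threshold and
# the example network fragment are assumptions made for illustration.

def spread_activation(links, sources, decay=0.8, threshold=0.05):
    activation = {node: 1.0 for node in sources}  # perceptual input clamps sources to 1.0
    frontier = list(sources)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, weight in links.get(node, {}).items():
                incoming = activation[node] * weight * decay
                if incoming > threshold and incoming > activation.get(neighbor, 0.0):
                    activation[neighbor] = incoming  # keep the strongest activation received
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activation

links = {  # hypothetical fragment of the core SN: node -> {neighbor: link weight}
    "GreenCan": {"Cylinder": 0.9, "Green": 0.9},
    "Cylinder": {"CollectCylindricalObject": 0.8, "Shape": 0.6},
    "Green": {"Color": 0.6},
}
print(spread_activation(links, ["GreenCan"]))
```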

Formulation of ACO algorithms

In the following sections we give an overview of the Ant System (AS) and Ant Colony System (ACS) meta-heuristic algorithms, formulated for our application of LfD using the core SN.
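
For reference, the standard AS and ACS rules (in the usual notation of Dorigo and colleagues) that such a formulation starts from are summarized below; here $\tau_{ij}$ is the pheromone on link $(i,j)$, $\eta_{ij}$ a heuristic desirability, $L_k$ the length of the path built by ant $k$, and $\alpha$, $\beta$, $\rho$, $\xi$, $q_0$, $Q$ and $\tau_0$ the usual parameters. The paper’s adaptation of these rules to SN links is its own contribution and is not reproduced here.

```latex
% Ant System (AS): random-proportional transition rule and pheromone update
p_{ij}^{k} = \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}
                  {\sum_{l \in \mathcal{N}_i^{k}} [\tau_{il}]^{\alpha}\,[\eta_{il}]^{\beta}},
\qquad
\tau_{ij} \leftarrow (1-\rho)\,\tau_{ij} + \sum_{k} \Delta\tau_{ij}^{k},
\quad
\Delta\tau_{ij}^{k} =
\begin{cases}
Q / L_{k} & \text{if ant } k \text{ used link } (i,j)\\
0 & \text{otherwise}
\end{cases}

% Ant Colony System (ACS): pseudorandom-proportional rule, local and global updates
j =
\begin{cases}
\arg\max_{l \in \mathcal{N}_i^{k}} \left\{ \tau_{il}\,[\eta_{il}]^{\beta} \right\} & \text{if } q \le q_0\\
\text{drawn according to } p_{ij}^{k} & \text{otherwise}
\end{cases},
\qquad
\tau_{ij} \leftarrow (1-\xi)\,\tau_{ij} + \xi\,\tau_{0},
\qquad
\tau_{ij} \leftarrow (1-\rho)\,\tau_{ij} + \rho\,\Delta\tau_{ij}^{\mathrm{best}}
```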

Learning

The proposed approach aims at teaching the robot the conditions necessary for reproducing behaviors, and it controls the way the robot generalizes the learned conditions. The conditions are environmental states, the presence and properties of perceived objects, and associations to other nodes in the SN. During the learning phase, the nodes representing the conditions are linked to a new node, denoted the context node. What the robot perceives is linked to the context node directly and thus activation
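
A minimal sketch of this linking step is given below, under the simplifying assumption that the activated nodes are attached to a fresh context node with their activation values as initial link weights; in the paper the relevance of each link is instead determined by the ACO search.

```python
# Sketch of the learning step: nodes active during a demonstration are linked to a
# newly created context node. Using activation values as initial link weights and a
# fixed threshold are assumptions made only for illustration.

def learn_context(sn_links, context_name, perceived_activations, min_activation=0.1):
    """Attach every sufficiently activated node to a new context node."""
    context_links = {}
    for node, activation in perceived_activations.items():
        if activation >= min_activation:
            context_links[node] = activation
    sn_links[context_name] = context_links
    return sn_links

sn_links = {}
demo = {"GreenCan": 1.0, "Cylinder": 0.72, "Shape": 0.35, "Color": 0.05}  # e.g. from spreading activation
learn_context(sn_links, "Context:CollectCylindricalObject", demo)
print(sn_links["Context:CollectCylindricalObject"])  # "Color" is dropped (below threshold)
```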

Experimental results

Two experiments have been conducted to demonstrate how AS and ACS, respectively, are used to learn and generalize a new behavior from demonstrations. In the presented experiments, the robot is taught to collect objects with particular shapes and to place them in designated baskets regardless of their color and size. The shapes are cylindrical, triangular and square, and each object should be placed in the basket with the same shape. We run the experiments with three objects of the same

Discussions

In this section, several aspects of the achieved results are discussed. In the presented examples, both the AS and ACS algorithms were capable of reproducing the demonstrated behaviors, although they generated slightly different networks. This difference was due to the different pheromone-updating processes of the two algorithms. As an example, consider the Collect Cylindrical Object context. With the AS algorithm (see Table 1), nodes that were activated in only one of the two demonstrations

Conclusions

This paper addresses the challenges of “What to imitate?” and “When to imitate?” from a higher-level perspective. In real-world scenarios, all concepts and objects in the environment can affect the learning process. In the case of an ambiguous demonstration encompassing numerous distracting objects, a technique for focusing the robot’s attention on the significant aspects of the demonstration is needed. To this end, a learning method based on ACO algorithms and semantic networks is introduced. The

Acknowledgment

This work was financed by the EU-funded Initial Training Network (ITN) in the Marie Curie People Programme (FP7): INTRO (INTeractive RObotics research network), grant agreement no. 238486.


References (48)

  • B. Fonooni, T. Hellström, L.-E. Janlert, Learning high-level behaviors from demonstration through semantic networks, ...
  • B. Fonooni, T. Hellström, L.-E. Janlert, Towards goal based architecture design for learning high-level representation ...
  • J. Demiris et al., Imitation as a dual-route process featuring predictive and learning components: a biologically plausible computational model
  • K. Dautenhahn et al., The Agent-based Perspective on Imitation (2002)
  • M.J. Mataric, Sensory-motor primitives as a basis for imitation: linking perception to action and biology to robotics
  • S. Ekvall et al., Grasp recognition for programming by demonstration
  • P. Pastor et al., Learning and generalization of motor skills by learning from demonstration
  • E.A. Billing et al., A formalism for learning from demonstration, Paladyn J. Behav. Robot. (2010)
  • M. Mahmoodian et al., Hierarchical concept learning based on functional similarity of actions
  • H. Hajimirsadeghi et al., Conceptual imitation learning in a human–robot interaction paradigm, ACM Trans. Intell. Syst. Technol. (TIST) (2012)
  • M. Cakmak et al., Computational benefits of social learning mechanisms: stimulus enhancement and emulation
  • C. Chao et al., Towards grounding concepts for transfer in goal learning from demonstration
  • S. Ekvall, Robot task learning from human demonstration (2007)
  • T.M. Mitchell et al., Explanation-based generalization: a unifying view, Mach. Learn. (1986)

Benjamin Fonooni received his B.S. and M.S. degrees in Artificial Intelligence and Robotics from Tehran Azad University—Science and Research branch in 2003 and 2006, respectively. He is currently pursuing his Ph.D. at Umeå University (Sweden) and is involved in the Marie Curie Initial Training Network INTeractive RObotics (INTRO) as an Early Stage Researcher. His main research topics are developing techniques for robot learning, based on Learning from Demonstration and Imitation, by utilizing various aspects of human–robot interaction.

Aleksandar Jevtić received the B.S. degree in Electrical Engineering from the University of Belgrade, Serbia in 2005, and the M.S. and Ph.D. degrees in Computer Science from the Technical University of Madrid (UPM), Spain in 2007 and 2011, respectively. In his Ph.D. work he proposed a general design methodology of swarm intelligence tools for various application domains. As a Marie Curie Postdoctoral Fellow at Robosoft, and later as an invited researcher at the ESTIA institute, both in France, he focused his research on human–robot interaction.

Thomas Hellström—Assoc. prof. in Computing Science at Umeå University, Sweden. Expertise in field robotics, autonomous forest machines, and machine learning. More than 10 years of industrial experience in product development and project management in R&D for automation. Has supervised 2 Ph.D. students and over 30 Master’s degree projects.

Lars-Erik Janlert—Professor in Computing Science and Cognitive Science at Umeå University, Sweden. Expertise in cognitive science, human–computer interaction, knowledge representation, information philosophy, and design theory.
