Knowledge-Based Systems

Volume 57, February 2014, Pages 8–27

Biologically inspired layered learning in humanoid robots

https://doi.org/10.1016/j.knosys.2013.12.003

Abstract

A hierarchical paradigm for bipedal walking consisting of four layers of learning is introduced in this paper. In the Central Pattern Generator (CPG) layer, Learner-CPGs made of coupled oscillatory neurons are trained to generate basic walking trajectories. The dynamical model of each neuron in a Learner-CPG is discussed. We then explain how these neurons are connected to each other to build a new type of neural network called the Learner-CPG neural network. The training method of these neural networks is the most important contribution of this paper. The proposed two-stage learning algorithm first learns the basic frequency of the input trajectory to find a suitable initial point for the second stage; the second stage then follows a mathematical path to the best unknown parameters of the neural network. Once trained on basic trajectories, these networks can generate new walking patterns according to a policy. A walking policy is parameterized by policy parameters that control the central pattern generator variables. Policy learning takes place in a middle layer called the MLR layer, while high-level commands originate from a third layer called the HLDU layer. In this layer the focus is on training curvilinear walking in the NAO humanoid robot. The policy should optimize the total payoff of a walking period, defined as a combination of smoothness, precision and speed.

Introduction

The problem of robot locomotion is where neuroscience and robotics converge. Their common ground is the pattern generators in the spinal cord of vertebrate animals, called “Central Pattern Generators” (CPGs). CPGs are neural circuits, located in the lowest parts of the brain and the first segments of the spinal cord of a large number of animals, that are responsible for generating rhythmic, periodic locomotion patterns in different parts of the body [1]. Although these pattern generators receive only very simple inputs from the sensory systems, they can produce high-dimensional and complex patterns for walking, swimming, jumping, turning and other types of locomotion [2]. The idea that the human nervous system uses a layered mechanism to generate complex locomotion patterns from only simple stimulation is a provocative one, and it is the idea we intend to model in this paper.
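To make the CPG concept concrete, here is a minimal sketch in Python (ours, not the paper's Learner-CPG model; all parameter values are illustrative) of a Hopf oscillator, a common building block of coupled-oscillator CPGs. It shows how a constant stimulation becomes a stable rhythmic output:

    import numpy as np

    def hopf_step(x, y, mu, omega, dt):
        """One Euler step of a Hopf oscillator with target amplitude sqrt(mu)."""
        r2 = x * x + y * y
        dx = (mu - r2) * x - omega * y
        dy = (mu - r2) * y + omega * x
        return x + dx * dt, y + dy * dt

    # Any non-zero initial state converges to the same limit cycle, which is
    # what makes such oscillators robust rhythmic pattern generators.
    # Illustrative values below; not the paper's Learner-CPG parameters.
    x, y = 0.1, 0.0
    trajectory = []
    for _ in range(5000):
        x, y = hopf_step(x, y, mu=1.0, omega=2 * np.pi * 0.5, dt=0.001)
        trajectory.append(x)  # x(t): a ~0.5 Hz rhythmic joint-angle signal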

Learning in humanoid robots faces a large number of challenges. For example, the robot must cope with noisy and nondeterministic situations and suppress unwanted perturbations [4]. The state space is continuous and multidimensional, so it is impossible to search it systematically. Moreover, the lack of an explicit mapping between intentions and actions in a humanoid robot is a major issue that must be addressed [5].

In this paper we train a NAO soccer-playing robot to perform a curvilinear walk using a hierarchical layered learning paradigm. The proposed method uses a basic CPG-based walk controller built from Learner-CPG Neural Networks (LCPGNNs). In this manner, any kind of complex behavior can be trained into a CPG neural network and then used to drive the movement of different types of robots.

In the next section, related work in the field of humanoid robot locomotion and learning is reviewed, and the advantages and disadvantages of each method are discussed; this section also introduces the NAO platform used in this research. Section 3 is dedicated to the proposed layered learning model: it introduces each layer of our learning platform and explains the correlations between the layers. The CPG layer is explained in Section 4, which also covers the role of the arms, their coupling with the other joints, and the feedback pathways. The mathematical treatment of the learning algorithm used for Learner-CPG neural networks is presented in Section 5, where the two-stage learning algorithm that trains each oscillator neuron and its synaptic connections in an LCPGNN is explained. Section 6 introduces the MLR layer and its learning mechanism; we use reinforcement learning in this layer to find an optimal policy for the CPG layer, and the policy parameterization and payoff function are discussed as well. Section 7 presents the experimental results, including implementations in the Webots™ simulator and in Simulink/MATLAB. In Section 8 the highest layer, the HLDU, with its functions and capabilities, is briefly discussed. Section 9 concludes and outlines future work.

Section snippets

Related works

There are many approaches to bipedal skill learning [6]. As alternatives to methods using pre-recorded trajectories [7], [8], ZMP-based approaches [11], or heuristic control laws (e.g. Virtual Model Control (VMC) [12]), CPG-based methods have been introduced from a biological perspective. They encode rhythmic trajectories as limit cycles of nonlinear dynamical systems. Coupled oscillator-based CPG implementations offer miscellaneous features such as the…

Layered learning architecture

The idea of layered learning in multi-agent systems was introduced by Stone [19], who investigated the use of machine learning within a team of 2D soccer agents. Using hierarchical task decomposition, layered learning enables a specific task to be learned at each level of the hierarchy. Here, a hierarchical learning framework for walk learning in soccer-playing humanoid robots is designed. Our model is composed of four layers, whose designs are inspired by biological…

CPG layer

In this section the building blocks of the CPG layer, the third layer of the hierarchical model, are discussed. First, the new design of oscillatory neural networks is described: the architecture, inputs, outputs and internal dynamics of each neuron are explained. Then the neurons are connected to build up the Learner-CPG Neural Network (LCPGNN). Next, the coupling scheme between the joints and its role in walking are discussed. Section 4.4 is devoted to the arms' movement. The effects of…
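As a rough illustration of the joint-coupling idea mentioned above, a few phase oscillators can be locked to fixed phase offsets relative to a reference joint, so that a single gait frequency coordinates hips and arms. The joint names and offsets below are our assumptions, not the paper's actual coupling scheme:

    import numpy as np

    JOINTS = ["l_hip", "r_hip", "l_arm", "r_arm"]
    # Assumed offsets: legs in antiphase, each arm in antiphase with its leg.
    PHASE_BIAS = {"l_hip": 0.0, "r_hip": np.pi, "l_arm": np.pi, "r_arm": 0.0}

    def step_phases(theta, omega, k, dt):
        """Kuramoto-style update pulling each phase toward its offset
        relative to the reference oscillator (l_hip)."""
        new = {}
        for j in JOINTS:
            err = (theta["l_hip"] + PHASE_BIAS[j]) - theta[j]
            new[j] = theta[j] + (omega + k * np.sin(err)) * dt
        return new

    theta = {j: np.random.uniform(0, 2 * np.pi) for j in JOINTS}
    for _ in range(20000):  # phases lock after a few gait cycles
        theta = step_phases(theta, omega=2 * np.pi, k=5.0, dt=0.001)
    angles = {j: np.cos(theta[j]) for j in JOINTS}  # coordinated joint commands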

Training algorithm of the neural network

To train the LCPGNNs in the CPG layer on different trajectories, we feed the training trajectories to the system; after a short period of time the parameters of the LCPGNNs are fitted to the desired inputs (i.e. the total squared error converges to zero). This procedure finds the initial-state values of the O-neurons and the post-synaptic weights throughout the network. One neuron in each LCPGNN sends out its c state as a synchronization criterion to the other O-neurons. The major benefit of this…
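Although the snippet above is truncated, the two-stage structure it describes can be sketched under simplifying assumptions: stage 1 estimates the fundamental frequency of the training trajectory from its spectrum (the "suitable initial point"), and stage 2 runs a Levenberg–Marquardt fit starting from that point. Here a truncated Fourier series stands in for the LCPGNN's oscillator states and synaptic weights:

    import numpy as np
    from scipy.optimize import least_squares

    def stage1_frequency(y, dt):
        """Pick the dominant non-DC frequency of the training trajectory."""
        spectrum = np.abs(np.fft.rfft(y - y.mean()))
        freqs = np.fft.rfftfreq(len(y), dt)
        return freqs[np.argmax(spectrum)]

    def stage2_fit(y, t, f0, n_harmonics=3):
        """LM fit of amplitudes/phases, starting from the stage-1 frequency."""
        def model(p):
            # A Fourier series is our simplification of the LCPGNN parameters.
            f, rest = p[0], p[1:]
            out = np.full_like(t, rest[0])
            for k in range(n_harmonics):
                a, b = rest[1 + 2 * k], rest[2 + 2 * k]
                out += a * np.cos(2 * np.pi * f * (k + 1) * t) \
                     + b * np.sin(2 * np.pi * f * (k + 1) * t)
            return out
        p0 = np.zeros(2 + 2 * n_harmonics)
        p0[0] = f0  # stage-1 initialization keeps LM out of bad local minima
        return least_squares(lambda p: model(p) - y, p0, method="lm")

    t = np.arange(0, 4, 0.01)
    y = 0.3 * np.sin(2 * np.pi * 1.5 * t) + 0.1 * np.sin(2 * np.pi * 3.0 * t)
    f0 = stage1_frequency(y, 0.01)
    result = stage2_fit(y, t, f0)  # result.x holds the fitted parameters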

MLR policy learning layer

The MLR is an important layer of the proposed model, responsible for producing suitable stimulation for the CPG layer. This section discusses the learning process in the MLR layer. The main objective of this layer, policy learning, is explained: how to model the previous layer as a reinforcement learning problem and how to train some important open parameters with policy gradient learning, using feedback originating from gyro sensor values together with the speed and precision information that comes from the HLDU…
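A minimal sketch of such a policy-learning loop, assuming only that a walking episode can be rolled out and scored with a scalar payoff (combining smoothness, precision and speed, as in the paper): a finite-difference policy-gradient estimator perturbs the CPG's open parameters and steps along the estimated gradient. The parameter names and the quadratic placeholder payoff are ours:

    import numpy as np

    def payoff(policy):
        """Placeholder: run one walking episode with these CPG parameters
        and score it; a real system would query the simulator/robot."""
        return -np.sum((policy - np.array([0.4, 1.2, 0.8])) ** 2)

    def finite_difference_update(policy, n_rollouts=8, eps=0.05, lr=0.1):
        """Estimate the payoff gradient from perturbed rollouts and ascend it."""
        grad = np.zeros_like(policy)
        base = payoff(policy)
        for _ in range(n_rollouts):
            delta = eps * np.random.uniform(-1, 1, size=policy.shape)
            grad += (payoff(policy + delta) - base) * delta
        grad /= n_rollouts * eps ** 2
        return policy + lr * grad

    policy = np.array([0.0, 0.0, 0.0])  # e.g. step length, frequency, hip swing
    for _ in range(200):
        policy = finite_difference_update(policy)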

Experimental results and analysis

In this section, the implementation methods and experimental results of the biologically inspired layered learning paradigm are presented. The first-stage experiments (the training phase of the CPG layer) were run using the Simulink toolbox in MATLAB. In the second stage (the online policy-gradient-based training of the robot in the MLR layer), an integrated simulation of the NAO robot in Webots Robotstadium was used [18]. The model of the robot is almost identical to the real robot as far as simulation is…

HLDU layer

The HLDU layer is briefly discussed here. Since this layer deals with high-level decision making in soccer robots, it contains complex processing units. The inputs and outputs of this layer are described for our application, illustrating future perspectives and the capabilities of this layer for learning advanced behaviors in soccer-playing humanoid robots.

The main objective of this layer is to decide about strategic goals in soccer, i.e. where the robot is now, where it should go next, where the…

Conclusions

A hierarchical model for learning bipedal walking skills using Learner central pattern generator neural networks is introduced, in which the networks are trained in two stages. In the first stage, the CPG layer is trained on NAO basic walking trajectories in order to find the fundamental frequencies. In the second stage, a learning method based on the Levenberg–Marquardt algorithm with an analytical gradient calculation is applied. The second stage starts from the initial point found in the first stage and finalizes the…
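For reference, the generic Levenberg–Marquardt step has the standard form below (textbook notation, not the paper's own analytical-gradient derivation): for a residual vector e(θ) with Jacobian J and damping factor λ,

    % Generic Levenberg-Marquardt update; lambda controls the blend
    % between gradient descent and Gauss-Newton.
    \Delta\theta = -\left(J^{\top}J + \lambda I\right)^{-1} J^{\top} e(\theta),
    \qquad J_{ij} = \frac{\partial e_i(\theta)}{\partial \theta_j}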

References (27)

  • T. Matsubara et al., Learning CPG-based biped locomotion with a policy gradient method, Robot. Auton. Syst. (2006)
  • J. Strom et al.
  • J. Pratt, P. Dilworth, G. Pratt, Virtual model control of a biped walking robot, in: Presented at the IEEE Int'l Conf....
Cited by (17)

    • Hybrid autonomous controller for bipedal robot balance with deep reinforcement learning and pattern generators

      2021, Robotics and Autonomous Systems
      Citation Excerpt:

      This controller has pre-deployment and post-deployment learning capabilities that are based on humans' and animals' gait control. The controller is a hierarchical Central Pattern Generator (CPG) that is divided into two independent systems, in contrast to similar approaches from other researchers where they are combined into a single controller [9–11]. The higher-level system processes all sensors' information to produce parameters for the lower-level system.

    • Tackling the start-up of a reinforcement learning agent for the control of wastewater treatment plants

      2018, Knowledge-Based Systems
      Citation Excerpt:

      Reinforcement learning (RL) is a machine learning paradigm where the agent learns to do better in its environment by interacting with it [29]. RL has been applied to different domains which include: medical applications such as optimization of anemia treatment [12], control of blood glucose variability [10], robotics [27], and operations research (e.g. optimizing the pricing policy of a cloud service provider [36] or web services [32]). RL has also been successfully applied to the intelligent control of processes.

    • Dynamically stable walk control of biped humanoid on uneven and inclined terrain

      2018, Neurocomputing
      Citation Excerpt:

      Abdolmaleki et al. [20] augmented the 3D inverted pendulum with a spring model and used policy search to optimize the parameters of the walking engines on Nao robots. Shahbazi et al. [21] introduced a two-stage learning algorithm for the Central Pattern Generator (CPG) of the Nao robot's bipedal walking. A biped robot with a primal walk controller is capable of walking on flat surfaces.

    • Online fitted policy iteration based on extreme learning machines

      2016, Knowledge-Based Systems
      Citation Excerpt:

      Reinforcement learning (RL) is a learning paradigm in the field of machine learning for solving decision-making problems where decisions are made in stages [1]. This kind of problem appears in many fields, such as medicine [2,3], automatic control [4,5], artificial intelligence [6,7], or operations research [8,9]. The standard RL setting consists of an agent (or controller) in an environment (or system).

    • A central pattern generator for controlling sequential activation in a neural architecture for sentence processing

      2015, Neurocomputing
      Citation Excerpt:

      Motor control with CPGs is found in organisms ranging from mollusks (e.g., [18,35,44]) to control of locomotion speed in humans [8]. Based on the role of CPGs in motor control, CPG models have been used in controlling the motor behavior of robots (e.g., [45]), ranging from snake-like robots (e.g., [30]) to humanoid robots (e.g., [29,36]). Here we want to investigate a role of CPGs in controlling higher level cognitive processing.
