Elsevier

Neurocomputing

Volume 366, 13 November 2019, Pages 35-45

Sequential learning unification controller from human demonstrations for robotic compliant manipulation

https://doi.org/10.1016/j.neucom.2019.07.081

Abstract

Robotic compliant manipulation involves not only robot motion but also interaction with the environment. Frequently endowing a robot with compliant manipulation skills through manual programming or off-line training is complicated and time-consuming. In this paper, we propose a sequential learning framework that takes both the kinematic profile and the variable impedance parameter profile into consideration to model a unified control strategy combining “motion generation” and “compliant control”. To acquire this unification controller efficiently, we use a sequential learning neural network to encode robot motion and a new force-based variable impedance learning algorithm to estimate varying damping and stiffness profiles in three directions. Furthermore, state-independent stability constraints for variable impedance control are presented. The effectiveness of the proposed learning framework is validated by a set of experiments on the 4-DoF Barrett WAM.

Introduction

When a robot is to be used in a new task, time-consuming programming by professional technicians is needed to endow it with the new skill [1]. This traditional way of preprogramming robots is not suitable for multi-product, small-batch manufacturing lines. Endowing robots with the ability to learn, instead of explicitly programming them for every possible situation, is an effective way to lower the barrier to adopting robot technology and to increase implementation efficiency. Robot learning from demonstration (RLfD) [2] is a well-known methodology for extracting task-relevant information and new sensorimotor knowledge by observing correct demonstrations. A model learned from the demonstration data can then be used by the robot for autonomous task execution [3].

Traditionally, RLfD is used to model the kinematic profiles of tasks and then to produce robot motion. Many excellent methods have been proposed, such as dynamic movement primitives (DMPs) [4], [5], the stable estimator of dynamical systems (SEDS) approach [6], [7], and fast and stable modeling for dynamical systems [8]. However, two problems have emerged in practical application. First, these methods collect and model the robot motion (kinematic profile) off-line. This process is time-consuming, sometimes even longer than manual programming, so off-line RLfD loses its advantage on multipurpose production lines, where new manipulation skills are needed frequently. Second, these methods are usually used to learn only the robot motion, but a stiff motion controller is not enough to implement compliant behaviors [9]. The successful execution of compliant tasks requires not only motion generation (the kinematic profile of the task) but also interaction with the environment, see Fig. 1. In this figure, the red line represents a portion of the path and the belted ellipsoid illustrates the robot stiffness in Cartesian space at the contact point between the end-effector and the computer screen. Motion generation creates a path for task implementation (from the starting point to the final state); this path is called the kinematic profile of the task in this paper. The term “interaction with the environment” refers to the regulation of the robot’s dynamic behavior at its contact points/surfaces (for instance, making the robot compliant or stiff when necessary). Both of these skills are essential for the safe and successful execution of many robotic tasks. Cleaning a computer screen with pure motion generation is challenging because slight uncertainty could result in an excessive or inadequate force, eventually leading to breakage or residual dirt on the screen.
Besides, typical tasks often involve contact or require an appropriate response to unforeseen physical perturbations. The question of how to regulate compliance to accommodate the requirements of the task has not been fully tackled in RLfD [10]. There are several ways to make robots compliant: contact detection using sensors (artificial tactile skin or force/torque sensors) [11], [12], passive compliance by design (reducing the weight and hardness of the robot structure or implementing elastic elements) [13], [14], and active torque control strategies [15], [16], [17]. The last is seen as a good alternative to the former two because of its initiative and precision [18]. Therefore, in this paper we apply suitable impedance strategies to actively control the torque of the robot. Impedance control can regulate the dynamic responses of the robot to interaction forces by establishing an appropriate virtual mass-spring-damper system on the end-effector [15]. Similar to human limb impedance adjustment, the parameters of the impedance control model are regulated over time through learning based on certain criteria [19]. This learning policy has been explored in many previous studies [20], [21], [22], [23].
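To make the “motion generation” half of this picture concrete, methods such as SEDS encode a demonstrated task as a time-invariant dynamical system x&#775; = f(x) whose attractor is the task goal, so that any starting point converges to the goal. The sketch below is purely illustrative: it uses a hand-picked stable linear f and an arbitrary gain, not a model learned from demonstrations.

```python
# Toy motion generation with a dynamical system x_dot = f(x).
# Here f(x) = gain * (goal - x), a hand-picked stable attractor toward the
# goal; learned approaches (e.g. SEDS) fit a nonlinear f from demonstrations.
def rollout(start, goal, gain=2.0, dt=0.01, steps=500):
    x = list(start)
    for _ in range(steps):
        # Explicit Euler step of x_dot = gain * (goal - x)
        x = [xi + gain * (gi - xi) * dt for xi, gi in zip(x, goal)]
    return x
```

Because the rollout depends only on the current state, the same f drives the system to the goal from initial points never seen in any demonstration, which is the generalization property such dynamical-system encodings are valued for.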

In [24], [25], learning a variable impedance control strategy was formulated as an optimal control problem. In these works, optimal control formulations were proposed to compute both the variable stiffness profiles and the motion reference adaptation. However, the impedance profile was tailored to each specific robotic platform, which sacrifices generality.

Reinforcement learning (RL), which is closely related to optimal control, has also been applied to learning variable impedance policies [21], [26], [27], with successful applications to via-point trajectory following and pancake flipping. Even though RL can compute the impedance of the robot to improve task performance, it usually needs a large number of iterations to find an optimal policy, which is not preferred in multi-product, small-batch manufacturing lines.

In recent years, research on seeking suitable impedance control policies for compliant behaviors has moved toward learning and imitating the impedance strategies of humans [9], [10], [28]. Learning the teacher’s arm impedance is not easy; the appropriate learning method depends on the task and the teaching interface.

In [29], the concept of teleimpedance was proposed to apply human impedance regulation skills to a robot interacting with uncertain environments. In this case, a suitable human-machine interface was used to provide the trajectory and stiffness references. The stiffness reference was computed using electromyogram (EMG) signals recorded from the arm of the operator. However, since the robot was teleoperated by a user in real time, the task was not learned for autonomous execution. Recently, [30], [31] not only used surface EMG to estimate the demonstrator’s limb stiffness but also used the DMP method to encode both the movement trajectories and a 1-DoF stiffness profile (estimated from EMG) to achieve variable impedance skill transfer and generalization.

Calinon et al. [28] derived variable stiffness for an active compliant controller from demonstrations of the kinematic profiles. The stiffness profile was shaped inversely proportional to the variance along the trajectory [9]. This approach depends on the assumption that large position variance in the demonstrations will not have a negative impact on task execution, which may not always be reasonable [10], especially in narrow task spaces (such as Minimally Invasive Surgery). In [32], the trajectory of the task was first taught using kinesthetic teaching. Once the kinematic profile of the task was learned, the teacher demonstrated the forces using a haptic device while the robot executed the learned motion. A model describing the desired contact forces was then built and used by the robot to determine the desired force during task execution. The teaching process was thus divided into two successive steps (kinesthetic teaching first, then haptic teaching), and an off-line learning method was used to encode the position and force profiles.

In addition, none of these works addressed the execution stability of the learned variable impedance model [33]. For variable impedance control, stability must be guaranteed over the whole range of variation of the impedance parameters. This issue was recently addressed in [34], where a variable impedance control policy was proposed by designing a tank-based admittance control strategy in which all impedance parameters can be changed while preserving passivity. The main idea of this method is to evaluate the power balance of the robot: it ensures that the amount of energy pumped into the system is always less than the dissipated energy, so the system remains passive. However, this approach analyzed stability only for cases where the reference trajectory is fixed throughout the motion [35].

According to the current research progress and the problems described above, the main contributions of this paper are as follows:

  • (1)

    First, we emphasize that the kinematic profile (the trajectory of the task) is very important for completing tasks in some applications; in particular, teaching with large position deviations [9], [10] is not feasible in a narrow task space such as Minimally Invasive Surgery. Therefore, we propose a learning framework based on a simple teaching interface, so that the stiffness and damping profiles can be learned from demonstrations without corrupting the desired kinematic profiles (it is unnecessary to intentionally deviate from the taught trajectory). This also means there is no need to provide kinematic and compliance teaching separately: the kinematic and 3-DoF impedance profiles can be learned simultaneously from demonstrations.

  • (2)

    Since an increasing number of robots work on multiproduct, small-batch production lines or provide a wide range of services to meet people’s daily needs, robots need to master new skills frequently and rapidly, and the learning speed of an algorithm is therefore an important evaluation metric for learning skills from demonstrations. An effective sequential learning method is thus developed in this work, which sequentially exploits the new input data of kinematic profiles and interactive forces to learn the control model during the demonstration phase. The learned controller is then used to reproduce the robotic compliant manipulation. After the demonstration (learning) phase, the learned controller is fixed until the next demonstration starts. The proposed learning method does not require a time-consuming off-line training process, so the robot learning system becomes highly efficient and easy to use. In addition, the learning process does not need a large number of samples: a control policy can be learned from one or a few demonstrations.

  • (3)

    A force-based variable impedance learning process along the x, y and z axes is developed. More specifically, the compliances along the three directions change independently, which makes the robot more flexible than previous works that change compliance uniformly in all three directions.

  • (4)

    Furthermore, the stability analysis and the corresponding constraints on the learned impedance parameters are presented. The global asymptotic stability of the variable impedance control can therefore be guaranteed by incorporating the stability considerations into the learning process, which allows the robot to operate safely in environments where humans and robots share the workspace.

To validate the proposed learning framework, we conduct three experiments. The first experiment illustrates the capability of the sequential learning algorithm to model kinematic profiles and shows that the learned model generalizes to unseen initial points and reaches the desired position precisely. The second experiment shows that the proposed method can learn variable impedance parameters (stiffness and damping values) in selected directions based on the teaching forces; meanwhile, the stability constraints of variable impedance control are confirmed by the test. The third experiment shows how a unification controller can be learned in a realistic task teaching scenario.

The rest of the paper is organized as follows. Section 2 presents the control framework of this work. The proposed interface for demonstrations and the learning framework of the compliance behaviors are described in Section 3. In Section 4, we develop the global asymptotic stability conditions of the proposed methods. Section 5 presents the experimental evaluations. Finally, the conclusion and future work are presented in Section 6.


Controller and problem formulation

Impedance control is a very effective way of implementing robotic compliant behavior: it renders the manipulator equivalent to a multidimensional mass-spring-damper system with the desired impedance parameters (inertia, stiffness and damping matrices). The goal of impedance control in the task space is to maintain the relationship between the position error $\tilde{x} = x_d - x$ and the exerted force $F_e$; the controller is given by $M\ddot{\tilde{x}} = F_e - B_q\dot{\tilde{x}} - K_q\tilde{x}$, where $x$ and $x_d \in \mathbb{R}^3$ represent the current and desired
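As a rough numerical illustration of this mass-spring-damper relationship, the sketch below integrates a single-axis version of the impedance law with explicit Euler. The M, B, K values are illustrative placeholders, not parameters from the paper (which learns time-varying damping and stiffness per axis).

```python
# One-axis sketch of the impedance law  M * dd_err = Fe - B * d_err - K * err.
# M, B, K are illustrative constants chosen for a well-damped response.
def simulate_impedance(err0, M=1.0, B=8.0, K=25.0, Fe=0.0, dt=0.001, steps=2000):
    err, d_err = err0, 0.0
    for _ in range(steps):
        dd_err = (Fe - B * d_err - K * err) / M  # desired closed-loop dynamics
        d_err += dd_err * dt                     # explicit Euler integration
        err += d_err * dt
    return err

# With no external force the tracking error decays to zero; under a constant
# contact force Fe, the error settles near Fe / K: the virtual spring "gives".
```

A stiffer K shrinks the steady-state deflection under a contact force, while a softer K lets the end-effector yield, which is precisely the trade-off that variable impedance learning adjusts along each axis.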

Learning the unified control strategy from demonstrations

We start by giving an overview of our robotic compliant manipulation learning system. As illustrated in the schematic of Fig. 2, the system is divided into two phases (learning and reproduction). In the first phase (learning), demonstrations of the task are collected online to sequentially learn the motor skills and the impedance parameters (variable stiffness and damping). In the reproduction phase, the learned motion generation and compliant skills (variable impedance parameters) are used for

Stability analysis

Section 3 proposed the sequential learning method to acquire the robot motion dynamical system $\dot{x} = f(x)$ and the variable impedance parameters from demonstrations. Some researchers have found that the variability of the impedance parameters produces potential energy [34]. This potential energy is injected into the system and may destroy its passivity. The loss of passivity severely affects the impedance control: stable interaction is no longer guaranteed, and the tracking error $\tilde{x}$ can
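The power-balance idea of [34] can be illustrated with a hypothetical energy-tank monitor. This is a simplification for intuition only, not the state-independent constraints derived in this paper: damping dissipates energy, a stiffness increase at nonzero tracking error injects energy, and passivity is treated as lost once the tank empties.

```python
# Hypothetical energy-tank monitor for variable stiffness (illustrative only).
# Inputs are per-step stiffness rates k_dot, damping values b, tracking errors
# e and error rates de; the tank size tank0 is an arbitrary placeholder.
def energy_tank_ok(k_dot_profile, b_profile, err, d_err, dt, tank0=1.0):
    tank = tank0
    for k_dot, b, e, de in zip(k_dot_profile, b_profile, err, d_err):
        dissipated = b * de * de * dt        # energy removed by damping
        injected = 0.5 * k_dot * e * e * dt  # energy injected by stiffness change
        tank += dissipated - injected
        if tank <= 0.0:                      # tank empty: passivity no longer holds
            return False
    return True
```

A slowly varying stiffness leaves the tank full, while an aggressive stiffness ramp at large tracking error drains it immediately; bounding how fast the learned stiffness may change relative to the dissipation is the intuition behind constraints of this kind.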

Experimental evaluations

We evaluate the performance of the proposed learning framework via three experiments. In the first experiment, the kinematic profile of the task is performed on the 4-DoF Barrett WAM. This experiment illustrates the capability of the sequential learning algorithm to model kinematic profiles, and also shows that the learned model is able to generalize to unseen initial points and reach the desired position precisely. In the second experiment, we show that the

Conclusion and future work

In this paper, we presented a learning framework to model a unification controller for robotic compliant manipulation. This framework employs a sequential learning neural network to encode robot motion and uses a new force-based variable impedance learning method to estimate the varying damping and stiffness matrices in three directions. When the demonstration is over, a unified control strategy has been acquired automatically, without hand-coding. Furthermore, we derived the stability constraints of the

Declaration of Competing Interest

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Acknowledgement

This research was supported in part by the National Natural Science Foundation of China under Grant U1613210, in part by the Guangdong Special Support Program (2017TX04X265), in part by the Science and Technology Planning Project of Guangdong Province (2019B090915002).

Jianghua Duan is currently pursuing the Ph.D. degree in pattern recognition and intelligent system at University of Chinese Academy of Sciences. His research interest includes robot learning from human demonstrations in an unstructured environment and compliant control for safe human robot interaction.

References (43)

  • B.D. Argall et al.

    A survey of robot learning from demonstration

    Robot. Auton. Syst.

    (2009)
  • K. Neumann et al.

    Learning robot motions with stable dynamical systems under diffeomorphic transformations

    Robot. Auton. Syst.

    (2015)
  • A. Billard et al.

    Springer Handbook of Robotics

    (2016)
  • S.M. Khansari-Zadeh et al.

    Learning to play minigolf: a dynamical system-based approach

    Adv. Robot.

    (2012)
  • A.J. Ijspeert et al.

    Learning attractor landscapes for learning motor primitives

    Proceedings of the Advances in Neural Information Processing System

    (Jan. 2002)
  • R. Wang et al.

    Dynamic movement primitives plus: for enhanced reproduction quality and efficient trajectory modification using truncated kernels and local biases

    Proceedings of the IEEE International Conference on Intelligent Robots and System (IROS)

    (2016)
  • S.M. Khansari-Zadeh et al.

    Learning stable nonlinear dynamical systems with Gaussian mixture models

    IEEE Trans. Robot.

    (2011)
  • J. Duan et al.

    Fast and stable learning of dynamical systems based on extreme learning machine

    IEEE Trans. Syst. Man. Cybern. Syst.

    (2019)
  • K. Kronander et al.

    Online learning of varying stiffness through physical human-robot interaction

    Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)

    (2012)
  • K. Kronander et al.

    Learning compliant manipulation through kinesthetic and tactile human-robot interaction

    IEEE Trans. Haptics

    (2014)
  • A. Schmitz et al.

    Methods and technologies for the implementation of large-scale robot tactile sensors

    IEEE Trans. Robot.

    (Jun. 2011)
  • P. Mittendorfer et al.

    Humanoid multimodal tactile-sensing modules

    IEEE Trans. Robot.

    (2011)
  • R.V. Ham et al.

    Compliant actuator designs

    IEEE Robot. Autom. Mag.

    (Sep. 2009)
  • C. Laschi et al.

    Soft robot arm inspired by the octopus

    Adv. Robot.

    (2012)
  • F. Ficuciello et al.

    Variable impedance control of redundant manipulators for intuitive human-robot physical interaction

    IEEE Trans. Robot.

    (2015)
  • J.K. Salisbury

    Active stiffness control of a manipulator in Cartesian coordinates

    Proceedings of the IEEE International Conference on Decision and Control

    (1980)
  • W. He et al.

    Adaptive fuzzy neural network control for a constrained robot using impedance learning

    IEEE Trans. Neural Netw. Learn. Syst.

    (2017)
  • M. Deniša et al.

    Learning compliant movement primitives through demonstration and statistical generalization

    IEEE Trans. Mechatron.

    (Oct. 2016)
  • Y. Li et al.

    Impedance learning for robots interacting with unknown environments

    IEEE Trans. Control Syst. Technol.

    (Jul. 2014)
  • H. Sadeghian et al.

    Task-space control of robot manipulators with null-space compliance

    IEEE Trans. Robot.

    (Apr. 2014)
  • J. Buchli et al.

    Learning variable impedance control

    Int. J. Rob. Res.

    (Jun. 2011)


    Yongsheng Ou received the Ph.D. degree from the Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong, China, in 2004. He was a Postdoctoral Researcher in the Department of Mechanical Engineering, Lehigh University, USA, for five years. He is currently a Professor at Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. He is the author of more than 150 papers in major journals and international conferences, as well as a coauthor of the monograph on Control of Single Wheel Robots (Springer, 2005). He also serves as Associate Editor of IEEE Robotics & Automation Letters. His research interests include control of complex systems, learning human control from demonstrations, and robot navigation.

    Sheng Xu received the B.S. degree from Shandong University, Shandong, China, in 2011, and the M.S. degree from Beihang University, Beijing, China, in 2014, both in Electrical Engineering, and the Ph.D. degree in Telecommunications Engineering from ITR, University of South Australia, Australia, in 2017. He is currently a postdoctoral fellow with the Center for Intelligent and Biomimetic Systems, Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), Shenzhen, China. His main research interests include target tracking, nonlinear estimation, statistical signal processing, servo system control and robot control.

    Ming Liu (S'12-M'12-SM'18) received the B.A. degree in automation from Tongji University, Shanghai, China, in 2005, and the Ph.D. degree from the Department of Mechanical Engineering and Process Engineering, ETH Zurich, Zurich, Switzerland, in 2013. He was a Visiting Scholar with Erlangen-Nürnberg University, Erlangen, Germany, and the Fraunhofer Institute IISB, Erlangen. He is currently an Assistant Professor with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong. His current research interests include autonomous mapping, visual navigation, topological mapping, and environment modeling.
