
Neural Networks

Volume 53, May 2014, Pages 52-60

Assist-as-needed robotic trainer based on reinforcement learning and its application to dart-throwing

https://doi.org/10.1016/j.neunet.2014.01.012

Abstract

This paper proposes a novel robotic trainer for motor skill learning. It is user-adaptive, inspired by the assist-as-needed principle well known in the field of physical therapy. Most previous studies of robotic assistance for motor skill learning have used predetermined desired trajectories, and whether these trajectories are optimal for each user has not been examined intensively. Furthermore, the guidance hypothesis states that humans tend to rely too heavily on external assistive feedback, which interferes with the internal feedback necessary for motor skill learning. A few studies have proposed systems that adjust their assistive strength according to the user’s performance in order to prevent the user from relying too much on the robotic assistance. These studies, however, require a physical model of the user’s motor system, which is inherently difficult to construct. In this paper, we propose a framework for a robotic trainer that is user-adaptive and requires neither a specific desired trajectory nor a physical model of the user’s motor system; we achieve this using model-free reinforcement learning. We chose dart-throwing as an example motor-learning task because it is one of the simplest throwing tasks and its performance can be measured easily and quantitatively. Training experiments with novices, aimed at maximizing the dart score while minimizing the physical robotic assistance, demonstrate the feasibility and plausibility of the proposed framework.

Introduction

Acquiring expertly skilled movements is generally difficult. Moreover, instructing novices in such movements is also inherently difficult, because these movements are generated by unseen muscle (d’Avella and Bizzi, 2005, d’Avella et al., 2003, Murai et al., 2009) and neural activity (Hemami and Dariush, 2012, Ijspeert, 2008, Mylonas et al., 2012, Williamson, 1998).

This paper proposes a novel robotic trainer for motor skill learning. It is user-adaptive, inspired by the assist-as-needed principle well known in the field of physical therapy (Cai et al., 2006, Emken and Reinkensmeyer, 2005, Jezernik et al., 2003).

Most previous studies of robotic assistance for motor skill learning have used predetermined desired trajectories, and whether these trajectories are optimal for each user has not been examined intensively (Crespo and Reinkensmeyer, 2008, Duschau-Wicke et al., 2008, Emken and Reinkensmeyer, 2005).

Furthermore, the guidance hypothesis states that humans tend to rely too much on external assistive feedback, resulting in interference with the internal feedback necessary for motor skill learning (Schmidt & Wrisberg, 2008).

MIT-Manus (Krebs et al., 2004) and MIME (Lum et al., 2006) pioneered the use of impedance control in rehabilitation robotics, and MIT-Manus (Krebs et al., 2003) includes impedance selection based on the user’s performance. This selection, however, has not been automated; instead, physical therapists make the selection based on their knowledge and experience.

A few studies have proposed systems that adjust their assistive strength according to the user’s performance in order to prevent the user from relying too much on the robotic assistance (Crespo and Reinkensmeyer, 2008, Emken and Reinkensmeyer, 2005). These studies, however, require a physical model of the user’s motor system, which is inherently difficult to construct.

In this paper, we propose a framework for a robotic trainer that is user-adaptive and requires neither a specific desired trajectory nor a physical model of the user’s motor system; we achieve this using model-free reinforcement learning.
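As a rough illustration of what "model-free" means here, a trainer of this kind can adapt an assistance parameter by trial and error on the observed task score alone, with no model of the user's motor system. The following minimal sketch is our own illustrative assumption, not the algorithm used in the paper: a simple stochastic hill-climbing loop that perturbs the assistance level and keeps changes that improve the score.

```python
import random

def model_free_trainer(observe_score, n_trials=60, step=0.1):
    """Hypothetical sketch: adapt a single assistance parameter (e.g. a
    stiffness in [0, 1]) purely from observed task scores.  No model of
    the user is constructed at any point."""
    param = 1.0                      # start with full assistance
    best_score = observe_score(param)
    for _ in range(n_trials):
        # perturb the assistance level and clamp it to [0, 1]
        candidate = min(1.0, max(0.0, param + random.uniform(-step, step)))
        score = observe_score(candidate)
        if score >= best_score:      # keep only perturbations that help
            param, best_score = candidate, score
    return param

# Toy stand-in for a user whose score peaks at moderate assistance (0.3).
result = model_free_trainer(lambda k: 1.0 - (k - 0.3) ** 2)
```

Because only the scalar score is observed, the same loop applies regardless of how the user's motor system actually produces the movement, which is the appeal of the model-free approach.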

We chose dart-throwing as an example motor-learning task as it is one of the simplest throwing tasks, and its performance can be easily and quantitatively measured. Training experiments with novices aimed at maximizing the score with the darts and minimizing the physical robotic assistance. We demonstrate the feasibility and the plausibility of the proposed robotic trainer through experiments by comparing the results of four conditions: (1) without robot; (2) with non-adaptive fixed stiffness robot; (3) with adaptive robot; and (4) with non-adaptive decreasing stiffness robot.
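To make the contrast between conditions (3) and (4) concrete, the two stiffness policies can be sketched as follows. This is purely our illustrative assumption, not the paper's actual control law: the NA–DS robot lowers its stiffness on a fixed schedule regardless of performance, while an adaptive robot lowers stiffness only in proportion to the reward the user actually earns.

```python
def decreasing_schedule(k0, throw, n_throws=60):
    """NA-DS style (assumed form): stiffness falls linearly with the
    throw count, ignoring the user's performance entirely."""
    return k0 * (1.0 - throw / n_throws)

def adaptive_update(k, reward, rate=0.1):
    """Adaptive style (assumed form): stiffness decreases only as much
    as the user's reward justifies -- assist-as-needed."""
    return k * (1.0 - rate * reward)

# A user who improves over throws loses assistance gradually.
k_adaptive = 1.0
for reward in [0.2, 0.4, 0.6, 0.8]:
    k_adaptive = adaptive_update(k_adaptive, reward)
```

The key difference is that under the adaptive rule a user who fails to improve retains assistance, whereas the fixed schedule withdraws it regardless.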

This paper is organized as follows. Section 2 outlines the framework for our assist-as-needed robotic trainer. Section 3 describes how we applied this framework to develop the training system for dart-throwing. Section 4 describes our training experiments to validate the plausibility and feasibility of the proposed training method. Section 5 presents the experimental results. Section 6 discusses the results, and concluding remarks are provided in Section 7.

Section snippets

Assist-as-needed robotic trainer

The key points of the framework are enumerated as follows.

Task-goal oriented. In general, it is far from trivial to predetermine a desired trajectory for motor skill learning, because each person has their own motor control system. Since one of the most important aims of motor skill learning is to accomplish a task, the robotic trainer should be task-goal oriented, which requires a means of measuring the user’s achievement (performance) on the task.

Assist-as-needed. The guidance

Training system for dart-throwing

In this section, we describe an application of the proposed framework to learning dart-throwing. We chose dart-throwing as our motor learning task because it is one of the simplest throwing tasks. More detailed reasons are as follows. First, throwing darts is usually performed by fixing the body trunk, primarily driven by one of the upper limbs, whose motion is mostly constrained in the sagittal plane. Second, its performance can be easily and quantitatively measured by a numerical score.
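For illustration, a numerical score of the kind this task provides can be mapped to a reward signal for the learner. The radial scoring rule below is our own assumption for the sketch; the actual scoring used in the paper may differ.

```python
import math

def dart_reward(x, y, board_radius=0.17):
    """Hypothetical reward: 1.0 at the bullseye, falling linearly to 0.0
    at the board edge; darts landing off the board score nothing.
    Coordinates are metres from the board centre."""
    r = math.hypot(x, y)
    return max(0.0, 1.0 - r / board_radius)
```

Any such mapping suffices for the framework, since the trainer only needs a scalar measure of achievement per throw.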

Training experiments

To validate the plausibility and feasibility of the proposed training method, we used four experimental conditions: (1) without robot, (2) with non-adaptive and fixed stiffness (NA–FS) robot, (3) with adaptive robot, and (4) with non-adaptive and decreasing stiffness (NA–DS) robot. In conditions (2)–(4), we used the same robot. The robot joints were completely fixed and not controlled in condition (2). Our proposed method was employed in condition (3). We expected that the learning of darts

Results

Fig. 9 shows the mean and standard deviation of the reinforcement learning variables from six subjects in the “with adaptive robot” condition over 2 days. Fig. 9(a) shows that the reward increased over the 60 throws. Fig. 9(b) shows that the stiffness of the robot decreased over the 60 throws, according to the rewards acquired in our system. Fig. 9(c) shows that the assistive force f̂_k (Eq. (10)), measured at the end-effector of the robot, gradually decreased as the reward increased. This is

Learning results

The results presented in the last section together suggest the plausibility of our adaptive training system. The normalized score was significantly higher than in the “without robot” condition as well as in the “with NA–DS robot” condition (Fig. 11), which is congruent with our expectation. Although Fig. 10(a) shows that the subjects in the “with adaptive robot” condition were initially less trained than those in the other two conditions, this should not be the only reason why we obtained the result shown in

Conclusion

In this article, we have proposed an adaptive robotic training system for dart-throwing novices. How the robot physically assists novices was determined from a motion comparison between novices and experts. Since this comparison revealed that novices had larger shoulder and elbow displacements during throwing than experts, the robot applied an assistive force to the novices’ upper limb. Our approach to assistance was based on the assist-as-needed principle. Namely,

Acknowledgments

This work was supported by Grants-in-Aid for Scientific Research from the Japan Society for the Promotion of Science, Nos. 20300071 and 23240028, and the Mitsui Sumitomo Insurance Welfare Foundation.

References (27)

  • A. Duschau-Wicke et al. Patient-cooperative control increases active participation of individuals with SCI during robot-aided gait training. Journal of NeuroEngineering and Rehabilitation (2010).
  • J. Emken et al. Robot-enhanced motor learning: accelerating internal model formation during locomotion by transient dynamic amplification. IEEE Transactions on Neural Systems and Rehabilitation Engineering (2005).
  • S. Jezernik et al. Adaptive robotic rehabilitation of locomotion: a clinical study in spinally injured individuals. Spinal Cord (2003).