
Neural Networks

Volume 53, May 2014, Pages 52-60

Assist-as-needed robotic trainer based on reinforcement learning and its application to dart-throwing

https://doi.org/10.1016/j.neunet.2014.01.012

Abstract

This paper proposes a novel robotic trainer for motor skill learning. It is user-adaptive, inspired by the assist-as-needed principle well known in the field of physical therapy. Most previous studies of robotic assistance for motor skill learning have used predetermined desired trajectories, and whether these trajectories are optimal for each user has not been examined intensively. Furthermore, the guidance hypothesis states that humans tend to rely too heavily on external assistive feedback, which interferes with the internal feedback necessary for motor skill learning. A few studies have proposed systems that adjust their assistive strength according to the user’s performance in order to prevent the user from relying too much on the robotic assistance. These studies, however, require a physical model of the user’s motor system, which is inherently difficult to construct. In this paper, we propose a framework for a robotic trainer that is user-adaptive and requires neither a specific desired trajectory nor a physical model of the user’s motor system; we achieve this using model-free reinforcement learning. We chose dart-throwing as an example motor-learning task because it is one of the simplest throwing tasks and its performance can be measured easily and quantitatively. Training experiments with novices, aimed at maximizing the dart score while minimizing the physical robotic assistance, demonstrate the feasibility and plausibility of the proposed framework.

Introduction

Acquiring expertly skilled movements is generally difficult. Moreover, instructing novices in such movements is also inherently difficult, because these movements are generated by unseen muscle (d’Avella and Bizzi, 2005, d’Avella et al., 2003, Murai et al., 2009) and neural activity (Hemami and Dariush, 2012, Ijspeert, 2008, Mylonas et al., 2012, Williamson, 1998).

This paper proposes a novel robotic trainer for motor skill learning. It is user-adaptive, inspired by the assist-as-needed principle well known in the field of physical therapy (Cai et al., 2006, Emken and Reinkensmeyer, 2005, Jezernik et al., 2003).

Most previous studies of robotic assistance for motor skill learning have used predetermined desired trajectories, and whether these trajectories are optimal for each user has not been examined intensively (Crespo and Reinkensmeyer, 2008, Duschau-Wicke et al., 2008, Emken and Reinkensmeyer, 2005).

Furthermore, the guidance hypothesis states that humans tend to rely too much on external assistive feedback, resulting in interference with the internal feedback necessary for motor skill learning (Schmidt & Wrisberg, 2008).

MIT-Manus (Krebs et al., 2004) and MIME (Lum et al., 2006) pioneered the use of impedance control in rehabilitation robotics, and MIT-Manus (Krebs et al., 2003) includes impedance selection based on the user’s performance. This selection, however, has not been automated; instead, physical therapists make the selection based on their knowledge and experience.

A few studies have proposed systems that adjust their assistive strength according to the user’s performance in order to prevent the user from relying too much on the robotic assistance (Crespo and Reinkensmeyer, 2008, Emken and Reinkensmeyer, 2005). These studies, however, require a physical model of the user’s motor system, which is inherently difficult to construct.

In this paper, we propose a framework for a robotic trainer that is user-adaptive and requires neither a specific desired trajectory nor a physical model of the user’s motor system; we achieve this using model-free reinforcement learning.
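As a rough illustration of what "model-free" means here, a trainer of this kind can adapt an assistance parameter by trial and error on the observed task score alone, with no model of the user's motor system. The following minimal sketch is our own illustrative assumption, not the algorithm used in the paper: a simple stochastic hill-climbing loop that perturbs the assistance level and keeps changes that improve the score.

```python
import random

def model_free_trainer(observe_score, n_trials=60, step=0.1):
    """Hypothetical sketch: adapt a single assistance parameter (e.g. a
    stiffness in [0, 1]) purely from observed task scores.  No model of
    the user is constructed at any point."""
    param = 1.0                      # start with full assistance
    best_score = observe_score(param)
    for _ in range(n_trials):
        # perturb the assistance level and clamp it to [0, 1]
        candidate = min(1.0, max(0.0, param + random.uniform(-step, step)))
        score = observe_score(candidate)
        if score >= best_score:      # keep only perturbations that help
            param, best_score = candidate, score
    return param

# Toy stand-in for a user whose score peaks at moderate assistance (0.3).
result = model_free_trainer(lambda k: 1.0 - (k - 0.3) ** 2)
```

Because only the scalar score is observed, the same loop applies regardless of how the user's motor system actually produces the movement, which is the appeal of the model-free approach.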

We chose dart-throwing as an example motor-learning task as it is one of the simplest throwing tasks, and its performance can be easily and quantitatively measured. Training experiments with novices aimed at maximizing the score with the darts and minimizing the physical robotic assistance. We demonstrate the feasibility and the plausibility of the proposed robotic trainer through experiments by comparing the results of four conditions: (1) without robot; (2) with non-adaptive fixed stiffness robot; (3) with adaptive robot; and (4) with non-adaptive decreasing stiffness robot.
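To make the contrast between conditions (3) and (4) concrete, the two stiffness policies can be sketched as follows. This is purely our illustrative assumption, not the paper's actual control law: the NA–DS robot lowers its stiffness on a fixed schedule regardless of performance, while an adaptive robot lowers stiffness only in proportion to the reward the user actually earns.

```python
def decreasing_schedule(k0, throw, n_throws=60):
    """NA-DS style (assumed form): stiffness falls linearly with the
    throw count, ignoring the user's performance entirely."""
    return k0 * (1.0 - throw / n_throws)

def adaptive_update(k, reward, rate=0.1):
    """Adaptive style (assumed form): stiffness decreases only as much
    as the user's reward justifies -- assist-as-needed."""
    return k * (1.0 - rate * reward)

# A user who improves over throws loses assistance gradually.
k_adaptive = 1.0
for reward in [0.2, 0.4, 0.6, 0.8]:
    k_adaptive = adaptive_update(k_adaptive, reward)
```

The key difference is that under the adaptive rule a user who fails to improve retains assistance, whereas the fixed schedule withdraws it regardless.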

This paper is organized as follows. Section 2 outlines the framework for our assist-as-needed robotic trainer. Section 3 describes how we applied this framework to develop the training system for dart-throwing. Section 4 describes our training experiments to validate the plausibility and feasibility of the proposed training method. Section 5 presents the experimental results. Section 6 discusses the results, and concluding remarks are provided in Section 7.

Section snippets

Assist-as-needed robotic trainer

The key points of the framework are enumerated as follows.

Task-goal oriented. In general, it is far from trivial to predetermine a desired trajectory for motor skill learning, because each person has their own motor control system. Since one of the most important aims of motor skill learning is to accomplish a task, the robotic trainer should be task-goal oriented, which requires a means of measuring the user’s achievement (performance) on the task.

Assist-as-needed. The guidance

Training system for dart-throwing

In this section, we describe an application of the proposed framework to learning dart-throwing. We chose dart-throwing as our motor learning task because it is one of the simplest throwing tasks. More detailed reasons are as follows. First, throwing darts is usually performed by fixing the body trunk, primarily driven by one of the upper limbs, whose motion is mostly constrained in the sagittal plane. Second, its performance can be easily and quantitatively measured by a numerical score.
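For illustration, a numerical score of the kind this task provides can be mapped to a reward signal for the learner. The radial scoring rule below is our own assumption for the sketch; the actual scoring used in the paper may differ.

```python
import math

def dart_reward(x, y, board_radius=0.17):
    """Hypothetical reward: 1.0 at the bullseye, falling linearly to 0.0
    at the board edge; darts landing off the board score nothing.
    Coordinates are metres from the board centre."""
    r = math.hypot(x, y)
    return max(0.0, 1.0 - r / board_radius)
```

Any such mapping suffices for the framework, since the trainer only needs a scalar measure of achievement per throw.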

Training experiments

To validate the plausibility and feasibility of the proposed training method, we used four experimental conditions: (1) without robot, (2) with non-adaptive and fixed stiffness (NA–FS) robot, (3) with adaptive robot, and (4) with non-adaptive and decreasing stiffness (NA–DS) robot. In conditions (2)–(4), we used the same robot. The robot joints were completely fixed and not controlled in condition (2). Our proposed method was employed in condition (3). We expected that the learning of darts

Results

Fig. 9 shows the mean and standard deviation of the reinforcement learning variables from six subjects in the “with adaptive robot” condition over 2 days. Fig. 9(a) shows that the reward increased over the 60 throws. Fig. 9(b) shows that the stiffness of the robot decreased over the 60 throws, according to the rewards acquired in our system. Fig. 9(c) shows that the assistive force f̂_k (Eq. (10)), measured at the end-effector of the robot, gradually decreased as the reward increased. This is

Learning results

The results presented in the last section together suggest the plausibility of our adaptive training system. The normalized score was significantly higher than in the “without robot” condition as well as in the “with NA–DS robot” condition (Fig. 11), which is congruent with our expectation. Although Fig. 10(a) shows that the subjects in the “with adaptive robot” condition were initially less trained than those in the other two conditions, this should not be the only reason why we obtained the result shown in

Conclusion

In this article, we have proposed an adaptive robotic training system for dart-throwing novices. How the robot physically assists novices was determined from a motion comparison between novices and experts. Since this comparison revealed that novices had larger shoulder and elbow displacements during throwing than experts, the robot applied an assistive force to the novices’ upper limb. Our approach to assistance was based on the assist-as-needed principle. Namely,

Acknowledgments

This work was supported by Grants-in-Aid for Scientific Research from the Japan Society for the Promotion of Science, Nos. 20300071 and 23240028, and the Mitsui Sumitomo Insurance Welfare Foundation.

References (27)

  • A. Duschau-Wicke et al. Patient-cooperative control increases active participation of individuals with SCI during robot-aided gait training. Journal of NeuroEngineering and Rehabilitation (2010).
  • J. Emken et al. Robot-enhanced motor learning: accelerating internal model formation during locomotion by transient dynamic amplification. IEEE Transactions on Neural Systems and Rehabilitation Engineering (2005).
  • S. Jezernik et al. Adaptive robotic rehabilitation of locomotion: a clinical study in spinally injured individuals. Spinal Cord (2003).