Neurocomputing

Volume 275, 31 January 2018, Pages 2093-2103

Robot teaching by teleoperation based on visual interaction and extreme learning machine

https://doi.org/10.1016/j.neucom.2017.10.034

Abstract

Compared with traditional robot teaching methods, robots can learn various human-like skills in a more efficient and natural manner through teleoperation. In this paper, we propose a teleoperation method based on human-robot interaction (HRI), which mainly uses visual information. With only one teleoperated demonstration, the robot can reproduce a trajectory; however, this trajectory deviates from the optimal one because of imperfections of either the human demonstrator or the robot. We therefore use an extreme learning machine (ELM) based algorithm to transfer the demonstrator’s motions to the robot. To verify the method, a Microsoft Kinect V2 is used to capture the human body motion and the hand state, from which commands are generated to control a Baxter robot in the Virtual Robot Experimentation Platform (V-REP). After learning and training with the ELM, the robot in V-REP can complete a given task autonomously, and the real robot can reproduce the trajectory well. The experimental results show that the developed method achieves satisfactory performance.

Introduction

With the recent rapid advances in robotics, the application of robots in industry has been extended to various fields. Through a teaching by demonstration (TbD) method, a robot can perform a task that differs from the previous one in a new working environment [1], [2]. Traditionally, industrial robots can only learn fixed skills on the assembly line after professionals have spent a great deal of time programming them with keyboards or joysticks [3]. This approach is usually time consuming and not flexible enough for modern manufacturing. In contrast, a robot can be directly programmed by learning human-like manipulation skills from a skilful demonstrator through teleoperation. Therefore, this method enables the robot to adapt to different tasks or environments efficiently.

Teleoperation based on HRI has recently attracted much attention due to the advantages described above [4]. In [5], a TbD method is presented, enhanced by transferring the stiffness profile during HRI; surface electromyography (sEMG) signals are collected and processed to extract the demonstrator’s variable stiffness and hand grasping patterns. In [6], various hand gestures are recognized by the proposed HRI method based on hand-guided demonstration. In [7], a teleoperation-based robot programming method is proposed; to verify it, the authors develop a master-slave teleoperation system in which an exoskeleton device serves as the HRI device.

Many techniques and devices have been applied to HRI for enhanced performance. Among them, visual interaction has been one of the most widely utilized [8]. Because visual interaction based on body motion tracking is comparatively easy to implement, it is most often applied to capture human motion [9].

Neural networks have also been widely applied in TbD methods for robots. In [10], a TbD method for building an adaptive control system is presented, in which the robot improves its performance by repeating a task with the help of a neural network. In [11], a neural learning scheme for estimating stable dynamical systems is presented, and the results show that the method can evaluate such systems accurately. However, robot teaching based on such neural learning schemes tends to be complex and time consuming.

In this paper, we put forward a robot teaching method that uses a virtual teleoperation system based on visual interaction together with a neural learning method based on ELM. More specifically, Microsoft’s second-generation motion capture device, the Kinect V2, is used to track human body motion and the hand state. A simulation experiment has been conducted on the V-REP platform, where the Baxter robot is guided to learn the demonstrator’s motion skills. A learning algorithm based on ELM [12], which teaches the robot to learn skills from the human, is developed. Compared with robot teaching based on other neural networks, the ELM requires fewer training samples and has a high generalization capacity [13].
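For reference, once the hidden-layer weights of an ELM are fixed at random, training reduces to a linear least-squares problem for the output weights. The following is a minimal sketch in Python/NumPy; the network size, activation function and data shapes are illustrative assumptions rather than the settings used in the paper.

import numpy as np

def train_elm(X, T, n_hidden=100, seed=0):
    # Single-hidden-layer ELM: random, fixed input weights;
    # output weights solved analytically by least squares.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # Moore-Penrose pseudoinverse solution
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

Here X (inputs) and T (targets) would be, for example, the recorded task states and the corresponding demonstrated joint angles.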

The rest of the paper is structured as follows. In Section 2, we introduce the Kinect sensor, V-REP and its remote API. In Section 3, we present the system, which includes a virtual teleoperation system and a learning and training system. In Section 4, the design methodology of the system is introduced. In Section 5, a space vector approach, a data processing method and a TbD method based on ELM are presented. Finally, the experimental results are presented in Section 6, followed by the conclusion in Section 7.

Section snippets

Kinect sensor

The second-generation Kinect for Windows is used in our work. The Kinect has an RGB color camera, an IR emitter, and a depth sensor composed of an IR camera and an IR projector. With these devices, the Kinect sensor provides full-body 3D motion capture, facial recognition and other capabilities [14]. Compared with the Kinect V1, the Kinect V2 allows us to track up to 25 body joints [15], including the fists and thumbs. Because of this advantage, the Kinect V2 can recognize the hand

The architecture of the system

As shown in Fig. 4, the system we design contains a virtual teleoperation system and a training and learning system. In the first stage, a human demonstrator controls the Baxter in V-REP through the Kinect. In the second stage, a neural network is trained on the data recorded in the first stage, and its output is then sent to the Baxter so that it completes the previous task autonomously.
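As an illustration of the second stage, the learned joint trajectory could be streamed to the simulated Baxter through the V-REP legacy remote API for Python. This is a hedged sketch only: the joint names, the port number and the trajectory variable are assumptions, and the remote API server must be enabled in the V-REP scene.

import vrep  # V-REP legacy remote API bindings (vrep.py, vrepConst.py and the remoteApi library)

client = vrep.simxStart('127.0.0.1', 19999, True, True, 5000, 5)  # connect to a running scene
assert client != -1, 'could not connect to V-REP'

# Hypothetical joint names; the Baxter model in the actual scene may use different ones.
joint_names = ['Baxter_rightArm_joint1', 'Baxter_rightArm_joint2']
handles = [vrep.simxGetObjectHandle(client, name, vrep.simx_opmode_blocking)[1] for name in joint_names]

predicted_trajectory = [[0.0, 0.5], [0.1, 0.6]]  # placeholder rows of joint angles; in practice the ELM output

for q in predicted_trajectory:
    for handle, angle in zip(handles, q):
        vrep.simxSetJointTargetPosition(client, handle, angle, vrep.simx_opmode_oneshot)

vrep.simxFinish(client)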

The virtual teleoperation system, which is the simulation of the real one, can verify the

Acquiring information from Kinect

Kinect skeletal tracking is not affected by ambient lighting because of the infrared information. 3D depth images can be captured by the Kinect due to the mechanism of binocular vision [26].
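For illustration, the skeleton joints and hand states described above could be read on the PC side with, for example, the community PyKinect2 bindings. The paper does not specify its software interface, so the sketch below is an assumption and attribute names may differ slightly.

from pykinect2 import PyKinectV2, PyKinectRuntime

kinect = PyKinectRuntime.PyKinectRuntime(PyKinectV2.FrameSourceTypes_Body)  # body (skeleton) stream only

while True:
    if not kinect.has_new_body_frame():
        continue
    for body in kinect.get_last_body_frame().bodies:
        if not body.is_tracked:
            continue
        wrist = body.joints[PyKinectV2.JointType_WristRight].Position   # 3D camera-space coordinates
        closed = body.hand_right_state == PyKinectV2.HandState_Closed   # hand state, e.g. a start/stop gesture
        print(wrist.x, wrist.y, wrist.z, closed)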

There are three steps for the Kinect to capture the demonstrator’s body information. First, the Kinect adopts an image segmentation method to distinguish the human body from the complex background. Then the Kinect finds the object in the image that is most likely to be human and evaluates depth of field image

Space vector approach

The key to controlling the Baxter with the Kinect is calculating the human joint angles. The Kinect provides the 3D Cartesian coordinates of the joints of a human body. In 3D space, the distance between two points A(x1, y1, z1) and B(x2, y2, z2) is given by d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}, and the vector \vec{AB} can be expressed as \vec{AB} = (x_2 - x_1, y_2 - y_1, z_2 - z_1), with d = |\vec{AB}|. In 3D space, the law of cosines then gives the angle between two joints. A joint in Kinect
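Concretely, the angle at a joint B formed by two adjacent joints A and C follows from the law of cosines, or equivalently from the dot product of the two bone vectors. A small NumPy sketch (the coordinates in the example are made up):

import numpy as np

def joint_angle(A, B, C):
    # Angle at joint B (in radians) given 3D coordinates of joints A, B and C.
    BA = np.asarray(A) - np.asarray(B)
    BC = np.asarray(C) - np.asarray(B)
    cos_theta = BA @ BC / (np.linalg.norm(BA) * np.linalg.norm(BC))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

# Example: elbow angle from shoulder, elbow and wrist positions.
theta = joint_angle([0.1, 0.4, 2.0], [0.3, 0.2, 2.0], [0.5, 0.3, 1.9])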

The effectiveness of the virtual teleoperation system

In order to verify the validity of the proposed method, we build a virtual teleoperation platform that mainly consists of the Kinect and V-REP. First, we verify the effectiveness of controlling the robot by human body motions. Four motions are designed to verify that the Baxter robot arms can be moved flexibly in the virtual space.

As shown in Fig. 11, the first two motions show that the Baxter robot arms can swing up and down under the control of the human demonstrator. The remaining motions show that

Conclusion

In this paper, we have developed a virtual teleoperation system based on visual interaction. The human body motion is used to control the robot’s arms, and gestures are used to control the beginning and end of the simulation. In addition, through a TbD method based on ELM, the system can transfer the human motions to the robot. We use the Kinect to acquire the body skeleton data and hand states. Then we use V-REP to build a Baxter robot and its work environment. To verify the effectiveness of the

Acknowledgement

This work was partially supported by the National Natural Science Foundation of China (NSFC) under Grant 61473120, Guangdong Provincial Natural Science Foundation under Grant 2014A030313266, International Science and Technology Collaboration Grant 2015A050502017, Science and Technology Planning Project of Guangzhou under Grant 201607010006, State Key Laboratory of Robotics and System (HIT) Grant SKLRS-2017-KF-13, and the Fundamental Research Funds for the Central Universities under Grant 2017ZD057.


References (27)

  • C. Yang et al., Human-Robot Interaction Interface, Adv. Technol. Mod. Rob. Appl., 2016.

  • C.D. Mutto et al., Time-of-Flight Cameras and Microsoft Kinect (TM), 2012.

  • S. Liu et al., Teaching and learning of deburring robots using neural networks, in: Proceedings of the IEEE International Conference on Robotics and Automation, 1993.

Yang Xu received the B.Eng. degree in automation from the South China University of Technology, Guangzhou, China, in 2016, and is currently pursuing the M.S. degree at the South China University of Technology, Guangzhou, China. His research interests include human-robot interaction, robot imitation learning, and machine learning.

Chenguang Yang received the B.Eng. degree in measurement and control from Northwestern Polytechnical University, Xi’an, China, in 2005, and the Ph.D. degree in control engineering from the National University of Singapore, Singapore, in 2010. He received postdoctoral training at Imperial College London, UK. He is the recipient of the Best Paper Award from the IEEE Transactions on Robotics and a number of international conferences. His research interests lie in robotics, automation and computational intelligence.

Junpei Zhong is currently a research scientist at the Artificial Intelligence Research Center of the National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan. He received the B.Eng. degree from the South China University of Technology in 2006, the M.Phil. degree from the Hong Kong Polytechnic University in 2010, and the doctoral degree (with “magna cum laude”) from the University of Hamburg in 2015. He was awarded a Marie-Curie fellowship for his doctoral study from 2010 to 2013. From 2014 to 2016, he participated in different EU- and Japanese-funded projects at the University of Hertfordshire, Plymouth University and Waseda University before joining AIST. His research interests are machine learning, computational intelligence and cognitive robotics.

Ning Wang received the B.Eng. degree in measurement and control technologies and devices from the College of Automation, Northwestern Polytechnical University, Xi’an, China, in 2005, and the M.Phil. and Ph.D. degrees in electronic engineering from the Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China, in 2007 and 2011, respectively. She worked as a post-doctoral fellow at the Department of Computer Science & Engineering, The Chinese University of Hong Kong from 2011 to 2013, and was a research fellow at the School of Computing, Electronics and Mathematics, Plymouth University, United Kingdom from 2014 to 2015. Her research interests lie in signal processing and machine learning, with applications in robust speaker recognition, biomedical pattern recognition, intelligent data analysis, and human-robot interaction.

Lijun Zhao received the bachelor’s degree from Beijing Institute of Technology, Beijing, China, in 1996, and the master’s and Ph.D. degrees from Harbin Institute of Technology (HIT), Harbin, Heilongjiang, China, in 2002 and 2009, respectively, all in mechatronics engineering. He is currently a supervisor of master’s students with the State Key Laboratory of Robotics and Systems, Robotics Institute, Harbin Institute of Technology, China. His research interests include mobile robot 3D environment mapping, perception, navigation and planning in robotics.
