Human–robot interaction based on gesture and movement recognition

https://doi.org/10.1016/j.image.2019.115686

Highlights

  • Both RGB video frames and depth images are used in our model.

  • We design a dynamic gesture segmentation method for video segmentation.

  • The interactive scenarios and modes are designed for experiment and implementation.

Abstract

Human–robot interaction (HRI) has become a research hotspot in computer vision and robotics due to its wide application in the human–computer interaction (HCI) domain. Based on algorithms for gesture recognition and limb movement recognition in somatosensory interaction, an HRI model is proposed for robotic arm manipulation. More specifically, a 3D SSD architecture is used to locate and identify gestures and arm movements. A DTW template matching algorithm is then adopted to recognize dynamic gestures. Interactive scenarios and interaction modes are designed for the experiments and implementation. Virtual interactive experimental results have demonstrated the usefulness of our method.

Introduction

The rapid development of robotics and the acceleration of industrialization have made robots increasingly valuable helpers for humans. In recent years, the mobile robot arm (that is, a movable platform carrying one or several robot arms) has been widely used, as an important branch of robotics, in medical centers, home services, and space exploration. However, some complicated environments or difficult tasks are hard to complete with the mobile robot arm alone and require cooperation. Researchers have proposed a human–machine intelligent fusion method based on human–computer interaction, pointing out that in an integrated human–machine–environment system, human intervention and coordination can effectively improve system performance. Therefore, combining human–computer interaction technology with the mobile robot arm can effectively improve its level of intelligence and working ability. The main research goal of human–computer interaction is to realize natural interaction between humans and robots, so that robots can complete tasks efficiently. The interaction between humans and robots will open up new horizons for human–machine interfaces and will revolutionize people's lifestyles and environments. Therefore, how to realize efficient cooperation between humans and the mobile robot arm through human–computer interaction technology is a hotspot and difficulty of robotics research.

Somatosensory interaction is a new kind of human–computer interaction technology. Through this technology, users can interact with devices or scenes directly with their limbs, that is, they can interact with the target objects or content without any additional control devices. According to the sensing modality, it can be divided into three main categories: inertial sensing, optical sensing, and the joint use of inertial and optical sensing. The inertial sensing method measures the user's motion signals through a gravity sensor or an acceleration sensor attached to the user, and then converts these motion signals into control signals that drive the interaction object. The advantages of this interaction mode are high accuracy, reliability, and sensitivity, whereas the sensors are either cumbersome and not user-friendly or expensive and therefore difficult to use widely. The optical sensing method extracts the user's motion or state information from images taken by optical sensors (i.e., cameras), and then converts this information into control signals that manipulate the movement of the interaction objects. This mode of interaction has the advantages of being natural, intuitive, operable, and non-intrusive, but the images are easily degraded by illumination changes, background clutter, motion blur, and other noise. The extraction step therefore places higher demands on the algorithm and is also susceptible to occlusion by other users or by the user's own joints.

The most important element of the interaction between humans and robots is the robot's recognition of human behavior. The correct perception of human behavior directly affects the quality and efficiency of human interaction with robots. Recognition of expressions, gestures, and language is an important direction for human–robot interaction, and is the basis for robots to correctly recognize human intentions. The current interactive control method of robots is mainly command-based, which is sometimes delayed, inflexible, and inconvenient. Some complicated tasks require precise operations to complete, while simple command control can only perform pre-defined combinations of basic operations and thus cannot accomplish such complex tasks. Moreover, in the command control mode, the operation task first needs to be interpreted as commands, and during execution the commands are restored to specific operations, so some interpretation deviation is inevitable in between. Therefore, in human–computer interaction, how robots correctly recognize human intentions through gesture recognition technology is an important factor in improving the performance of human–computer interaction systems.

In recent years, the release of Microsoft Kinect has brought new opportunities to this field. Kinect devices can collect depth maps in real time. Compared with traditional color images, depth maps have many advantages. First, a depth map sequence is essentially a four-dimensional space and is not sensitive to changes in lighting conditions. Moreover, it contains more action information and allows human contours and skeletons to be estimated more reliably. In this paper, based on algorithms for gesture recognition and limb movement recognition in somatosensory interaction, a somatosensory HCI model for a robotic arm is proposed for robot arm manipulation. The main contributions of this paper are as follows:

(1) In the model, the features of RGB video frames and depth images are extracted by a 3D SSD architecture for the location and identification of gestures and arm movements.

(2) A dynamic gesture segmentation method is designed that determines the start and end of a dynamic gesture from the pause time of the palm. A DTW template matching algorithm is then used to identify dynamic gestures effectively and efficiently (see the sketch after this list).

(3) The interactive scenarios and interaction modes are designed for the experiments and implementation. A human–machine interaction scene for the robotic arm is designed, and control of a simulated virtual robot arm through somatosensory interaction is realized. The HCI modes include the detection and identification of static and dynamic gestures of the hands and limbs.
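As a rough illustration of the dynamic-gesture pipeline summarized in contribution (2), the sketch below segments a stream of palm positions by detecting pauses and matches the resulting segment against stored gesture templates with a standard DTW distance. The function names, thresholds, and template format are illustrative assumptions rather than the parameters used in the paper.

import numpy as np

def segment_by_pause(palm_positions, fps=30, pause_sec=0.5, speed_thresh=0.05):
    """Split a stream of 3D palm positions into dynamic-gesture segments.

    A gesture is assumed to start when the palm begins to move and to end once
    it has stayed still for pause_sec seconds (thresholds are illustrative).
    Returns a list of (start_frame, end_frame) index pairs.
    """
    positions = np.asarray(palm_positions, dtype=float)
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fps  # units per second
    moving = speeds > speed_thresh
    segments, start, still_frames = [], None, 0
    for i, is_moving in enumerate(moving):
        if is_moving:
            if start is None:
                start = i
            still_frames = 0
        elif start is not None:
            still_frames += 1
            if still_frames >= pause_sec * fps:
                segments.append((start, i - still_frames + 1))
                start, still_frames = None, 0
    if start is not None:  # stream ended while a gesture was still in progress
        segments.append((start, len(moving)))
    return segments

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two 3D trajectories."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def classify_gesture(segment, templates):
    """Return the label of the template trajectory with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(segment, templates[label]))

Here templates would map each dynamic gesture label to a recorded reference trajectory (an N x 3 array), and segment would be the palm-position slice selected by one (start, end) pair returned from segment_by_pause.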

Section snippets

Related works

Human motion recognition is based on the analysis and understanding of human motion and draws on disciplines such as biology, psychology, and multimedia. It comprises human motion feature extraction and classification-based recognition.

Methods

We propose a vision-based HCI architecture for a robotic arm that identifies somatosensory motion, as shown in Fig. 1. The input of the model is human body movement collected by Kinect sensors. The model then captures static information by conducting static gesture recognition and arm movement recognition. This static information is used to move the robotic arm into an approximate gesture and posture. The ongoing dynamic gesture recognition task is utilized to fine-tune the robotic arm's pose.
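The control flow described above can be summarized as a simple loop: static recognition sets a coarse target posture, and dynamic gesture recognition refines it. The class below is a minimal sketch under that reading; the recognizer callables and the arm methods (move_to_posture, fine_adjust) are hypothetical stand-ins for the paper's 3D SSD and DTW modules and the virtual arm interface, not the actual implementation.

class SomatosensoryArmController:
    """Minimal sketch of the control loop: static recognition produces a coarse
    target posture, and dynamic gesture recognition fine-tunes the arm."""

    def __init__(self, static_recognizer, dynamic_recognizer, arm):
        # static_recognizer(rgb, depth) -> static gesture / arm-posture label, or None
        # dynamic_recognizer(trajectory) -> dynamic gesture label, or None (e.g. DTW-based)
        # arm: virtual robotic-arm interface; move_to_posture / fine_adjust are assumed methods
        self.static_recognizer = static_recognizer
        self.dynamic_recognizer = dynamic_recognizer
        self.arm = arm
        self.trajectory = []

    def step(self, rgb_frame, depth_frame, palm_position):
        # 1. Static recognition drives the arm to an approximate gesture and posture.
        posture = self.static_recognizer(rgb_frame, depth_frame)
        if posture is not None:
            self.arm.move_to_posture(posture)

        # 2. The buffered palm trajectory feeds dynamic gesture recognition,
        #    whose result fine-tunes the arm's pose.
        self.trajectory.append(palm_position)
        gesture = self.dynamic_recognizer(self.trajectory)
        if gesture is not None:
            self.arm.fine_adjust(gesture)
            self.trajectory = []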

Experiments and analysis

In this section, we present the implementation details and experimental results of the proposed model.

Conclusion

Human–robot interaction is a key technique in modern HCI systems. In this paper, in order to facilitate the development of human–computer interactive systems, we propose a vision-based HCI framework for robotic arm manipulation by recognizing somatosensory motion. Within the model, somatosensory interaction modes for dynamic and static gestures and body movements, based on SSD and DTW, are designed to conduct somatosensory interaction experiments with the robotic arm. Virtual interactive experimental results have demonstrated the usefulness of our method.

Acknowledgments

The authors acknowledge the National Natural Science Foundation of China (Grants 61873054 and 61503070) and the Fundamental Research Funds for the Central Universities, China (N170804006).

Xing Li was born in 1982. She received the Dr. Eng. degree in power electronics and power transmission from Northeast University. She is currently an assistant professor with the State Key Laboratory of Synthetical Automation for Process Industries, Northeast University, Liaoning, China. A number of her research results have gained the support of the National Natural Science Fund and the Liaoning Natural Science Fund. She is also a member of the Big Data Committee and the Process Control Committee of the Chinese Association of Automation. Her current research interests include adaptive/robust control, motion control, intelligent robot systems, etc.


    No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.image.2019.115686.
