
Robotics and Autonomous Systems

Volume 71, September 2015, Pages 150-165

Hybrid learning model and MMSVM classification for on-line visual imitation of a human with 3-D motions

https://doi.org/10.1016/j.robot.2014.12.003

Highlights

  • On-line visual imitation of the 3-D motions of a human.

  • Visual imitation is achieved by combining the motions of the upper body and the lower body.

  • The motions of the lower body are classified into eleven stable motions.

  • The motions of the upper body are imitated via the two (hand, elbow) pairs.

  • The trained hybrid learning models are designed to reduce computational complexity, improve robust accuracy, and achieve more complex imitation.

Abstract

In this paper, the on-line visual imitation of a human's 3-D motions by a humanoid robot (HR) is developed. First, the 3-D motion sequences of a human are captured by a stereo vision system (SVS), whose skeleton algorithm estimates the 3-D coordinates of fifteen main joints. Since the dynamic balance of the HR is not considered, the proposed on-line visual imitation is divided into two parts: lower body (LB) and upper body (UB). Eleven stable LB motions, described by the developed feature vector based on the 3-D coordinates of the head and the left and right feet, are classified by the proposed modified multi-class support vector machine (MMSVM). To confirm the effectiveness of the MMSVM, it is also compared with an SVM based on error-correcting output codes (ECOC). The imitation of the UB is based on the inverse kinematics (IK) of the two (hand, elbow) pairs. To enforce a one-to-one mapping and to reduce the modeling complexity of the IK, the workspace of the two arms is partitioned into eight subworkspaces, each approximated by a pre-trained hybrid learning model. Comparisons between the hybrid-learning-model-based IK and the ordinary IK are also made. Combining the classified LB motion with the IK-driven UB motion accomplishes the task of imitating the 3-D motions of a human. Finally, the corresponding experiments are presented to validate the effectiveness and practicality of the proposed method.

Introduction

There is a large body of research on robot imitation (or learning) [1], [2], [3], [4], [5], [6], [7], briefly introduced as follows. Since 1994, studies on arm imitation, such as gestures, have been proposed. Kuniyoshi et al. used visual recognition and analysis of human action sequences for a robot to learn reusable tasks [1]. More recently, mimicking the whole human body has become an important topic. The body mapping model of Alissandrakis et al. [2] overcomes the difficulty that the demonstrator and the HR possess different structures. In addition, Calinon et al. presented a model for a robot to learn actions from the demonstrator's kinesthetic teaching [3], and robots can apply trained skills to respond to unknown situations [4]. Khansari-Zadeh and Billard [5] decomposed motions into point-to-point movements for robots to imitate. Thobbi and Sheng [6] and Liu et al. [7] used markers on the human body as identification for robots to imitate. Hence, training robots by learning from the motions of a demonstrator is a promising approach.

Among these works, visual methods seem the most direct way for robots to imitate. Therefore, estimating the posture of the demonstrator is of paramount importance. Since the demonstrator in this paper is a human, several studies on human posture estimation are relevant [8], [9], [10], [11], [12]. Hsieh et al. [8] extracted the skeleton feature and the centroid context of the human body image for posture estimation. Takahashi et al. [9] and Juang et al. [10] used silhouette images to determine the feature points of the human body; silhouette analysis is thus a fundamental tool for human posture estimation. Full-body imitation of human motions with a Kinect and the heterogeneous kinematic structure of an HR was developed by Nguyen and Lee [11]. A cognitive learning system using a neural network for robot recognition and composite action learning was also developed by Chuang et al. [12]. The above-mentioned papers, however, only discuss the concepts and verify them with simulations, which is insufficient for the development of visual imitation on a humanoid robot. Recently, human pose estimation and tracking via Markov chain dynamics and a tree-structured state space were developed by Zhang et al. [13]. However, this approach requires considerable computation time and is not suitable for on-line visual imitation. Although the extraction of key-posture frames from a sequence of motion images developed by Hwang and Lin [14] is effective, it is likewise not suitable for on-line visual imitation.

In this paper, the on-line visual imitation of a human's 3-D motions by an HR is developed according to our visual imitation principle. Without requiring identifying color markers on the human (e.g., [7]), his or her 3-D motions are captured by a face-to-face stereo vision system (SVS). The proposed SVS obtains a depth image from which the binary human silhouettes, with the 3-D coordinates of fifteen main joints (i.e., head, neck, torso, left shoulder, left elbow, left hand, right shoulder, right elbow, right hand, left hip, left knee, left foot, right hip, right knee, and right foot), are derived for the classified motions of the LB and the hybrid-learning-model-based IK of the UB.

Because the active dynamic balance of the HR is not considered, the proposed visual imitation is separated into the upper body (UB) and the lower body (LB). Based on the stable motion control of an HR (e.g., [11]), a total of eleven motions of the LB (e.g., single- and double-support phases, four directions of motion) are classified by the proposed modified multiclass support vector machine (MMSVM) (e.g., [15], [16], [17], [18], [19], [20]) using the developed feature vectors based on the 3-D coordinates of the head and the left and right feet. In [15], a unifying framework for solving multiclass categorization problems by reducing them to multiple binary problems is presented. In [16], six features extracted from the Doppler spectrogram are applied to classify seven activities. A model selection method for SVMs with a Gaussian kernel is proposed by Varewyck and Martens [17] to find suitable values of the kernel and cost parameters with a minimum amount of central-processing-unit time. In [18], the in-sample approach compares favorably to out-of-sample methods, especially in cases where the latter provide questionable results. A piecewise linear support vector machine is constructed in [19] for a nonlinear classification boundary that can discriminate multi-view, multi-posture human bodies from the background in a high-dimensional feature space. A method for human region detection and segmentation, based on a generalized human upper-body model, is developed in [20] by extracting foreground connected regions and minimizing an energy function to evolve the human contour. Each of these papers applies SVM classification to its own task with its own features.

On the other hand, the two (hand, elbow) pairs are used for imitating the UB of the human. To accomplish this task, the computation time of the inverse kinematics (IK) must be short enough. In addition, the one-to-many or one-to-none mappings of the IK are difficult to deal with (e.g., [21], [22], [23], [24], [25]). To enforce a one-to-one mapping of the IK, the workspace of each arm is partitioned into four subworkspaces. A more systematic and effective approach using a radial basis function neural network (RBFNN) is then proposed to solve the IK problem within each subworkspace. The suggested hybrid learning model uses K-means clustering to determine the centers and number of Gaussian basis functions [26], after which the recursive least-squares (RLS) algorithm estimates the unknown weights of the RBFNN. The coordinates of each (hand, elbow) pair are the input data of the proposed hybrid learning model, which enforces a one-to-one IK mapping and enables more complex imitation. To verify the effectiveness of the hybrid-learning-model-based IK, a comparison with the ordinary IK is also given.

Finally, the combined motions of the LB and UB, with suitable interpolation, are employed to imitate the 3-D motions of a human. In addition, a performance index for the visual imitation is designed to evaluate and analyze the effectiveness of the proposed method. The main contributions of this study are as follows: (i) On-line visual imitation of the 3-D motions of a human is constructed from the combined motions of the UB and LB. The motions of the LB are classified into eleven stable motions, while the motions of the UB are imitated via the two (hand, elbow) pairs. (ii) The proposed MMSVM (cf. [27]) and the SVM with loss-based decoding (e.g., [15]) are compared to confirm the effectiveness of our multiclass classification. (iii) The trained hybrid learning models for the multiple subworkspaces are designed for the IK of the two (hand, elbow) pairs to reduce computational complexity, improve robust accuracy, achieve more complex imitation, and prevent the possible one-to-many and one-to-none mappings. (iv) A performance index for the on-line visual imitation is developed. Integrated visual imitation experiments are also presented to confirm the effectiveness and practicality of the proposed method.

Section snippets

Experimental setup, estimation of joint coordinates, and task description

Before discussing the experimental setup, the estimation of joint coordinates, and the task description, the mathematical notations (symbols) are listed and explained in Table 1.

MMSVM for the classified motions of the lower body

Since the dynamic balance of our HR is not considered, eleven stable motions of the LB are designed to represent its possible movements. Using our developed feature vector, they are classified by the proposed MMSVM. It is expected that the suggested system can be applied to different users with large variations in configuration. A comparison between the proposed MMSVM and the SVM with loss-based decoding is employed to confirm its effectiveness.
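As an illustration of how a feature vector might be formed from the 3-D coordinates of the head and the two feet, the following hypothetical construction expresses the feet relative to the head and scales by the head-to-feet distance. It is a sketch of the general idea only, not the authors' exact feature vector.

```python
import numpy as np

def lb_feature_vector(head, left_foot, right_foot):
    """Hypothetical lower-body feature vector from three 3-D joints.

    Expresses the feet relative to the head and scales by the distance from
    the head to the feet midpoint, so the features are roughly invariant to
    the user's position and size. (Illustrative only; not the construction
    used in the paper.)"""
    head, lf, rf = map(np.asarray, (head, left_foot, right_foot))
    scale = max(np.linalg.norm(head - (lf + rf) / 2), 1e-6)  # guard zero division
    return np.concatenate([(lf - head) / scale,
                           (rf - head) / scale,
                           (rf - lf) / scale])

# standing upright: feet below the head, slightly apart
v = lb_feature_vector(head=[0.0, 1.7, 2.0],
                      left_foot=[-0.1, 0.0, 2.0],
                      right_foot=[0.1, 0.0, 2.0])
print(v.shape)  # (9,)
```

Such a normalized vector can then serve as the classifier input, so that the same trained model applies to users standing at different distances from the SVS.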

Hybrid learning model for inverse kinematics

In this section, the inverse kinematics (IK) of the HR arm is obtained by the hybrid learning model using the technique "K-means, then recursive least-squares (RLS)". IK is a well-known, important problem in robotics research: it is difficult to derive (especially for high-DOF systems), and the mapping is not always one-to-one; it is often one-to-many or sometimes one-to-none. In this situation, only the corresponding input–output data (i.e., the mapping between the four joint angles and the 3-D coordinates of

Experimental results and discussions

The experimental results of the on-line visual imitation via the combined motions of the LB and UB are presented in this section. The input images are 320 × 240 color images. Fig. 15 shows important snapshots of one typical experiment; the corresponding video is also attached to this study (see Appendix A). Twenty representative frames belonging to specific classified motions are presented in Fig. 16, drawn from a sequence of 3000 frames in total. For evaluating the effectiveness

Conclusions

The correct detection and estimation of feature points is one of the key factors for successful visual imitation. Based on the skeleton capture and the estimation of the 3-D coordinates of eight joints (i.e., head, neck, left elbow, left hand, right elbow, right hand, left foot, and right foot), the task of real-time visual imitation is achieved by the integrated motions of the LB and UB. Because no active dynamic balance is considered, the visual imitation of a human with 3-D motions is

Acknowledgment

The authors thank the National Science Council of Taiwan, R.O.C., for financial support under project NSC-102-2221-E-011-110-MY2.


References (31)

  • Y. Kuniyoshi et al., Learning by watching: extracting reusable task knowledge from visual observation of human performance, IEEE Trans. Robot. Autom. (1994)
  • A. Alissandrakis et al., Correspondence mapping induced state and action metrics for robotic imitation, IEEE Trans. Syst. Man Cybern. Part B (2007)
  • S. Calinon et al., On learning, representing, and generalizing a task in a humanoid robot, IEEE Trans. Syst. Man Cybern. Part B (2007)
  • A. Ude et al., Task-specific generalization of discrete and periodic dynamic movement primitives, IEEE Trans. Robot. (2010)
  • S.M. Khansari-Zadeh et al., Learning stable nonlinear dynamical systems with Gaussian mixture models, IEEE Trans. Robot. (2011)
  • A. Thobbi, W. Sheng, Imitation learning of hand gestures and its evaluation for humanoid robots, in: IEEE Int. Conf. on...
  • H.Y. Liu et al., Image recognition and force measurement application in the humanoid robot imitation, IEEE Trans. Instrum. Meas. (2012)
  • J.W. Hsieh et al., Video-based human movement analysis and its application to surveillance systems, IEEE Trans. Multimedia (2008)
  • K. Takahashi, T. Sakaguchi, J. Ohya, Remarks on a real-time 3D human body posture estimation method using trinocular...
  • C.F. Juang et al., Computer vision-based human body segmentation and posture estimation, IEEE Trans. Syst. Man Cybern. Part A (2009)
  • V.V. Nguyen, J.H. Lee, Full-body imitation of human motions with Kinect and heterogeneous kinematic structure of...
  • L.W. Chuang, C.Y. Lin, A. Cangelosi, Learning of composite actions and visual categories via grounded linguistic...
  • X. Zhang et al., Human pose estimation and tracking via parsing a tree structure based human model, IEEE Trans. Syst. Man Cybern. Syst. (2014)
  • C.L. Hwang, B.L. Chen, The extraction of key-posture frame of 3D motion of a human, in: IEEE CIVEMSA2013, Milan, Italy,...
  • E.L. Allwein et al., Reducing multiclass to binary: a unifying approach for margin classifiers, J. Mach. Learn. Res. (2000)

    Chih-Lyang Hwang received the B.E. degree in Aeronautical Engineering from Tamkang University, Taiwan, R.O.C., in 1981, and the M.E. and Ph.D. degrees in Mechanical Engineering from Tatung Institute of Technology, Taiwan, R.O.C., in 1986 and 1990, respectively. From 1990 to 2006, he was with the Department of Mechanical Engineering of Tatung Institute of Technology (later Tatung University), where he was engaged in teaching and research in the areas of servo control and the control of manufacturing systems and robotic systems. He was a Professor of Mechanical Engineering there from 1996 to 2006. During 1998–1999, he was a research scholar at the George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, USA. From 2006, he was a Professor in the Department of Electrical Engineering of Tamkang University, Taiwan, R.O.C. Since 2011, he has been a professor in the Department of Electrical Engineering of National Taiwan University of Science and Technology. He is the author or coauthor of about 105 journal and conference papers in related fields. His current research interests include mobile robot navigation, fuzzy (or neural-network) modeling and control, variable structure control, robotics, visual tracking systems, network-based control, and distributed sensor networks. He has received a number of awards, including the Excellent Research Paper Award from the National Science Council of Taiwan and the Hsieh-Chih Industry Renaissance Association of Tatung Company. From 2003 to 2006, he was a committee member of the Automation Technology Program of the National Science Council of Taiwan, R.O.C. He was a member of the Technical Committee of the 28th Annual Conference of the IEEE Industrial Electronics Society. From 31 August to 12 September 2008, he was also a member of the Visiting Program on Automation, Machinery & Solid Mechanics of the National Science Council.

    Bo-Lin Chen was born in Taipei, Taiwan, in 1985. He received the B.S. degree from the Department of Electrical Engineering, Chinese Culture University, Taipei, Taiwan, in 2007, and the M.S. degree from the Department of Electrical Engineering, Tamkang University, Taipei, Taiwan, in 2013. He is currently pursuing his Ph.D. in the Department of Electrical Engineering at the National Taiwan University of Science and Technology. His research interests include humanoid robotics, neural networks, and image processing.

    Hsing-Hao Huang was born in Taipei, Taiwan, 1989. He received the B.S. degree from the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, in 2013. He is currently pursuing his M.S. in Electrical Engineering at the National Taiwan University of Science and Technology. His research interests include fuzzy system and image processing.

    Huei-Ting Syu was born in Kaohsiung, Taiwan, 1990. She received the B.S. degree in Electrical Engineering from the National United University, Miaoli, Taiwan, in 2012. She is currently pursuing her M.S. in Electrical Engineering at the National Taiwan University of Science and Technology, Taipei, Taiwan. Her research interests include intelligent robots and visual imitation.
