Hybrid learning model and MMSVM classification for on-line visual imitation of a human with 3-D motions
Introduction
There is a substantial body of research on robot imitation (or learning) [1], [2], [3], [4], [5], [6], [7], briefly introduced as follows. Studies on arm imitation, such as gestures, have been proposed since 1994. Kuniyoshi et al. used visual recognition and analysis of human action sequences to let a robot learn reusable tasks [1]. More recently, mimicking the whole human body has become an important topic. The body mapping model discussed by Alissandrakis et al. [2] overcomes the difficulty that the demonstrator and the humanoid robot (HR) possess different structures. In addition, Calinon et al. presented a model for a robot to learn actions from the demonstrator's kinesthetic teaching [3], and robots can apply trained skills to respond to an unknown situation [4]. Khansari-Zadeh and Billard [5] decomposed motions into point-to-point movements for robots to imitate. Thobbi and Sheng [6] and Liu et al. [7] used various markers on the human body as identifiers for robots to imitate. Hence, training a robot by learning from the motions of a demonstrator is a promising approach.
Among these approaches, visual methods are the most direct for robot imitation; therefore, estimating the posture of the demonstrator is of paramount importance. Since the demonstrator in this paper is a human, several studies on human body posture estimation are relevant [8], [9], [10], [11], [12]. Hsieh et al. [8] extracted the skeleton feature and the centroid context of the human body image for posture estimation. Takahashi et al. [9] and Juang et al. [10] used silhouette images to determine the feature points of the human body; silhouette analysis is thus fundamental to human posture estimation. Full-body imitation of human motions with a Kinect and the heterogeneous kinematic structure of an HR was developed by Nguyen and Lee [11]. A cognitive learning system using a neural network for robot recognition and composite action learning was developed by Chuang et al. [12]. The above-mentioned papers only discussed the concepts and verified them by simulations, which is insufficient for the development of visual imitation by a humanoid robot. Recently, human pose estimation and tracking via Markov chain dynamics and a tree-structured state space were developed by Zhang et al. [13]. However, this method requires considerable computation time and is not suitable for on-line visual imitation. Although the extraction of key-posture frames from a sequence of motion images developed by Hwang and Lin [14] is effective, it is likewise not suitable for on-line visual imitation.
In this paper, the on-line visual imitation of a human with 3-D motions by an HR is developed according to our visual imitation principle. Without requiring identifying color markers on the human (cf. [7]), his or her 3-D motions are captured by a face-to-face stereo vision system (SVS). The proposed SVS obtains a depth image such that binary human silhouettes with the 3-D coordinates of fifteen main joints (i.e., head, neck, torso, left shoulder, left elbow, left hand, right shoulder, right elbow, right hand, left hip, left knee, left foot, right hip, right knee, and right foot) are achieved for the classified motions of the lower body (LB) and the hybrid-learning-model-based inverse kinematics (IK) of the upper body (UB).
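The SVS calibration and depth computation are detailed later in the paper; as a minimal sketch of how a joint's pixel position and depth value could be back-projected to camera-frame 3-D coordinates, one may assume a pinhole camera model. The intrinsic parameters below (`fx`, `fy`, `cx`, `cy`) are hypothetical placeholders, not the paper's SVS calibration:

```python
# Minimal sketch: back-project a joint's pixel (u, v) and depth Z into
# camera-frame 3-D coordinates via the pinhole model X = (u - cx) * Z / fx.
# The intrinsics below are hypothetical, not the paper's SVS calibration.

def pixel_to_3d(u, v, depth, fx=525.0, fy=525.0, cx=160.0, cy=120.0):
    """Return (X, Y, Z) in the camera frame for pixel (u, v) at `depth` meters."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def joints_to_3d(joint_pixels):
    """Map every joint name to its 3-D coordinates from (u, v, depth) triples."""
    return {name: pixel_to_3d(u, v, d) for name, (u, v, d) in joint_pixels.items()}

if __name__ == "__main__":
    head = pixel_to_3d(200, 80, 2.0)  # pixel (200, 80), 2 m away
    print(head)                       # x = (200 - 160) * 2 / 525, about 0.152 m
```

With the fifteen joint coordinates in hand, the silhouette-based classification and IK steps described below can proceed.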
Because the active dynamic balance of the HR is not considered, the proposed visual imitation is separated into the upper body (UB) and the lower body (LB). Based on the stable motion control of an HR (e.g., [11]), a total of eleven motions of the LB (e.g., single and double support phases, four directions of motion) are classified by the proposed modified multiclass support vector machine (MMSVM) (e.g., [15], [16], [17], [18], [19], [20]) with feature vectors developed from the 3-D coordinates of the head and the left and right feet. In [15], a unifying framework for solving multiclass categorization problems by reducing them to multiple binary problems is presented. In [16], six features extracted from the Doppler spectrogram are applied to classify seven activities. A model selection method for SVMs with a Gaussian kernel is proposed by Varewyck and Martens [17] to find suitable values of the kernel parameter and the cost parameter with a minimum amount of CPU time. In [18], an in-sample approach compares favorably with out-of-sample methods, especially in cases where the latter provide questionable results. A piecewise linear support vector machine is constructed in [19] for a nonlinear classification boundary that discriminates multi-view and multi-posture human bodies from backgrounds in a high-dimensional feature space. A method for human region detection and segmentation based on a generalized human upper-body model is developed in [20] through the extraction of foreground connected regions and the minimization of an energy function that evolves the human contour. Each of these papers applies SVM classification to its task with its own features.
On the other hand, two pairs of (hand, elbow) are used to imitate the UB of the human. To accomplish this task, the computation time of the inverse kinematics (IK) must be short enough. In addition, one-to-many or one-to-none IK mappings are difficult to deal with (e.g., [21], [22], [23], [24], [25]). To enforce a one-to-one IK mapping, the work space of each arm is partitioned into four subwork spaces. A systematic and effective approach using a radial basis function neural network (RBFNN) is then proposed to solve the IK problem in each subwork space. The suggested hybrid learning model uses K-means clustering to determine the centers and number of Gaussian basis functions [26], after which the recursive least-squares (RLS) algorithm estimates the unknown weights of the RBFNN. The coordinates of the pair of (hand, elbow) are the input data for the proposed hybrid learning model, which enforces a one-to-one IK mapping and accomplishes more complex imitation. To verify the effectiveness of the hybrid-learning-model-based IK, a comparison with the ordinary IK is also given.
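The hybrid learning model is derived in a later section; the following sketch only illustrates the "K-means, then RLS" idea on a hypothetical 2-link planar arm rather than the HR arm. The link lengths, Gaussian width, and the restricted joint ranges that keep the mapping one-to-one (mimicking the subwork-space partition) are all assumptions:

```python
import numpy as np

# Sketch of "K-means, then RLS" learning of an IK mapping on a hypothetical
# 2-link planar arm.  Restricting the elbow to one bend direction mimics the
# subwork-space partition that keeps the mapping one-to-one.

L1, L2 = 1.0, 0.8

def fk(t1, t2):
    """Forward kinematics: joint angles -> end-effector position."""
    return np.array([L1 * np.cos(t1) + L2 * np.cos(t1 + t2),
                     L1 * np.sin(t1) + L2 * np.sin(t1 + t2)])

rng = np.random.default_rng(0)
angles = np.column_stack([rng.uniform(0.2, 1.2, 400),    # t1
                          rng.uniform(0.3, 1.5, 400)])   # t2 > 0: one elbow bend
points = np.array([fk(t1, t2) for t1, t2 in angles])

# K-means (Lloyd's algorithm) places the Gaussian centers in the work space.
K = 25
centers = points[rng.choice(len(points), K, replace=False)]
for _ in range(20):
    idx = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    for k in range(K):
        if np.any(idx == k):
            centers[k] = points[idx == k].mean(axis=0)

sigma = 0.3  # shared Gaussian width (a tuning assumption)

def phi(p):
    """RBF feature vector for a 2-D position, with a bias term appended."""
    f = np.exp(-((p - centers) ** 2).sum(-1) / (2.0 * sigma ** 2))
    return np.append(f, 1.0)

# Recursive least squares estimates the output weights W ((K+1) x 2).
n = K + 1
W = np.zeros((n, 2))
P = np.eye(n) * 1e3
for p, th in zip(points, angles):
    h = phi(p)
    g = P @ h / (1.0 + h @ P @ h)   # RLS gain
    W += np.outer(g, th - h @ W)    # innovation update
    P = P - np.outer(g, h @ P)

# Query: predict joint angles for a reachable target, then check via FK.
target = fk(0.7, 0.9)
t_hat = phi(target) @ W
err = float(np.linalg.norm(fk(*t_hat) - target))
```

Because the trained RBFNN replaces the analytic IK at run time, each query costs only one feature evaluation and one matrix product, which is what makes the on-line imitation feasible.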
Finally, the combined motions of the LB and UB with suitable interpolation are employed to imitate the 3-D motions of the human. In addition, a performance index for the corresponding visual imitation is designed to evaluate and analyze the effectiveness of the proposed method. The main contributions of this study are as follows: (i) On-line visual imitation of the 3-D motions of a human is constructed from the combined motions of the UB and LB: the motions of the LB are classified into eleven stable motions, while the motions of the UB are imitated through those of the two pairs of (hand, elbow). (ii) The proposed MMSVM (cf. [27]) and the SVM with loss-based decoding (e.g., [15]) are compared to confirm the effectiveness of our multiclass classification. (iii) The trained hybrid learning models for the multiple subwork spaces are designed for the IKs of the two pairs of (hand, elbow) to reduce computation, improve accuracy and robustness, achieve more complex imitation, and prevent possible one-to-many and one-to-none mappings. (iv) A performance index for the on-line visual imitation is developed. Integrated experiments of visual imitation are also presented to confirm the effectiveness and practicality of the proposed method.
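The paper's interpolation scheme between successive classified motions is specified later; as a minimal sketch, linear interpolation between two joint-angle keyframes (the joint names are hypothetical) yields the intermediate commands issued between two vision updates:

```python
# Minimal sketch: linear interpolation between two joint-angle keyframes,
# producing the intermediate commands between two vision updates.
# The joint names are hypothetical, not the HR's actual joint set.

def lerp_pose(pose_a, pose_b, s):
    """Blend two {joint: angle} keyframes; s = 0 gives pose_a, s = 1 gives pose_b."""
    return {j: (1.0 - s) * pose_a[j] + s * pose_b[j] for j in pose_a}

def interpolate(pose_a, pose_b, steps):
    """Commands for the `steps` control ticks between two vision updates."""
    return [lerp_pose(pose_a, pose_b, (i + 1) / steps) for i in range(steps)]

key_a = {"l_elbow": 0.0, "r_elbow": 0.2}
key_b = {"l_elbow": 1.0, "r_elbow": 0.6}
cmds = interpolate(key_a, key_b, 4)
# cmds[1] is the halfway pose; cmds[3] equals key_b
```

Since the vision loop runs slower than the joint servo loop, such interpolation keeps the combined LB/UB commands smooth between classified frames.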
Section snippets
Experimental setup, estimation of joint coordinate, and task description
Before discussing the experimental setup, the estimation of joint coordinates, and the task description, the mathematical notations (or symbols) are listed and explained in Table 1.
MMSVM for the classified motions of the lower body
Since the dynamic balance of our HR is not considered, eleven stable motions of the LB are designed to represent its possible movements. Through the developed feature vector, they are classified by the proposed MMSVM. It is expected that the suggested system can be applied to different users with large configuration variations. A comparison between the proposed MMSVM and the SVM with loss-based decoding confirms its effectiveness.
Hybrid learning model for inverse kinematics
In this section, the inverse kinematics (IK) of the HR arm is achieved by the hybrid learning model with the technique "K-means, then recursive least-squares (RLS)". IK is a well-known and important problem in robotics research: it is difficult to derive (especially for high-DOF systems), and the mapping is not always one-to-one, being often one-to-many or sometimes one-to-none. In this situation, only the corresponding input–output data (i.e., the mapping between the four joint angles and the 3-D coordinates of…
Experimental results and discussions
The experimental results of the on-line visual imitation via the combined motions of the LB and UB are presented in this section. The input images are color images of 320 × 240 pixels. Fig. 15 shows important snapshots of one typical experiment; the corresponding video is also attached to this study (see Appendix A). Twenty representative frames belonging to specific classified motions are presented in Fig. 16, out of a total of 3000 frames. For evaluating the effectiveness…
Conclusions
The correct detection and estimation of feature points is one of the key factors for successful visual imitation. Based on the skeleton capture and the estimation of the 3-D coordinates of eight joints (i.e., head, neck, left elbow, left hand, right elbow, right hand, left foot, and right foot), the task of real-time visual imitation is achieved by the integrated motions of the LB and UB. Because no active dynamic balance is considered, the visual imitation of a human with 3-D motions is…
Acknowledgment
The authors gratefully acknowledge the financial support of project NSC-102-2221-E-011-110-MY2 of Taiwan, R.O.C.
References (31)
- Kuniyoshi et al., Learning by watching: extracting reusable task knowledge from visual observation of human performance, IEEE Trans. Robot. Autom. (1994)
- Alissandrakis et al., Correspondence mapping induced state and action metrics for robotic imitation, IEEE Trans. Syst. Man Cybern. Part B (2007)
- Calinon et al., On learning, representing, and generalizing a task in a humanoid robot, IEEE Trans. Syst. Man Cybern. Part B (2007)
- Task-specific generalization of discrete and periodic dynamic movement primitives, IEEE Trans. Robot. (2010)
- Khansari-Zadeh and Billard, Learning stable nonlinear dynamical systems with Gaussian mixture models, IEEE Trans. Robot. (2011)
- A. Thobbi, W. Sheng, Imitation learning of hand gestures and its evaluation for humanoid robots, in: IEEE Int. Conf. on…
- Liu et al., Image recognition and force measurement application in the humanoid robot imitation, IEEE Trans. Instrum. Meas. (2012)
- Hsieh et al., Video-based human movement analysis and its application to surveillance systems, IEEE Trans. Multimedia (2008)
- K. Takahashi, T. Sakaguchi, J. Ohya, Remarks on a real-time 3D human body posture estimation method using trinocular…
- Juang et al., Computer vision-based human body segmentation and posture estimation, IEEE Trans. Syst. Man Cybern. Part A (2009)
- Zhang et al., Human pose estimation and tracking via parsing a tree structure based human model, IEEE Trans. Syst. Man Cybern. Syst.
- Reducing multiclass to binary: a unifying approach for margin classifiers, J. Mach. Learn. Res.
Chih-Lyang Hwang received the B.E. degree in Aeronautical Engineering from Tamkang University, Taiwan, R.O.C., in 1981, and the M.E. and Ph.D. degrees in Mechanical Engineering from Tatung Institute of Technology, Taiwan, R.O.C., in 1986 and 1990, respectively. From 1990 to 2006, he was with the Department of Mechanical Engineering of Tatung Institute of Technology (later Tatung University), where he was engaged in teaching and research in the areas of servo control and control of manufacturing systems and robotic systems, and where he was a Professor from 1996 to 2006. During 1998–1999, he was a research scholar at the George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, USA. From 2006 to 2011, he was a Professor in the Department of Electrical Engineering of Tamkang University, Taiwan, R.O.C. Since 2011, he has been a Professor in the Department of Electrical Engineering of National Taiwan University of Science and Technology. He is the author or coauthor of about 105 journal and conference papers in the related fields. His current research interests include mobile robot navigation, fuzzy (or neural-network) modeling and control, variable structure control, robotics, visual tracking systems, network-based control, and distributed sensor networks. He has received a number of awards, including the Excellent Research Paper Award from the National Science Council of Taiwan and the Hsieh-Chih Industry Renaissance Association of Tatung Company. From 2003 to 2006, he was a committee member of the Automation Technology Program of the National Science Council of Taiwan, R.O.C. He was a member of the Technical Committee of the 28th Annual Conference of the IEEE Industrial Electronics Society. From 31 August to 12 September 2008, he was also a member of the Visiting Program of Automation, Machinery & Solid Mechanics of the National Science Council.
Bo-Lin Chen was born in Taipei, Taiwan, in 1985. He received the B.S. degree from the Department of Electrical Engineering, Chinese Culture University, Taipei, Taiwan, in 2007, and the M.S. degree from the Department of Electrical Engineering, Tamkang University, Taipei, Taiwan, in 2013. He is currently pursuing the Ph.D. degree in the Department of Electrical Engineering at National Taiwan University of Science and Technology. His research interests include humanoid robotics, neural networks, and image processing.
Hsing-Hao Huang was born in Taipei, Taiwan, in 1989. He received the B.S. degree from the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, in 2013, where he is currently pursuing the M.S. degree in Electrical Engineering. His research interests include fuzzy systems and image processing.
Huei-Ting Syu was born in Kaohsiung, Taiwan, in 1990. She received the B.S. degree in Electrical Engineering from National United University, Miaoli, Taiwan, in 2012. She is currently pursuing the M.S. degree in Electrical Engineering at National Taiwan University of Science and Technology, Taipei, Taiwan. Her research interests include intelligent robots and visual imitation.