Abstract
Vision-based sign language recognition has attracted increasing interest from researchers in computer vision. In this article, we propose a novel algorithm to model and recognize sign language performed in front of a Microsoft Kinect sensor. Under the assumption that some frames in a sign language video are both discriminative and representative, we first assign a binary latent variable to each frame in the training videos to indicate its discriminative capability, and then develop a latent support vector machine model that classifies the signs while localizing the discriminative and representative frames in each video. In addition, we combine the depth maps with the color images captured by the Kinect sensor to obtain more effective and accurate features that enhance recognition accuracy. To evaluate our approach, we conducted experiments on both word-level and sentence-level sign language. An American Sign Language dataset containing approximately 2,000 word-level phrases and 2,000 sentence-level phrases was collected with the Kinect sensor; each phrase contains color, depth, and skeleton information. Experiments on this dataset demonstrate the effectiveness of the proposed method for sign language recognition.
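The idea above, per-frame binary latent variables that mark discriminative frames, optimized jointly with a max-margin classifier, can be sketched as a toy alternating procedure: with the model fixed, select the frames each video's latent variables would mark as discriminative; with the selection fixed, update a linear SVM. The frame-selection rule, the mean-pooling, and the subgradient hinge-loss solver below are illustrative assumptions for a minimal sketch, not the authors' implementation (which also handles representative-frame localization and depth-feature fusion).

```python
import numpy as np

def select_frames(w, frames, k):
    """Latent step: with the model fixed, mark the k frames that score
    highest under the current weights as the discriminative ones."""
    return np.argsort(frames @ w)[-k:]

def video_feature(frames, idx):
    """Pool the selected (putatively discriminative) frames into one
    video-level descriptor."""
    return frames[idx].mean(axis=0)

def train_latent_svm(videos, labels, k=2, outer_iters=5, inner_iters=20,
                     lr=0.1, lam=0.01):
    """Alternate (1) latent frame selection and (2) subgradient updates
    of a linear SVM on the hinge loss with L2 regularization."""
    w = np.zeros(videos[0].shape[1])
    for _ in range(outer_iters):
        # 1) Latent step: re-estimate each video's discriminative frames.
        feats = np.array([video_feature(v, select_frames(w, v, k))
                          for v in videos])
        # 2) SVM step: a few hinge-loss subgradient passes.
        for _ in range(inner_iters):
            margins = labels * (feats @ w)
            viol = margins < 1  # margin violators drive the update
            grad = lam * w - (labels[viol, None] * feats[viol]).sum(0) / len(videos)
            w -= lr * grad
    return w

def predict(w, video, k=2):
    """Classify a video by pooling its top-k frames under the model."""
    return np.sign(video_feature(video, select_frames(w, video, k)) @ w)
```

In the toy data below, each video has two frames carrying the class signal plus one shared "noise" frame; alternation lets the model concentrate its score on the signal frames rather than averaging over every frame.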
Latent Support Vector Machine Modeling for Sign Language Recognition with Kinect