research-article

Latent Support Vector Machine Modeling for Sign Language Recognition with Kinect

Published: 31 March 2015

Abstract

Vision-based sign language recognition has attracted growing interest from computer vision researchers. In this article, we propose a novel algorithm to model and recognize sign language performed in front of a Microsoft Kinect sensor. Under the assumption that some frames in a sign language video are both discriminative and representative, we first assign a binary latent variable to each frame in the training videos to indicate its discriminative capability, and then develop a latent support vector machine model that classifies the signs and localizes the discriminative and representative frames in each video. In addition, we use the depth map together with the color image captured by the Kinect sensor to obtain a more effective and accurate feature and thereby improve recognition accuracy. To evaluate our approach, we conducted experiments on both word-level and sentence-level sign language. An American Sign Language dataset of approximately 2,000 word-level phrases and 2,000 sentence-level phrases was collected using the Kinect sensor; each phrase contains color, depth, and skeleton information. Experiments on this dataset demonstrate the effectiveness of the proposed method for sign language recognition.
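The idea of a binary latent variable per frame can be illustrated with a minimal sketch. It is not the article's actual formulation, which the abstract does not give. The sketch assumes a simplified, unary-only scoring function in which each sign class has one linear weight vector and a video's score is the sum of the per-frame scores of the selected frames. The names `infer`, `frames`, `weights`, and `min_frames` are hypothetical.

```python
import numpy as np

def infer(frames, weights, min_frames=1):
    """Jointly pick a sign label and a binary frame selection for one video.

    frames:  (T, D) array of per-frame features (assumed precomputed).
    weights: (C, D) array, one linear weight vector per sign class (assumed learned).
    Returns (label, selected, score), where `selected` is a boolean mask of length T.
    """
    per_frame = frames @ weights.T           # (T, C) score of each frame under each class
    best = None
    for c in range(weights.shape[0]):
        s = per_frame[:, c]
        # Maximizing sum_t h_t * s_t over binary h keeps every positively scored frame.
        h = s > 0
        if h.sum() < min_frames:
            # Too few positive frames: fall back to the top-scoring min_frames frames.
            h = np.zeros_like(h)
            h[np.argsort(s)[-min_frames:]] = True
        score = s[h].sum()
        if best is None or score > best[2]:
            best = (c, h, score)
    return best
```

In this sketch, the frames left selected for the winning class play the role of the localized discriminative frames. Training such a model would typically alternate between inferring the frame selection and updating the weights, which is omitted here.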


    • Published in

      ACM Transactions on Intelligent Systems and Technology, Volume 6, Issue 2
      Special Section on Visual Understanding with RGB-D Sensors
      May 2015
      381 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/2753829
      • Editor: Huan Liu

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 31 March 2015
      • Accepted: 1 March 2014
      • Revised: 1 January 2014
      • Received: 1 July 2013


      Qualifiers

      • research-article
      • Research
      • Refereed
