ABSTRACT
The purpose of this paper is twofold. First, we introduce our Microsoft Kinect-based video dataset of American Sign Language (ASL) signs, designed for body part detection and tracking research. This dataset allows researchers to experiment with more than two-dimensional (2D) color video information in gesture recognition projects, as it also provides scene depth information. Depth data not only makes it easier to locate body parts such as the hands; without it, two completely different gestures that share a similar 2D trajectory projection can be difficult to distinguish from one another. Second, because an accurate hand locator is a critical element of any automated gesture or sign language recognition tool, this paper assesses the efficacy of one popular open-source user skeleton tracker by examining its performance on random signs from the above dataset. We compare the hand positions reported by the skeleton tracker to ground-truth positions obtained from manual hand annotations of each video frame. The goal of this study is to establish a benchmark for the assessment of more advanced detection and tracking methods that utilize scene depth data. For illustrative purposes, we compare one of the methods previously developed in our lab for detecting a single hand against this benchmark.
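The evaluation described above reduces, per video frame, to measuring the distance between the tracker's reported hand position and the manually annotated position. The sketch below shows one way such a comparison could be computed; the CSV file layout, the column names, and the use of mean Euclidean pixel error as the summary statistic are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of a per-frame hand-position comparison.
# Assumption (not from the paper): annotations and tracker output are
# stored as CSV files with columns frame, x, y in pixel coordinates.

import csv
import math

def load_positions(path):
    """Map frame number -> (x, y) hand position read from a CSV file."""
    positions = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            positions[int(row["frame"])] = (float(row["x"]), float(row["y"]))
    return positions

def mean_error(ground_truth, tracked):
    """Mean Euclidean distance over frames present in both sources."""
    frames = ground_truth.keys() & tracked.keys()
    if not frames:
        raise ValueError("no overlapping frames to compare")
    total = 0.0
    for frame in frames:
        (gx, gy), (tx, ty) = ground_truth[frame], tracked[frame]
        total += math.hypot(gx - tx, gy - ty)
    return total / len(frames)

if __name__ == "__main__":
    # File names are placeholders for one sign's annotations and the
    # corresponding skeleton-tracker output.
    gt = load_positions("sign_annotations.csv")
    tr = load_positions("skeleton_tracker_output.csv")
    print(f"mean hand-position error: {mean_error(gt, tr):.1f} px")
```

Aggregating this per-sign error over many randomly chosen signs would yield the kind of benchmark figure against which more advanced depth-based detectors can be compared.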