ABSTRACT
The purpose of this paper is twofold. First, we introduce our Microsoft Kinect-based video dataset of American Sign Language (ASL) signs, designed for body part detection and tracking research. This dataset allows researchers to experiment with more than two-dimensional (2D) color video information in gesture recognition projects, as it also provides scene depth information. Depth data not only makes it easier to locate body parts such as the hands; without it, two completely different gestures that share a similar 2D trajectory projection can be difficult to distinguish from one another. Second, because an accurate hand locator is a critical element of any automated gesture or sign language recognition tool, this paper assesses the efficacy of one popular open-source user skeleton tracker by examining its performance on random signs from the above dataset. We compare the hand positions reported by the skeleton tracker to ground-truth positions obtained from manual hand annotations of each video frame. The goal of this study is to establish a benchmark for the assessment of more advanced detection and tracking methods that utilize scene depth data. For illustrative purposes, we compare one of the methods previously developed in our lab for detecting a single hand against this benchmark.
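The evaluation described above reduces, per video frame, to measuring the distance between the tracker's reported hand position and the manually annotated position. The sketch below shows one way such a comparison could be computed; the CSV file layout, the column names, and the use of mean Euclidean pixel error as the summary statistic are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of a per-frame hand-position comparison.
# Assumption (not from the paper): annotations and tracker output are
# stored as CSV files with columns frame, x, y in pixel coordinates.

import csv
import math

def load_positions(path):
    """Map frame number -> (x, y) hand position read from a CSV file."""
    positions = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            positions[int(row["frame"])] = (float(row["x"]), float(row["y"]))
    return positions

def mean_error(ground_truth, tracked):
    """Mean Euclidean distance over frames present in both sources."""
    frames = ground_truth.keys() & tracked.keys()
    if not frames:
        raise ValueError("no overlapping frames to compare")
    total = 0.0
    for frame in frames:
        (gx, gy), (tx, ty) = ground_truth[frame], tracked[frame]
        total += math.hypot(gx - tx, gy - ty)
    return total / len(frames)

if __name__ == "__main__":
    # File names are placeholders for one sign's annotations and the
    # corresponding skeleton-tracker output.
    gt = load_positions("sign_annotations.csv")
    tr = load_positions("skeleton_tracker_output.csv")
    print(f"mean hand-position error: {mean_error(gt, tr):.1f} px")
```

Aggregating this per-sign error over many randomly chosen signs would yield the kind of benchmark figure against which more advanced depth-based detectors can be compared.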