ABSTRACT
The recognition of continuous natural gestures is a complex and challenging problem due to the multi-modal nature of the visual cues involved (e.g. finger and lip movements, subtle facial expressions, body pose, etc.), as well as technical limitations such as spatial and temporal resolution and unreliable depth cues. In order to promote research advances in this field, we organized a challenge on multi-modal gesture recognition. We made available a large video database of 13,858 gestures from a lexicon of 20 Italian gesture categories recorded with a Kinect™ camera, providing the audio, skeletal model, user mask, RGB and depth images. The focus of the challenge was on user-independent multiple gesture learning. There are no resting positions, and the gestures are performed in continuous sequences lasting 1-2 minutes, each containing between 8 and 20 gesture instances. As a result, the dataset contains around 1,720,800 frames. In addition to the 20 main gesture categories, "distracter" gestures are included, i.e. additional audio and gestures outside the vocabulary. The final evaluation of the challenge was defined in terms of the Levenshtein edit distance, where the goal was to predict the correct sequence of gestures within each video. 54 international teams participated in the challenge, and outstanding results were obtained by the first-ranked participants.
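To make the evaluation criterion concrete, the sketch below computes the Levenshtein edit distance between a predicted and a ground-truth sequence of gesture labels using the standard dynamic-programming recurrence. The function name and the example label IDs are illustrative, not part of the challenge's released evaluation code.

```python
def levenshtein(pred, truth):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn the predicted gesture sequence into the true one."""
    m, n = len(pred), len(truth)
    # dp[i][j] = edit distance between pred[:i] and truth[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all remaining predictions
    for j in range(n + 1):
        dp[0][j] = j          # insert all remaining true gestures
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[m][n]

# Hypothetical example: gesture-category IDs from the 20-class lexicon
truth = [3, 7, 12, 5, 18]
pred = [3, 12, 5, 9, 18]   # missed gesture 7, spurious gesture 9
print(levenshtein(pred, truth))  # → 2
```

Under this metric a perfect prediction scores 0, and every missed, spurious, or misclassified gesture in the sequence adds one edit, so it penalizes both recognition and segmentation errors in a single number.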