ABSTRACT
The recognition of continuous natural gestures is a complex and challenging problem due to the multi-modal nature of the visual cues involved (e.g. finger and lip movements, subtle facial expressions, body pose, etc.), as well as technical limitations such as spatial and temporal resolution and unreliable depth cues. In order to promote research advances in this field, we organized a challenge on multi-modal gesture recognition. We made available a large video database of 13,858 gestures from a lexicon of 20 Italian gesture categories recorded with a Kinect™ camera, providing the audio, skeletal model, user mask, RGB and depth images. The focus of the challenge was on user-independent multiple gesture learning. There are no resting positions, and the gestures are performed in continuous sequences lasting 1-2 minutes, each containing between 8 and 20 gesture instances. As a result, the dataset contains around 1,720,800 frames. In addition to the 20 main gesture categories, "distracter" gestures are included, i.e. additional audio and gestures outside the vocabulary. The final evaluation of the challenge was defined in terms of the Levenshtein edit distance, where the goal was to predict the correct sequence of gestures within each video. 54 international teams participated in the challenge, and outstanding results were obtained by the first-ranked participants.
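To make the evaluation criterion concrete, the sketch below computes the Levenshtein edit distance between a predicted and a ground-truth sequence of gesture labels using the standard dynamic-programming recurrence. The function name and the example label IDs are illustrative, not part of the challenge's released evaluation code.

```python
def levenshtein(pred, truth):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn the predicted gesture sequence into the true one."""
    m, n = len(pred), len(truth)
    # dp[i][j] = edit distance between pred[:i] and truth[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all remaining predictions
    for j in range(n + 1):
        dp[0][j] = j          # insert all remaining true gestures
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[m][n]

# Hypothetical example: gesture-category IDs from the 20-class lexicon
truth = [3, 7, 12, 5, 18]
pred = [3, 12, 5, 9, 18]   # missed gesture 7, spurious gesture 9
print(levenshtein(pred, truth))  # → 2
```

Under this metric a perfect prediction scores 0, and every missed, spurious, or misclassified gesture in the sequence adds one edit, so it penalizes both recognition and segmentation errors in a single number.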