
An Evaluation of Gamesourced Data for Human Pose Estimation

Published: 31 March 2015


Gamesourcing has emerged as an approach for rapidly acquiring labeled data for learning-based, computer vision recognition algorithms. In this article, we present an approach for using RGB-D sensors to acquire annotated training data for human pose estimation from 2D images. Unlike other gamesourcing approaches, our method does not require a specific game, but runs alongside any gesture-based game using RGB-D sensors. The automatically generated datasets resulting from this approach contain joint estimates within a few pixel units of manually labeled data, and a gamesourced dataset created using a relatively small number of players, games, and locations performs as well as large-scale, manually annotated datasets when used as training data with recent learning-based human pose estimation methods for 2D images.
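The claim that automatically generated joint estimates fall "within a few pixel units" of manual labels implies a per-joint pixel distance between the two annotations. The sketch below illustrates one plausible way to compute such a distance, assuming each annotation is a mapping from joint name to (x, y) image coordinates. The function name, joint names, and coordinate values are hypothetical and do not come from the article, which may use a different error measure.

```python
import math


def mean_joint_error(auto_joints, manual_joints):
    """Mean Euclidean distance, in pixels, over joints present in both annotations.

    Both arguments map a joint name to an (x, y) pixel coordinate.
    This is an illustrative metric, not necessarily the one used in the article.
    """
    shared = auto_joints.keys() & manual_joints.keys()
    if not shared:
        raise ValueError("the two annotations have no joints in common")
    total = 0.0
    for name in shared:
        (ax, ay), (mx, my) = auto_joints[name], manual_joints[name]
        total += math.hypot(ax - mx, ay - my)
    return total / len(shared)


# Hypothetical annotations for a single frame (values invented for illustration).
auto = {"head": (100.0, 50.0), "left_hand": (63.0, 124.0)}
manual = {"head": (103.0, 54.0), "left_hand": (60.0, 120.0)}
error = mean_joint_error(auto, manual)
print(f"mean joint error: {error:.1f} px")
```

Averaging only over joints that appear in both annotations avoids penalizing frames where a joint is missing from one source; whether that matches the article's evaluation protocol is an assumption.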



Cited By

  • (2022) Gamesourcing: an unconventional tool to assist the solution of the traveling salesman problem. Natural Computing: an international journal 21:2 (347-357). DOI: 10.1007/s11047-020-09817-z. Online publication date: 1-Jun-2022
  • (2016) Joint Structured Sparsity Regularized Multiview Dimension Reduction for Video-Based Facial Expression Recognition. ACM Transactions on Intelligent Systems and Technology 8:2 (1-21). DOI: 10.1145/2956556. Online publication date: 25-Oct-2016






    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 6, Issue 2
    Special Section on Visual Understanding with RGB-D Sensors
    May 2015
    381 pages
    Editor: Huan Liu
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 31 March 2015
    Accepted: 01 March 2014
    Revised: 01 January 2014
    Received: 01 June 2013
    Published in TIST Volume 6, Issue 2



    Author Tags

    1. Crowdsourcing
    2. Kinect
    3. automatic annotation
    4. data collection
    5. dataset generation
    6. datasets
    7. depth images


    • Research-article
    • Research
    • Refereed

