Benchmark Datasets for Pose Estimation and Tracking

Abstract

This chapter discusses the need for standard datasets in the articulated pose estimation and tracking communities. It describes the datasets that are currently available and the performance of state-of-the-art methods on them. We discuss issues of ground-truth collection and quality, complexity of appearance and poses, evaluation metrics, and partitioning of data. We also discuss the limitations of current datasets and possible directions for developing new datasets for future use.

Notes

  1. Note that in some publications this dataset is also referred to as the “Iterative Image Parsing” (IIP) dataset.

  2. The dataset underwent modification since the publication of [16], see the documentation provided with the dataset for details.

  3. www.flickr.com

  4. In HumanEva synchronization was obtained through off-line optimization, and in HumanEva-II the video frames were synchronized in hardware.

  5. Neither of the datasets in [44] or [48] is currently publicly available.

  6. http://mocap.cs.cmu.edu/

  7. Optical motion capture systems are unable to deal with loose clothing that does not drape tightly over the limbs of the body.

  8. http://www.vicon.com

  9. Note that according to the formulation in Chap. 9, Sect. 9.1.5.

  10. Those observations are abridged from the editorial written by Leonid Sigal and Michael J. Black [46].

References

  1. Agarwal, A., Triggs, B.: 3d human pose from silhouettes by relevance vector regression. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 882–888 (2004)

  2. Agarwal, A., Triggs, B.: Recovering 3d human pose from monocular images. IEEE Trans. Pattern Anal. Mach. Intell. 28, 44–58 (2006)

  3. Andriluka, M., Roth, S., Schiele, B.: Pictorial structures revisited: People detection and articulated pose estimation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2009)

  4. Andriluka, M., Roth, S., Schiele, B.: Monocular 3d pose estimation and tracking by detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)

  5. Belongie, S., Malik, J., Puzicha, J.: Shape context: A new descriptor for shape matching and object recognition. In: Advances in Neural Information Processing Systems (2000)

  6. Bergtholdt, M., Kappes, J.H., Schmidt, S., Schnörr, C.: A study of parts-based object class detection using complete graphs. Int. J. Comput. Vis. 87(1–2), 93–117 (2010)

  7. Bo, L., Sminchisescu, C.: Twin Gaussian processes for structured prediction. Int. J. Comput. Vis. 87(1–2), 28–52 (2010)

  8. Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3d human pose annotations. In: IEEE International Conference on Computer Vision (2009). http://www.eecs.berkeley.edu/~lbourdev/h3d/

  9. Brubaker, M., Fleet, D., Hertzmann, A.: Physics-based person tracking using the anthropomorphic walker. Int. J. Comput. Vis. 87(1–2), 140–155 (2010)

  10. Corazza, S., Mündermann, L., Gambaretto, E., Ferrigno, G., Andriacchi, T.: Markerless motion capture through visual hull, articulated ICP and subject specific model generation. Int. J. Comput. Vis. 87(1–2), 156–169 (2010)

  11. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2005)

  12. Eichner, M., Ferrari, V.: Better appearance models for pictorial structures. In: British Machine Vision Conference (2009). http://www.vision.ee.ethz.ch/~calvin/ethz_pascal_stickmen/index.html

  13. Eichner, M., Ferrari, V.: We are family: Joint pose estimation of multiple persons. In: European Conference on Computer Vision (2010)

  14. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010). http://pascallin.ecs.soton.ac.uk/challenges/VOC/

  15. Felzenszwalb, P.F., Huttenlocher, D.P.: Pictorial structures for object recognition. Int. J. Comput. Vis. 61(1), 55–79 (2005)

  16. Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2008). http://www.robots.ox.ac.uk/~vgg/data/stickmen/index.html

  17. Fischler, M.A., Elschlager, R.A.: The representation and matching of pictorial structures. IEEE Trans. Comput. C-22(1), 67–92 (1973)

  18. Fossati, A., Dimitrijevic, M., Lepetit, V., Fua, P.: Bridging the gap between detection and tracking for 3D monocular video-based motion capture. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2007)

  19. Freifeld, O., Weiss, A., Zuffi, S., Black, M.J.: Contour people: A parameterized model of 2D articulated human shape. In: Computer Vision and Pattern Recognition (2010)

  20. Gall, J., Rosenhahn, B., Brox, T., Seidel, H.-P.: Optimization and filtering for human motion capture. Int. J. Comput. Vis. 87(1–2), 75–92 (2010)

  21. Gammeter, S., Ess, A., Jaeggli, T., Schindler, K., Leibe, B., Van Gool, L.: Articulated multi-body tracking under egomotion. In: European Conference on Computer Vision (2008)

  22. Ganapathi, V., Plagemann, C., Koller, D., Thrun, S.: Real time motion capture using a single time-of-flight camera. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010). http://ai.stanford.edu/~varung/cvpr10/

  23. Gupta, A., Kembhavi, A., Davis, L.S.: Observing human–object interactions: Using spatial and functional compatibility for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1775–1789 (2009)

  24. Hasler, N., Rosenhahn, B., Thormahlen, T., Wand, M., Gall, J., Seidel, H.-P.: Markerless motion capture with unsynchronized moving cameras. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2009)

  25. Hogg, D.: Model-based vision: a program to see a walking person. Image Vis. Comput. 1(1), 5–20 (1983)

  26. Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2006)

  27. Ionescu, C., Bo, L., Sminchisescu, C.: Structural SVM for visual localization and continuous state estimation. In: IEEE International Conference on Computer Vision (2009)

  28. Jiang, H.: Human pose estimation using consistent max-covering. In: IEEE International Conference on Computer Vision (2009)

  29. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: British Machine Vision Conference (2010)

  30. Johnson, S., Everingham, M.: Learning effective human pose estimation from inaccurate annotation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2011)

  31. Kjellström, H., Kragić, D., Black, M.J.: Tracking people interacting with objects. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)

  32. Kumar, M.P., Zisserman, A., Torr, P.H.S.: Efficient discriminative learning of parts-based models. In: IEEE International Conference on Computer Vision (2009)

  33. Lan, X., Huttenlocher, D.P.: Beyond trees: Common-factor models for 2d human pose recovery. In: IEEE International Conference on Computer Vision (2005)

  34. Lee, C.-S., Elgammal, A.: Coupled visual and kinematic manifold models for tracking. Int. J. Comput. Vis. 87(1–2), 118–139 (2010)

  35. Lee, M.W., Cohen, I.: Proposal maps driven MCMC for estimating human body pose in static images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2004)

  36. Li, R., Tian, T.-P., Sclaroff, S., Yang, M.-H.: 3d human motion tracking with a coordinated mixture of factor analyzers. Int. J. Comput. Vis. 87(1–2), 170–190 (2010)

  37. Liu, Y., Stoll, C., Gall, J., Seidel, H.P., Theobalt, C.: Markerless motion capture of interacting characters using multi-view image segmentation. In: Computer Vision and Pattern Recognition (2011)

  38. Ning, H., Xu, W., Gong, Y., Huang, T.: Latent pose estimator for continuous action recognition. In: European Conference on Computer Vision, pp. 419–433 (2008)

  39. Peursum, P., Venkatesh, S., West, G.: A study on smoothing for particle filtered 3d human body tracking. Int. J. Comput. Vis. 87(1–2), 53–74 (2010)

  40. Pons-Moll, G., Baak, A., Helten, T., Müller, M., Seidel, H.-P., Rosenhahn, B.: Multisensor-fusion for 3d full-body human motion capture. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010). http://www.tnt.uni-hannover.de/project/MPI08_Database/

  41. Ramanan, D.: Learning to parse images of articulated bodies. In: Advances in Neural Information Processing Systems (2006). http://www.ics.uci.edu/~dramanan/papers/parse/people.zip

  42. Ren, X., Berg, A.C., Malik, J.: Recovering human body configurations using pairwise constraints between parts. In: IEEE International Conference on Computer Vision (2005)

  43. Sapp, B., Jordan, C., Taskar, B.: Adaptive pose priors for pictorial structures. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)

  44. Shakhnarovich, G., Viola, P., Darrell, T.: Fast pose estimation with parameter-sensitive hashing. In: IEEE International Conference on Computer Vision, vol. 2, pp. 750–759 (2003)

  45. Sigal, L., Balan, A.O., Black, M.J.: Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comput. Vis. 87(1–2), 4–27 (2010). http://vision.cs.brown.edu/humaneva/index.html

  46. Sigal, L., Black, M.J.: Guest editorial: State of the art in image- and video-based human pose and motion estimation. Int. J. Comput. Vis. 87(1–2), 1–3 (2010)

  47. Singh, V., Nevatia, R., Huang, C.: Efficient inference with multiple heterogeneous part detectors for human pose estimation. In: European Conference on Computer Vision (2010)

  48. Sminchisescu, C., Kanaujia, A., Li, Z., Metaxas, D.: Conditional visual tracking in kernel space. In: Advances in Neural Information Processing Systems (2005)

  49. Sminchisescu, C., Kanaujia, A., Metaxas, D.: Learning joint top–down and bottom–up processes for 3d visual inference. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2006)

  50. Tian, T.-P., Sclaroff, S.: Fast globally optimal 2d human detection with loopy graph models. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)

  51. Tran, D., Forsyth, D.: Improved human parsing with a full relational model. In: European Conference on Computer Vision (2010)

  52. Urtasun, R., Darrell, T.: Local probabilistic regression for activity-independent human pose inference. In: IEEE International Conference on Computer Vision (2009)

  53. Vlasic, D., Adelsberger, R., Vannucci, G., Barnwell, J., Gross, M., Matusik, W., Popović, J.: Practical motion capture in everyday surroundings. ACM Trans. Graph. 26(3), 35 (2007)

  54. Wang, P., Rehg, J.M.: A modular approach to the analysis and evaluation of particle filters for figure tracking. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 790–797 (2006). http://www.cc.gatech.edu/~pingwang/Project/FigureTracking.html

  55. Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2011)

  56. Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human–object interaction activities. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010). http://ai.stanford.edu/~bangpeng/resource/mutual_context_annotation.rar

  57. Zhang, J., Luo, J., Collins, R., Liu, Y.: Body localization in still images using hierarchical models and hybrid search. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2006)

Author information

Corresponding author

Correspondence to Mykhaylo Andriluka.

Copyright information

© 2011 Springer-Verlag London Limited

About this chapter

Cite this chapter

Andriluka, M., Sigal, L., Black, M.J. (2011). Benchmark Datasets for Pose Estimation and Tracking. In: Moeslund, T., Hilton, A., Krüger, V., Sigal, L. (eds) Visual Analysis of Humans. Springer, London. https://doi.org/10.1007/978-0-85729-997-0_13

  • DOI: https://doi.org/10.1007/978-0-85729-997-0_13

  • Publisher Name: Springer, London

  • Print ISBN: 978-0-85729-996-3

  • Online ISBN: 978-0-85729-997-0

  • eBook Packages: Computer Science, Computer Science (R0)
