Benchmark Datasets for Pose Estimation and Tracking

Andriluka, Mykhaylo; Sigal, Leonid; Black, Michael J.

doi:10.1007/978-0-85729-997-0_13

Abstract

This chapter discusses the need for standard datasets in the articulated pose estimation and tracking communities. It describes the datasets that are currently available and the performance of state-of-the-art methods on them. We discuss issues of ground-truth collection and quality, complexity of appearance and poses, evaluation metrics, and partitioning of data. We also discuss limitations of current datasets and possible directions for developing new datasets.

Notes

1. Note that in some publications this dataset is also referred to as the “Iterative Image Parsing” (IIP) dataset.
2. The dataset has been modified since the publication of [16]; see the documentation provided with the dataset for details.
3. www.flickr.com
4. In HumanEva synchronization was obtained through off-line optimization, and in HumanEva-II the video frames were synchronized in hardware.
5. Neither of the datasets in [44] or [48] is currently publicly available.
6. http://mocap.cs.cmu.edu/
7. Optical motion capture systems are unable to deal with loose clothing that does not drape tightly over the limbs of the body.
8. http://www.vicon.com
9. Note that according to the formulation in Chap. 9, Sect. 9.1.5.
10. These observations are abridged from the editorial written by Leonid Sigal and Michael J. Black [46].

References

1. Agarwal, A., Triggs, B.: 3d human pose from silhouettes by relevance vector regression. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 882–888 (2004)
2. Agarwal, A., Triggs, B.: Recovering 3d human pose from monocular images. IEEE Trans. Pattern Anal. Mach. Intell. 28, 44–58 (2006)
3. Andriluka, M., Roth, S., Schiele, B.: Pictorial structures revisited: People detection and articulated pose estimation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2009)
4. Andriluka, M., Roth, S., Schiele, B.: Monocular 3d pose estimation and tracking by detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)
5. Belongie, S., Malik, J., Puzicha, J.: Shape context: A new descriptor for shape matching and object recognition. In: Advances in Neural Information Processing Systems (2000)
6. Bergtholdt, M., Kappes, J.H., Schmidt, S., Schnörr, C.: A study of parts-based object class detection using complete graphs. Int. J. Comput. Vis. 87(1–2), 93–117 (2010)
7. Bo, L., Sminchisescu, C.: Twin Gaussian processes for structured prediction. Int. J. Comput. Vis. 87(1–2), 28–52 (2010)
8. Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3d human pose annotations. In: IEEE International Conference on Computer Vision (2009). http://www.eecs.berkeley.edu/~lbourdev/h3d/
9. Brubaker, M., Fleet, D., Hertzmann, A.: Physics-based person tracking using the anthropomorphic walker. Int. J. Comput. Vis. 87(1–2), 140–155 (2010)
10. Corazza, S., Mündermann, L., Gambaretto, E., Ferrigno, G., Andriacchi, T.: Markerless motion capture through visual hull, articulated ICP and subject specific model generation. Int. J. Comput. Vis. 87(1–2), 156–169 (2010)
11. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2005)
12. Eichner, M., Ferrari, V.: Better appearance models for pictorial structures. In: British Machine Vision Conference (2009). http://www.vision.ee.ethz.ch/~calvin/ethz_pascal_stickmen/index.html
13. Eichner, M., Ferrari, V.: We are family: Joint pose estimation of multiple persons. In: European Conference on Computer Vision (2010)
14. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010). http://pascallin.ecs.soton.ac.uk/challenges/VOC/
15. Felzenszwalb, P.F., Huttenlocher, D.P.: Pictorial structures for object recognition. Int. J. Comput. Vis. 61(1), 55–79 (2005)
16. Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2008). http://www.robots.ox.ac.uk/~vgg/data/stickmen/index.html
17. Fischler, M.A., Elschlager, R.A.: The representation and matching of pictorial structures. IEEE Trans. Comput. C-22(1), 67–92 (1973)
18. Fossati, A., Dimitrijevic, M., Lepetit, V., Fua, P.: Bridging the gap between detection and tracking for 3D monocular video-based motion capture. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2007)
19. Freifeld, O., Weiss, A., Zuffi, S., Black, M.J.: Contour people: A parameterized model of 2D articulated human shape. In: Computer Vision and Pattern Recognition (2010)
20. Gall, J., Rosenhahn, B., Brox, T., Seidel, H.-P.: Optimization and filtering for human motion capture. Int. J. Comput. Vis. 87(1–2), 75–92 (2010)
21. Gammeter, S., Ess, A., Jaeggli, T., Schindler, K., Leibe, B., Van Gool, L.: Articulated multi-body tracking under egomotion. In: European Conference on Computer Vision (2008)
22. Ganapathi, V., Plagemann, C., Koller, D., Thrun, S.: Real time motion capture using a single time-of-flight camera. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010). http://ai.stanford.edu/~varung/cvpr10/
23. Gupta, A., Kembhavi, A., Davis, L.S.: Observing human–object interactions: Using spatial and functional compatibility for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1775–1789 (2009)
24. Hasler, N., Rosenhahn, B., Thormahlen, T., Wand, M., Gall, J., Seidel, H.-P.: Markerless motion capture with unsynchronized moving cameras. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2009)
25. Hogg, D.: Model-based vision: a program to see a walking person. Image Vis. Comput. 1(1), 5–20 (1983)
26. Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2006)
27. Ionescu, C., Bo, L., Sminchisescu, C.: Structural SVM for visual localization and continuous state estimation. In: IEEE International Conference on Computer Vision (2009)
28. Jiang, H.: Human pose estimation using consistent max-covering. In: IEEE International Conference on Computer Vision (2009)
29. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: British Machine Vision Conference (2010)
30. Johnson, S., Everingham, M.: Learning effective human pose estimation from inaccurate annotation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2011)
31. Kjellström, H., Kragić, D., Black, M.J.: Tracking people interacting with objects. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)
32. Kumar, M.P., Zisserman, A., Torr, P.H.S.: Efficient discriminative learning of parts-based models. In: IEEE International Conference on Computer Vision (2009)
33. Lan, X., Huttenlocher, D.P.: Beyond trees: Common-factor models for 2d human pose recovery. In: IEEE International Conference on Computer Vision (2005)
34. Lee, C.-S., Elgammal, A.: Coupled visual and kinematic manifold models for tracking. Int. J. Comput. Vis. 87(1–2), 118–139 (2010)
35. Lee, M.W., Cohen, I.: Proposal maps driven MCMC for estimating human body pose in static images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2004)
36. Li, R., Tian, T.-P., Sclaroff, S., Yang, M.-H.: 3d human motion tracking with a coordinated mixture of factor analyzers. Int. J. Comput. Vis. 87(1–2), 170–190 (2010)
37. Liu, Y., Stoll, C., Gall, J., Seidel, H.P., Theobalt, C.: Markerless motion capture of interacting characters using multi-view image segmentation. In: Computer Vision and Pattern Recognition (2011)
38. Ning, H., Xu, W., Gong, Y., Huang, T.: Latent pose estimator for continuous action recognition. In: European Conference on Computer Vision, pp. 419–433 (2008)
39. Peursum, P., Venkatesh, S., West, G.: A study on smoothing for particle filtered 3d human body tracking. Int. J. Comput. Vis. 87(1–2), 53–74 (2010)
40. Pons-Moll, G., Baak, A., Helten, T., Müller, M., Seidel, H.-P., Rosenhahn, B.: Multisensor-fusion for 3d full-body human motion capture. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010). http://www.tnt.uni-hannover.de/project/MPI08_Database/
41. Ramanan, D.: Learning to parse images of articulated bodies. In: Advances in Neural Information Processing Systems (2006). http://www.ics.uci.edu/~dramanan/papers/parse/people.zip
42. Ren, X., Berg, A.C., Malik, J.: Recovering human body configurations using pairwise constraints between parts. In: IEEE International Conference on Computer Vision (2005)
43. Sapp, B., Jordan, C., Taskar, B.: Adaptive pose priors for pictorial structures. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)
44. Shakhnarovich, G., Viola, P., Darrell, T.: Fast pose estimation with parameter-sensitive hashing. In: IEEE International Conference on Computer Vision, vol. 2, pp. 750–759 (2003)
45. Sigal, L., Balan, A.O., Black, M.J.: Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comput. Vis. 87(1–2), 4–27 (2010). http://vision.cs.brown.edu/humaneva/index.html
46. Sigal, L., Black, M.J.: Guest editorial: State of the art in image- and video-based human pose and motion estimation. Int. J. Comput. Vis. 87(1–2), 1–3 (2010)
47. Singh, V., Nevatia, R., Huang, C.: Efficient inference with multiple heterogeneous part detectors for human pose estimation. In: European Conference on Computer Vision (2010)
48. Sminchisescu, C., Kanaujia, A., Li, Z., Metaxas, D.: Conditional visual tracking in kernel space. In: Advances in Neural Information Processing Systems (2005)
49. Sminchisescu, C., Kanaujia, A., Metaxas, D.: Learning joint top–down and bottom–up processes for 3d visual inference. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2006)
50. Tian, T.-P., Sclaroff, S.: Fast globally optimal 2d human detection with loopy graph models. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)
51. Tran, D., Forsyth, D.: Improved human parsing with a full relational model. In: European Conference on Computer Vision (2010)
52. Urtasun, R., Darrell, T.: Local probabilistic regression for activity-independent human pose inference. In: IEEE International Conference on Computer Vision (2009)
53. Vlasic, D., Adelsberger, R., Vannucci, G., Barnwell, J., Gross, M., Matusik, W., Popović, J.: Practical motion capture in everyday surroundings. ACM Trans. Graph. 26(3), 35 (2007)
54. Wang, P., Rehg, J.M.: A modular approach to the analysis and evaluation of particle filters for figure tracking. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 790–797 (2006). http://www.cc.gatech.edu/~pingwang/Project/FigureTracking.html
55. Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2011)
56. Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human–object interaction activities. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010). http://ai.stanford.edu/~bangpeng/resource/mutual_context_annotation.rar
57. Zhang, J., Luo, J., Collins, R., Liu, Y.: Body localization in still images using hierarchical models and hybrid search. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2006)

Author information

Authors and Affiliations

Max Planck Institute for Computer Science, Saarbrücken, Germany
Mykhaylo Andriluka
Disney Research, Pittsburgh, USA
Leonid Sigal
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Michael J. Black
Department of Computer Science, Brown University, Providence, USA
Michael J. Black

Corresponding author

Correspondence to Mykhaylo Andriluka.

Editor information

Editors and Affiliations

Department of Media Technology, Aalborg University, Niels Jernes Vej 14, Aalborg, 9220, Denmark
Thomas B. Moeslund
Centre for Vision, Speech & Signal Proc., University of Surrey, Guildford, GU2 7XH, Surrey, United Kingdom
Adrian Hilton
Copenhagen Institute of Technology, Aalborg University, Lautrupvang 2B, Ballerup, 2750, Denmark
Volker Krüger
Disney Research, Forbes Avenue 615, Pittsburgh, 15213, Pennsylvania, USA
Leonid Sigal

Cite this chapter

Andriluka, M., Sigal, L., Black, M.J. (2011). Benchmark Datasets for Pose Estimation and Tracking. In: Moeslund, T., Hilton, A., Krüger, V., Sigal, L. (eds) Visual Analysis of Humans. Springer, London. https://doi.org/10.1007/978-0-85729-997-0_13

DOI: https://doi.org/10.1007/978-0-85729-997-0_13
Publisher Name: Springer, London
Print ISBN: 978-0-85729-996-3
Online ISBN: 978-0-85729-997-0
eBook Packages: Computer Science (R0)
