Real-Time Human Pose Recognition in Parts from Single Depth Images

Shotton, Jamie; Fitzgibbon, Andrew; Cook, Mat; Sharp, Toby; Finocchio, Mark; Moore, Richard; Kipman, Alex; Blake, Andrew

doi:10.1007/978-3-642-28661-2_5

Part of the book series: Studies in Computational Intelligence (SCI, volume 411)

Abstract

This chapter describes a method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally, we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result into world space and finding local modes of a 3D non-parametric density. The system runs at around 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
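
The final step of the abstract (reprojecting per-pixel classification results into world space and finding local modes of a 3D density) can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes a pinhole camera model with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), a Gaussian kernel, and mean shift as the mode finder; the abstract names none of these. The function names, the bandwidth, and the sample pixels are invented for illustration.

```python
import math

def reproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at `depth` metres to camera-space XYZ.

    Assumes a pinhole camera; the intrinsics are hypothetical values.
    """
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def mean_shift_mode(points, weights, start, bandwidth, iters=50, tol=1e-6):
    """Climb from `start` towards a local mode of a weighted Gaussian
    kernel density over `points` (assumed mode finder: mean shift).

    Returns the final position and the kernel-weighted sum at the last
    evaluated position, used here as a rough confidence score.
    """
    x = tuple(start)
    score = 0.0
    for _ in range(iters):
        num = [0.0, 0.0, 0.0]
        den = 0.0
        for p, w in zip(points, weights):
            d2 = sum((a - b) ** 2 for a, b in zip(p, x))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            den += k
            for i in range(3):
                num[i] += k * p[i]
        if den == 0.0:
            break  # no point has any influence at this position
        score = den
        new_x = tuple(n / den for n in num)
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x, score

# Invented sample: (u, v, depth in metres, probability of one body part).
pixels = [
    (320, 240, 2.00, 0.90),
    (322, 241, 2.00, 0.80),
    (318, 239, 2.02, 0.85),
    (100, 50, 3.00, 0.10),  # distant low-probability outlier
]
points = [reproject(u, v, d, 525.0, 525.0, 319.5, 239.5) for u, v, d, _ in pixels]
weights = [prob for _, _, _, prob in pixels]
mode, score = mean_shift_mode(points, weights, points[0], bandwidth=0.06)
print(mode, score)
```

In this sketch the per-pixel class probability acts as the kernel weight, so confident pixels pull the proposal harder, and the distant outlier has a negligible effect at the chosen bandwidth.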

Author information

Authors and Affiliations

Microsoft Research Cambridge and Xbox Incubation, Cambridge, UK
Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman & Andrew Blake

Editor information

Editors and Affiliations

Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, United Kingdom
Roberto Cipolla
Dipartimento di Matematica e Informatica, Università di Catania, Viale Andrea Doria 6, Catania, 95125, Catania, Italy
Sebastiano Battiato
Dipartimento di Matematica e Informatica, Università di Catania, Viale A. Doria 6, Catania, 95125, Italy
Giovanni Maria Farinella

About this chapter

Cite this chapter

Shotton, J. et al. (2013). Real-Time Human Pose Recognition in Parts from Single Depth Images. In: Cipolla, R., Battiato, S., Farinella, G. (eds) Machine Learning for Computer Vision. Studies in Computational Intelligence, vol 411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28661-2_5

DOI: https://doi.org/10.1007/978-3-642-28661-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28660-5
Online ISBN: 978-3-642-28661-2
eBook Packages: Engineering, Engineering (R0)
