Abstract
3D human pose estimation is a challenging but important research topic with abundant applications. In discriminative human pose estimation, the main goal is to learn a nonlinear mapping from image descriptors to 3D human pose configurations, which is difficult because of the high dimensionality of the human pose space and the multimodality of the distribution. To address these problems, we propose a novel motionlet LLC (locality-constrained linear coding) approach within a discriminative framework. A motionlet consists of training examples that cover a local area in image space, pose space, and the time stream. We first group the most informative training examples into motionlets, then perform LLC coding to learn the nonlinear mapping and obtain candidate poses, and finally choose the most appropriate candidate as the estimate. To further reduce ambiguities and improve robustness, we extend the framework to incorporate multiple views. Qualitative evaluation on our Taichi data set and quantitative evaluation on the HumanEva data set show that our approach achieves state-of-the-art performance and a significant improvement over previous approaches.
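To make the coding step more concrete, the following is a minimal sketch of the widely used approximated form of locality-constrained linear coding, applied to pose regression. It is not the authors' implementation: the function names `llc_codes` and `estimate_pose`, the parameters `k` and `beta`, and the idea of applying the weights directly to the neighbours' poses are assumptions made for illustration. Motionlet construction and the final choice among candidate poses are not shown, because the abstract does not specify them.

```python
import numpy as np

def llc_codes(x, bases, k=5, beta=1e-4):
    """Approximated LLC: reconstruct x from its k nearest bases.

    Returns the indices of the selected bases and weights that sum to 1.
    `k` and `beta` are illustrative defaults, not values from the paper.
    """
    k = min(k, len(bases))
    # Pick the k bases closest to the query descriptor.
    dists = np.linalg.norm(bases - x, axis=1)
    idx = np.argsort(dists)[:k]
    # Shift the neighbours so that the query is the origin.
    z = bases[idx] - x
    # Local Gram matrix, regularised for numerical stability.
    C = z @ z.T
    C += np.eye(k) * beta * max(np.trace(C), 1e-12)
    # Solve the sum-to-one constrained least-squares problem.
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()
    return idx, w

def estimate_pose(x, descriptors, poses, k=5, beta=1e-4):
    """Hypothetical helper: transfer the LLC weights to the neighbours' poses."""
    idx, w = llc_codes(x, descriptors, k=k, beta=beta)
    return w @ poses[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy training set: 200 descriptors (10-D) with 6-D poses.
    descriptors = rng.normal(size=(200, 10))
    poses = rng.normal(size=(200, 6))
    query = rng.normal(size=10)
    print(estimate_pose(query, descriptors, poses, k=5))
```

In this sketch the training descriptors themselves serve as the codebook, so the mapping is local and piecewise linear: each query is explained only by nearby examples, which is one way a single global regressor's difficulty with multimodal pose distributions can be sidestepped.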






Notes
Most of the ground-truth poses of the Subject1 Throw/Catch sequence are invalid, and those of Subject3 are unavailable.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (61170142), the National Key Technology R&D Program (2011BAG05B04), the International Science & Technology Cooperation Program of China (2013DFG12840), and the Fundamental Research Funds for the Central Universities.
Cite this article
Sun, L., Song, M., Tao, D. et al. Motionlet LLC coding for discriminative human pose estimation. Multimed Tools Appl 73, 327–344 (2014). https://doi.org/10.1007/s11042-013-1617-3
DOI: https://doi.org/10.1007/s11042-013-1617-3