Abstract
Recovering the pose of a person from a single image is a challenging problem. This paper discusses a bottom-up approach that uses local image features to estimate human upper body pose from single images with cluttered backgrounds. The method represents the image window as a dense grid of local gradient orientation histograms, then applies non-negative matrix factorization to learn a set of bases that correspond to local features on the human body, enabling selective encoding of human-like features in the presence of background clutter. Pose is then recovered by direct regression. This approach allows us to key on gradient patterns, such as shoulder contours and bent elbows, that are characteristic of humans and carry important pose information, unlike current regression-based methods that either use weak limb detectors or require prior segmentation to work. The system is trained on a database of images with labelled poses. We show that it estimates pose with similar performance levels to current example-based methods, but unlike them it works in the presence of natural backgrounds, without any prior segmentation.
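The three stages named in the abstract (gradient orientation histograms on a dense grid, non-negative matrix factorization, direct regression) can be illustrated with a toy NumPy sketch. This is not the authors' implementation. The function names, the 8-bin unsigned histograms over 8x8 cells, the number of bases, the Lee and Seung multiplicative updates, and the use of ridge regression as the regressor are all assumptions made for illustration, and the data is synthetic.

```python
import numpy as np

def orientation_histograms(image, cell=8, bins=8):
    """Dense grid of magnitude-weighted gradient orientation histograms."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), quantised into `bins` bins.
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    bin_index = np.minimum((angle / np.pi * bins).astype(int), bins - 1)
    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    hists = np.zeros((rows * cols, bins))
    for r in range(rows):
        for c in range(cols):
            block = (slice(r * cell, (r + 1) * cell),
                     slice(c * cell, (c + 1) * cell))
            hists[r * cols + c] = np.bincount(
                bin_index[block].ravel(),
                weights=magnitude[block].ravel(),
                minlength=bins)
    return hists  # non-negative, shape (rows * cols, bins)

def nmf(V, k, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates for V ~ W @ H with W, H >= 0 (rows of H are bases)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def encode(V, H, iters=200, eps=1e-9):
    """Non-negative coefficients of each row of V on the fixed bases H."""
    W = np.ones((V.shape[0], H.shape[0]))
    for _ in range(iters):
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W

def image_features(image, H, cell=8):
    """Concatenate the per-cell basis coefficients into one feature vector."""
    return encode(orientation_histograms(image, cell, H.shape[1]), H).ravel()

def fit_ridge(X, Y, lam=1e-2):
    """Ridge regression weights A minimising ||X A - Y||^2 + lam ||A||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

# Toy usage on synthetic data (random images and poses, for shape-checking only).
rng = np.random.default_rng(0)
train_images = rng.random((20, 32, 32))
train_poses = rng.random((20, 3))  # hypothetical 3-D pose vectors
V = np.vstack([orientation_histograms(im) for im in train_images])
_, bases = nmf(V, k=4)
X = np.stack([image_features(im, bases) for im in train_images])
A = fit_ridge(X, train_poses)
predicted_pose = image_features(train_images[0], bases) @ A
```

In this sketch a single set of bases is shared by all grid cells and the regressor is linear. Both choices are simplifications that keep the example short and are not claims about the paper's design.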
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Agarwal, A., Triggs, B. (2006). A Local Basis Representation for Estimating Human Pose from Cluttered Images. In: Narayanan, P.J., Nayar, S.K., Shum, HY. (eds) Computer Vision – ACCV 2006. ACCV 2006. Lecture Notes in Computer Science, vol 3851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11612032_6
DOI: https://doi.org/10.1007/11612032_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31219-2
Online ISBN: 978-3-540-32433-1
eBook Packages: Computer Science, Computer Science (R0)