Discriminative Appearance Models for Pictorial Structures

Andriluka, Mykhaylo; Roth, Stefan; Schiele, Bernt

doi:10.1007/s11263-011-0498-z

Published: 28 October 2011

Volume 99, pages 259–280 (2012)

International Journal of Computer Vision

Mykhaylo Andriluka¹,
Stefan Roth² &
Bernt Schiele¹

Abstract

In this paper we consider people detection and articulated pose estimation, two closely related and challenging problems in computer vision. Conceptually, both of these problems can be addressed within the pictorial structures framework (Felzenszwalb and Huttenlocher in Int. J. Comput. Vis. 61(1):55–79, 2005; Fischler and Elschlager in IEEE Trans. Comput. C-22(1):67–92, 1973), even though previous approaches have not shown such generality. A principal difficulty for such a general approach is modeling the appearance of body parts: the model has to be discriminative enough to enable reliable detection in cluttered scenes and general enough to capture highly variable appearance. Therefore, as the first important component of our approach, we propose a discriminative appearance model based on densely sampled local descriptors and AdaBoost classifiers. Secondly, we interpret the normalized margin of each classifier as a likelihood in a generative model and compute marginal posteriors for each part using belief propagation. Thirdly, non-Gaussian relationships between parts are represented as Gaussians in the coordinate system of the joint between the parts. Additionally, to cope with shortcomings of tree-based pictorial structures models, we augment our model with repulsive factors that discourage overcounting of image evidence. We demonstrate that the combination of these components within the pictorial structures framework results in a generic model that yields state-of-the-art performance on several datasets for a variety of tasks: people detection, upper body pose estimation, and full body pose estimation.
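
The first two components of the abstract can be illustrated together. An AdaBoost classifier's weighted vote, divided by the sum of its weights, is a normalized margin in [-1, 1]; that margin can then serve as a part likelihood. The minimal Python sketch below assumes decision-stump weak learners over a generic descriptor vector and clips non-positive margins to a small floor `eps0` so the likelihood stays strictly positive. The names `Stump`, `normalized_margin`, `part_likelihood` and the value of `eps0` are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stump:
    """Hypothetical decision-stump weak learner on one descriptor dimension."""
    feature: int      # index into the descriptor vector
    threshold: float  # split point
    polarity: int     # +1 or -1: which side of the threshold votes "part present"
    alpha: float      # AdaBoost weight (non-negative)

    def predict(self, x: Sequence[float]) -> int:
        # Weak hypothesis h_t(x) in {-1, +1}
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def normalized_margin(stumps: Sequence[Stump], x: Sequence[float]) -> float:
    """Weighted vote divided by the total weight, so the result lies in [-1, 1]."""
    total = sum(s.alpha for s in stumps)
    if total <= 0:
        raise ValueError("AdaBoost weights must sum to a positive value")
    return sum(s.alpha * s.predict(x) for s in stumps) / total


def part_likelihood(stumps: Sequence[Stump], x: Sequence[float], eps0: float = 1e-4) -> float:
    """Use the normalized margin as an unnormalized likelihood.

    Margins below eps0 (an assumed floor) are clipped so that the value stays
    positive and its logarithm stays finite.
    """
    return max(normalized_margin(stumps, x), eps0)


if __name__ == "__main__":
    stumps = [Stump(0, 0.5, +1, 0.5), Stump(1, 0.2, +1, 0.25), Stump(2, 0.9, -1, 0.25)]
    print(part_likelihood(stumps, [0.8, 0.4, 0.95]))  # 0.5
    print(part_likelihood(stumps, [0.1, 0.1, 0.1]))   # 0.0001 (margin -0.5 is clipped)
```

In a dense setting, `part_likelihood` would be evaluated at every discretized location, orientation and scale of a part, with the descriptor extracted at that location.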
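
On a tree-structured model, the sum-product form of belief propagation (Pearl 1988; Kschischang et al. 2001) computes the marginal posterior of every part exactly with one upward and one downward pass. The pure-Python sketch below assumes small discrete state spaces and precomputed tables: `unary[part][s]` for the appearance likelihood of each part state (for instance, a clipped classifier margin) and `pairwise[(parent, child)][sp][sc]` for the kinematic prior. The function name and data layout are hypothetical; an implementation over dense location grids would typically exploit the structure of the pairwise term for efficiency.

```python
from typing import Dict, List, Tuple

Table = List[float]


def _normalize(v: Table) -> Table:
    """Rescale to sum to one (assumes a positive sum)."""
    z = sum(v)
    return [x / z for x in v]


def _mul(a: Table, b: Table) -> Table:
    return [x * y for x, y in zip(a, b)]


def tree_marginals(unary: Dict[str, Table],
                   edges: List[Tuple[str, str]],
                   pairwise: Dict[Tuple[str, str], List[Table]]) -> Dict[str, Table]:
    """Exact sum-product belief propagation on a tree of discrete part states.

    unary[p][s]              likelihood of part p being in state s
    pairwise[(p, c)][sp][sc] compatibility of parent state sp and child state sc
    edges                    (parent, child) pairs; must form a single tree
    """
    children: Dict[str, List[str]] = {n: [] for n in unary}
    parent: Dict[str, str] = {}
    for p, c in edges:
        children[p].append(c)
        parent[c] = p
    root = next(n for n in unary if n not in parent)

    # Breadth-first order: every parent appears before its children.
    order = [root]
    i = 0
    while i < len(order):
        order.extend(children[order[i]])
        i += 1

    # Upward pass (leaves -> root): up[c][sp] is the message from c to its parent.
    up: Dict[str, Table] = {}
    for c in reversed(order[1:]):
        local = list(unary[c])
        for g in children[c]:
            local = _mul(local, up[g])
        psi = pairwise[(parent[c], c)]
        up[c] = _normalize([sum(row[sc] * local[sc] for sc in range(len(local)))
                            for row in psi])

    # Downward pass (root -> leaves): down[c][sc] is the message from c's parent.
    down: Dict[str, Table] = {root: [1.0] * len(unary[root])}
    for p in order:
        for c in children[p]:
            # Parent belief excluding the message that came from c itself.
            pre = _mul(unary[p], down[p])
            for g in children[p]:
                if g != c:
                    pre = _mul(pre, up[g])
            psi = pairwise[(p, c)]
            down[c] = _normalize([sum(pre[sp] * psi[sp][sc] for sp in range(len(pre)))
                                  for sc in range(len(unary[c]))])

    # Marginal posterior = likelihood times all incoming messages, normalized.
    marginals: Dict[str, Table] = {}
    for n in order:
        b = _mul(unary[n], down[n])
        for g in children[n]:
            b = _mul(b, up[g])
        marginals[n] = _normalize(b)
    return marginals
```

Normalizing each message changes only its scale, not the resulting marginals; it simply keeps the numbers in a safe floating-point range when many parts are chained.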

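
The third component models the relationship between two connected parts as a Gaussian in the coordinate frame of the joint that links them. The sketch below is a simplified illustration, not the paper's exact parameterization: it assumes a part state of (x, y, θ) without scale, a known offset from each part's centre to the shared joint expressed in that part's own frame, and independent Gaussians on (a) the mismatch between the two predicted joint positions, measured along and across the parent's axis, and (b) the wrapped relative orientation. All names and parameters (`JointModel`, `joint_position`, `log_pairwise`, the sigma and `mu_theta` fields) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Pose = Tuple[float, float, float]  # (x, y, theta): part centre and orientation in radians


@dataclass(frozen=True)
class JointModel:
    """Hypothetical parameters of one joint between a parent and a child part."""
    parent_offset: Tuple[float, float]  # joint location in the parent's frame
    child_offset: Tuple[float, float]   # joint location in the child's frame
    sigma_u: float      # spread of the joint mismatch along the parent's x-axis
    sigma_v: float      # spread of the joint mismatch along the parent's y-axis
    mu_theta: float     # preferred relative orientation (child minus parent)
    sigma_theta: float  # spread of the relative orientation


def joint_position(part: Pose, offset: Tuple[float, float]) -> Tuple[float, float]:
    """Image position of a joint, given a part pose and the joint's offset
    from the part centre expressed in the part's own (rotated) frame."""
    x, y, theta = part
    dx, dy = offset
    c, s = math.cos(theta), math.sin(theta)
    return (x + c * dx - s * dy, y + s * dx + c * dy)


def wrap_angle(a: float) -> float:
    """Map an angle to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def log_pairwise(parent: Pose, child: Pose, jm: JointModel) -> float:
    """Unnormalized log-density of a Gaussian defined in the joint's frame."""
    px, py = joint_position(parent, jm.parent_offset)
    cx, cy = joint_position(child, jm.child_offset)
    # Express the joint mismatch in the parent's frame (rotate by -theta_parent).
    c, s = math.cos(parent[2]), math.sin(parent[2])
    ex, ey = cx - px, cy - py
    u = c * ex + s * ey
    v = -s * ex + c * ey
    d_theta = wrap_angle(child[2] - parent[2] - jm.mu_theta)
    return -0.5 * ((u / jm.sigma_u) ** 2 + (v / jm.sigma_v) ** 2
                   + (d_theta / jm.sigma_theta) ** 2)
```

Because the predicted joint positions depend on each part's orientation through `sin` and `cos`, this prior is not Gaussian in the parts' image coordinates, even though it is Gaussian in the joint-aligned residual. Rotating or translating the whole configuration leaves the value unchanged.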
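
In a tree-structured model, parts that are not neighbours in the tree are scored independently given the rest of the chain, so, for example, both lower legs can settle on the same strongly responding image region. The abstract's remedy is a repulsive factor between such parts. The exact form of that factor is not given here; the sketch below assumes one simple choice, an exponential penalty on the intersection-over-union of the two parts' axis-aligned bounding boxes, with a hypothetical strength `lam`.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def repulsive_factor(a: Box, b: Box, lam: float = 3.0) -> float:
    """Factor in (0, 1]: equal to 1 when the parts do not overlap and decaying
    as they claim more of the same image region. Form and lam are assumptions."""
    return math.exp(-lam * iou(a, b))
```

Since such a factor connects two parts that are not neighbours in the kinematic tree, adding it closes a loop in the graph. Exact tree inference then no longer applies, and an approximate method such as loopy belief propagation is needed; libDAI (Mooij 2009), listed in the references, provides implementations of such methods.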

References

Andriluka, M., Roth, S., & Schiele, B. (2008). People-tracking-by-detection and people-detection-by-tracking. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: people detection and articulated pose estimation. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Andriluka, M., Roth, S., & Schiele, B. (2010). Monocular 3D pose estimation and tracking by detection. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Belongie, S., Malik, J., & Puzicha, J. (2001). Shape context: a new descriptor for shape matching and object recognition. In Adv. in neur. inf. proc. sys. (NIPS).
Bergtholdt, M., Kappes, J., Schmidt, S., & Schnörr, C. (2009). A study of parts-based object class detection using complete graphs. International Journal of Computer Vision, 87(1–2), 93–117.
Bourdev, L., & Malik, J. (2009). Poselets: body part detectors trained using 3D human pose annotations. In IEEE int. conf. on comp. vis. (ICCV).
Buehler, P., Everingham, M., Huttenlocher, D. P., & Zisserman, A. (2008). Long term arm and hand tracking for continuous sign language TV broadcasts. In Brit. mach. vis. conf. (BMVC).
Crandall, D., Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Spatial priors for part-based recognition using statistical models. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Eichner, M., & Ferrari, V. (2009). Better appearance models for pictorial structures. In Brit. mach. vis. conf. (BMVC).
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2008). The PASCAL visual object classes challenge 2008 (VOC2008) results. http://www.pascal-network.org/challenges/VOC/voc2008/workshop/index.html.
Felzenszwalb, P. F., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.
Felzenszwalb, P. F., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Ferrari, V., Marin, M., & Zisserman, A. (2008). Progressive search space reduction for human pose estimation. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Ferrari, V., Marin, M., & Zisserman, A. (2009a). 2D human pose estimation in TV shows. In D. Cremers, B. Rosenhahn, A. L. Yuille, & F. R. Schmidt (Eds.), Lect. notes in comp. sci.: Vol. 5604. Statistical and geometrical approaches to visual motion analysis (pp. 128–147). Berlin: Springer.
Ferrari, V., Marin, M., & Zisserman, A. (2009b). Pose search: Retrieving people using their pose. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Fischler, M. A., & Elschlager, R. A. (1973). The representation and matching of pictorial structures. IEEE Transactions on Computers, C-22(1), 67–92.
Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
Gall, J., & Lempitsky, V. (2009). Class-specific Hough forests for object detection. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Guan, P., Weiss, A., Balan, A., & Black, M. J. (2009). Estimating human shape and pose from a single image. In IEEE int. conf. on comp. vis. (ICCV).
Ionescu, C., Bo, L., & Sminchisescu, C. (2009). Structural SVM for visual localization and continuous state estimation. In IEEE int. conf. on comp. vis. (ICCV).
Jiang, H. (2009). Human pose estimation using consistent max-covering. In IEEE int. conf. on comp. vis. (ICCV).
Jiang, H., & Martin, D. R. (2008). Global pose estimation using non-tree models. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Jie, L., Caputo, B., & Ferrari, V. (2009). Who’s doing what: joint modeling of names and verbs for simultaneous face and pose annotation. In Adv. in neur. inf. proc. sys. (NIPS).
Johnson, S., & Everingham, M. (2009). Combining discriminative appearance and segmentation cues for articulated human pose estimation. In 2nd IEEE international workshop on machine learning for vision-based motion analysis.
Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519.
Kumar, P., Zisserman, A., & Torr, P. H. S. (2009). Efficient discriminative learning of parts-based models. In IEEE int. conf. on comp. vis. (ICCV).
Lan, X., & Huttenlocher, D. P. (2005). Beyond trees: common-factor models for 2D human pose recovery. In IEEE int. conf. on comp. vis. (ICCV).
Lee, H.-J., & Chen, Z. (1985). Determination of 3D human body postures from a single view. Computer Vision, Graphics, and Image Processing, 30, 148–168.
Lee, M. W., & Cohen, I. (2004). Proposal maps driven MCMC for estimating human body pose in static images. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Leibe, B., Seemann, E., & Schiele, B. (2005). Pedestrian detection in crowded scenes. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Mikolajczyk, K., Leibe, B., & Schiele, B. (2006). Multiple object class detection with a generative model. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Mikolajczyk, K., & Schmid, C. (2005). A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), 1615–1630.
Mooij, J. M. (2009). libDAI 0.2.2: a free/open source C++ library for discrete approximate inference. http://www.libdai.org/.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference (2nd ed.). San Francisco: Morgan Kaufmann.
Ramanan, D. (2007). Learning to parse images of articulated objects. In Adv. in neur. inf. proc. sys. (NIPS).
Ramanan, D., & Sminchisescu, C. (2006). Training deformable models for localization. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Ren, X., Berg, A. C., & Malik, J. (2005). Recovering human body configurations using pairwise constraints between parts. In IEEE int. conf. on comp. vis. (ICCV).
Ronfard, R., Schmid, C., & Triggs, B. (2002). Learning to parse pictures of people. In Eur. conf. on comp. vis. (ECCV).
Roth, S., & Black, M. J. (2009). Fields of experts. International Journal of Computer Vision, 82(2), 205–229.
Rother, C., Kolmogorov, V., & Blake, A. (2004). “Grabcut”: interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23, 309–314.
Sapp, B., Jordan, C., & Taskar, B. (2010). Adaptive pose priors for pictorial structures. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Sigal, L., & Black, M. J. (2006a). Measure locally, reason globally: occlusion-sensitive articulated pose estimation. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Sigal, L., & Black, M. J. (2006b). Predicting 3D people from 2D pictures. In AMDO.
Sudderth, E. B., Mandel, M. I., Freeman, W. T., & Willsky, A. S. (2005). Distributed occlusion reasoning for tracking with nonparametric belief propagation. In Adv. in neur. inf. proc. sys. (NIPS).
Taylor, C. J. (2000). Reconstruction of articulated objects from point correspondences in a single uncalibrated image. Computer Vision and Image Understanding, 80, 349–363.
Tran, D., & Forsyth, D. (2008). Configuration estimates improve pedestrian finding. In Adv. in neur. inf. proc. sys. (NIPS).
Tu, Z., Chen, X., Yuille, A. L., & Zhu, S.-C. (2005). Image parsing: unifying segmentation, detection, and recognition. International Journal of Computer Vision, 63(2), 113–140.
Urtasun, R., Fleet, D. J., & Fua, P. (2006). 3D people tracking with Gaussian process dynamical models. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Viola, P., Jones, M., & Snow, D. (2003). Detecting pedestrians using patterns of motion and appearance. In IEEE int. conf. on comp. vis. (ICCV).
Wang, Y., & Mori, G. (2008). Multiple tree models for occlusion and spatial constraints in human pose estimation. In Eur. conf. on comp. vis. (ECCV).
Yao, B., & Fei-Fei, L. (2010). Modeling mutual context of object and human pose in human-object interaction activities. In Eur. conf. on comp. vis. (ECCV).
Zhang, J., Luo, J., Collins, R., & Liu, Y. (2006). Body localization in still images using hierarchical models and hybrid search. In IEEE conf. on comp. vis. and pat. recog. (CVPR).
Zhang, X., Li, C., Tong, X., Hu, W., Maybank, S., & Zhang, Y. (2009). Efficient human pose estimation via parsing a tree structure based human model. In IEEE int. conf. on comp. vis. (ICCV).

Author information

Authors and Affiliations

MPI Informatics, Stuhlsatzenhausweg 85, 66123, Saarbrücken, Germany
Mykhaylo Andriluka & Bernt Schiele
Department of Computer Science, TU Darmstadt, Fraunhoferstr. 5, 64283, Darmstadt, Germany
Stefan Roth

Corresponding author

Correspondence to Mykhaylo Andriluka.

About this article

Cite this article

Andriluka, M., Roth, S. & Schiele, B. Discriminative Appearance Models for Pictorial Structures. Int J Comput Vis 99, 259–280 (2012). https://doi.org/10.1007/s11263-011-0498-z

Received: 31 July 2010
Accepted: 16 September 2011
Published: 28 October 2011
Issue Date: September 2012
DOI: https://doi.org/10.1007/s11263-011-0498-z