Pictorial Structures for Object Recognition

Felzenszwalb, Pedro F.; Huttenlocher, Daniel P.

doi:10.1023/B:VISI.0000042934.15159.49

Published: January 2005

Volume 61, pages 55–79 (2005)
International Journal of Computer Vision

Abstract

In this paper we present a computationally efficient framework for part-based modeling and recognition of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to represent an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We address the problem of using pictorial structure models to find instances of an object in an image as well as the problem of learning an object model from training examples, presenting efficient algorithms in both cases. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
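The part-based idea in the abstract can be illustrated with a small sketch. It scores a configuration as the sum of per-part appearance costs plus spring-like deformation costs between connected parts, and searches for the cheapest configuration. Everything specific here is an assumption made for illustration and is not taken from the abstract: the tree-shaped connection structure, the quadratic spring cost, the toy three-part "face", the exhaustive search over a small grid, and all function and variable names. It is not the efficient algorithm the paper presents.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]


def spring_cost(parent_loc: Location, child_loc: Location,
                rest_offset: Location, stiffness: float = 1.0) -> float:
    """Quadratic penalty for a child deviating from its ideal offset (assumed form)."""
    dx = child_loc[0] - parent_loc[0] - rest_offset[0]
    dy = child_loc[1] - parent_loc[1] - rest_offset[1]
    return stiffness * (dx * dx + dy * dy)


def best_configuration(root: str,
                       children: Dict[str, List[str]],
                       rest_offsets: Dict[str, Location],
                       match_cost: Dict[str, Dict[Location, float]],
                       locations: List[Location]):
    """Minimize appearance cost plus spring cost over a tree of parts.

    Returns (total_cost, {part: location}). Uses plain dynamic programming
    from the leaves to the root with exhaustive search over `locations`.
    """
    def solve(part: str):
        # For each location of `part`, the best cost of its whole subtree.
        child_tables = [(c, solve(c)) for c in children.get(part, [])]
        table = {}
        for loc in locations:
            total = match_cost[part][loc]
            placement = {part: loc}
            for child, ctable in child_tables:
                def cost_at(cl: Location) -> float:
                    return ctable[cl][0] + spring_cost(loc, cl, rest_offsets[child])
                best = min(locations, key=cost_at)
                total += cost_at(best)
                placement.update(ctable[best][1])
            table[loc] = (total, placement)
        return table

    table = solve(root)
    best_root = min(locations, key=lambda l: table[l][0])
    return table[best_root]


# Toy example: a nose connected by "springs" to two eyes on a 5x5 grid.
locations = [(x, y) for x in range(5) for y in range(5)]
parts = ["nose", "left_eye", "right_eye"]
match_cost = {p: {loc: 1.0 for loc in locations} for p in parts}
match_cost["nose"][(2, 3)] = 0.0
match_cost["left_eye"][(1, 1)] = 0.0
match_cost["right_eye"][(3, 1)] = 0.0
# A distractor that looks even more like an eye, but in an implausible place.
match_cost["right_eye"][(0, 4)] = -0.5

children = {"nose": ["left_eye", "right_eye"]}
rest_offsets = {"left_eye": (-1, -2), "right_eye": (1, -2)}

cost, placement = best_configuration("nose", children, rest_offsets,
                                     match_cost, locations)
print(cost, placement)
```

In this toy setup the distractor at (0, 4) has the best appearance score for the right eye, yet the spring term makes it too expensive, so the chosen configuration keeps the eye near its expected position relative to the nose. This is the sense in which appearance and deformation are traded off against each other.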

References

Amini, A.A., Weymouth, T.E., and Jain, R.C. 1990. Using dynamic programming for solving variational problems in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(9):855-867.
Amit, Y. and Geman, D. 1999. A computational model for visual selection. Neural Computation, 11(7):1691-1715.
Ayache, N.J. and Faugeras, O.D. 1986. Hyper: A new approach for the recognition and positioning of two-dimensional objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(1):44-54.
Berger, J.O. 1985. Statistical Decision Theory and Bayesian Analysis. Springer-Verlag.
Borgefors, G. 1986. Distance transformations in digital images. Computer Vision, Graphics, and Image Processing, 34(3):344-371.
Borgefors, G. 1988. Hierarchical chamfer matching: A parametric edge matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):849-865.
Boykov, Y., Veksler, O., and Zabih, R. 2001. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222-1239.
Bregler, C. and Malik, J. 1998. Tracking people with twists and exponential maps. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 8-15.
Burl, M.C. and Perona, P. 1996. Recognition of planar object classes. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 223-230.
Burl, M.C., Weber, M., and Perona, P. 1998. A probabilistic approach to object recognition using local photometry and global geometry. In European Conference on Computer Vision, pp. II:628-641.
Chow, C.K. and Liu, C.N. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462-467.
Cormen, T.H., Leiserson, C.E., and Rivest, R.L. 1996. Introduction to Algorithms. MIT Press and McGraw-Hill.
Dickinson, S.J., Biederman, I., Pentland, A.P., Eklundh, J.O., Bergevin, R., and Munck-Fairwood, R.C. 1993. The use of geons for generic 3-d object recognition. In International Joint Conference on Artificial Intelligence, pp. 1693-1699.
Felzenszwalb, P.F. and Huttenlocher, D.P. 2000. Efficient matching of pictorial structures. In IEEE Conference on Computer Vision and Pattern Recognition, pp. II:66-73.
Fischler, M.A. and Bolles, R.C. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395.
Fischler, M.A. and Elschlager, R.A. 1973. The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1):67-92.
Freeman, W.T. and Adelson, E.H. 1991. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9):891-906.
Gdalyahu, Y. and Weinshall, D. 1999. Flexible syntactic matching of curves and its application to automatic hierarchical classification of silhouettes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(12):1312-1328.
Geman, S. and Geman, D. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721-741.
Grimson, W.E.L. and Lozano-Perez, T. 1987. Localizing overlapping parts by searching the interpretation tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(4):469-482.
Gumbel, E.J., Greenwood, J.A., and Durand, D. 1953. The circular normal distribution: Theory and tables. Journal of the American Statistical Association, 48:131-152.
Huttenlocher, D.P., Klanderman, G.A., and Rucklidge, W.J. 1993. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9):850-863.
Huttenlocher, D.P. and Ullman, S. 1990. Recognizing solid objects by alignment with an image. International Journal of Computer Vision, 5(2):195-212.
Ioffe, S. and Forsyth, D.A. 2001. Probabilistic methods for finding people. International Journal of Computer Vision, 43(1):45-68.
Ishikawa, H. and Geiger, D. 1998. Segmentation by grouping junctions. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 125-131.
Ju, S.X., Black, M.J., and Yacoob, Y. 1996. Cardboard people: A parameterized model of articulated motion. In International Conference on Automatic Face and Gesture Recognition, pp. 38-44.
Karzanov, A.V. 1992. Quick algorithm for determining the distances from the points of the given subset of an integer lattice to the points of its complement. Cybernetics and System Analysis, pp. 177-181. Translation from the Russian by Julia Komissarchik.
Lamdan, Y., Schwartz, J.T., and Wolfson, H.J. 1990. Affine invariant model-based object recognition. IEEE Transactions on Robotics and Automation, 6(5):578-589.
Moghaddam, B. and Pentland, A.P. 1997. Probabilistic visual learning for object representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710.
Murase, H. and Nayar, S.K. 1995. Visual learning and recognition of 3-d objects from appearance. International Journal of Computer Vision, 14(1):5-24.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
Pentland, A.P. 1987. Recognition by parts. In IEEE International Conference on Computer Vision, pp. 612-620.
Rabiner, L. and Juang, B. 1993. Fundamentals of Speech Recognition. Prentice Hall.
Ramanan, D. and Forsyth, D.A. 2003. Finding and tracking people from the bottom up. In IEEE Conference on Computer Vision and Pattern Recognition, pp. II:467-474.
Rao, R.P.N. and Ballard, D.H. 1995. An active vision architecture based on iconic representations. Artificial Intelligence, 78(1/2):461-505.
Rivlin, E., Dickinson, S.J., and Rosenfeld, A. 1995. Recognition by functional parts. Computer Vision and Image Understanding, 62(2):164-176.
Roberts, L.G. 1965. Machine perception of 3-d solids. In Optical and Electro-optical Information Processing, pp. 159-197.
Rucklidge, W. 1996. Efficient Visual Recognition Using the Hausdorff Distance. Springer-Verlag, LNCS 1173.
Sebastian, T.B., Klein, P.N., and Kimia, B.B. 2001. Recognition of shapes by editing shock graphs. In IEEE International Conference on Computer Vision, pp. I:755-762.
Turk, M. and Pentland, A.P. 1991. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-96.
Wells, W.M. III 1986. Efficient synthesis of Gaussian filters by cascaded uniform filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(2):234-239.

Author information

Authors and Affiliations

Artificial Intelligence Lab, Massachusetts Institute of Technology, USA
Pedro F. Felzenszwalb
Computer Science Department, Cornell University, USA
Daniel P. Huttenlocher

About this article

Cite this article

Felzenszwalb, P.F., Huttenlocher, D.P. Pictorial Structures for Object Recognition. International Journal of Computer Vision 61, 55–79 (2005). https://doi.org/10.1023/B:VISI.0000042934.15159.49

Issue Date: January 2005
DOI: https://doi.org/10.1023/B:VISI.0000042934.15159.49
