A Study of Parts-Based Object Class Detection Using Complete Graphs

Bergtholdt, Martin; Kappes, Jörg; Schmidt, Stefan; Schnörr, Christoph

doi:10.1007/s11263-009-0209-1

Published: 28 January 2009

Volume 87, pages 93–117, (2010)

International Journal of Computer Vision

Martin Bergtholdt¹,
Jörg Kappes¹,
Stefan Schmidt¹ &
Christoph Schnörr¹

Abstract

Object detection is one of the key components in modern computer vision systems. While the detection of a specific rigid object under changing viewpoints was considered hard just a few years ago, current research strives to detect and recognize classes of non-rigid, articulated objects. Hampered by the omnipresent confusing information due to clutter and occlusion, the focus has shifted from holistic approaches for object detection to representations of individual object parts linked by structural information, along with richer contextual descriptions of object configurations. Along this line of research, we present a practicable and expandable probabilistic framework for parts-based object class representation, enabling the detection of rigid and articulated object classes in arbitrary views. We investigate learning of this representation from labelled training images and infer globally optimal solutions to the contextual MAP-detection problem, using A*-search with a novel lower bound as an admissible heuristic. An assessment of the inference performance of Belief Propagation and Tree-Reweighted Belief Propagation is obtained as a by-product. The generality of our approach is demonstrated on four different datasets utilizing domain-dependent information cues.
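
To make the inference step concrete, the following is a minimal Python sketch of A*-search for MAP labelling on a fully connected graph, written as energy minimization. It is an illustration under stated assumptions, not the authors' method: the names `energy`, `lower_bound`, and `astar_map`, the data layout (`unary[i][l]` and `pair[(i, j)][a][b]` for `i < j`), and in particular the simple term-wise lower bound are all hypothetical choices made here. The article's novel lower bound is not described in the abstract and is not reproduced.

```python
import heapq
import itertools


def energy(unary, pair, labels):
    """Total energy of a complete labelling on a fully connected graph."""
    n = len(unary)
    total = sum(unary[i][labels[i]] for i in range(n))
    total += sum(pair[(i, j)][labels[i]][labels[j]]
                 for i in range(n) for j in range(i + 1, n))
    return total


def lower_bound(unary, pair, partial):
    """Admissible bound on the energy still to be paid (assumed, simple bound).

    Nodes 0..k-1 are fixed by `partial`. Every remaining term is minimized
    independently, so the sum can never exceed the true remaining energy.
    """
    n, k = len(unary), len(partial)
    bound = 0.0
    for j in range(k, n):
        # Unary term of node j plus its edges to the already fixed nodes.
        bound += min(
            unary[j][l] + sum(pair[(i, j)][partial[i]][l] for i in range(k))
            for l in range(len(unary[j]))
        )
        # Edges between two unassigned nodes: cheapest entry of each table.
        for m in range(j + 1, n):
            bound += min(min(row) for row in pair[(j, m)])
    return bound


def astar_map(unary, pair):
    """Return (labels, energy) of a globally optimal labelling."""
    n = len(unary)
    tie = itertools.count()  # tie-breaker so the heap never compares tuples of labels
    heap = [(lower_bound(unary, pair, ()), next(tie), 0.0, ())]
    while heap:
        _, _, g, partial = heapq.heappop(heap)
        k = len(partial)
        if k == n:
            # With an admissible heuristic, the first complete node popped is optimal.
            return list(partial), g
        for l in range(len(unary[k])):
            g_new = g + unary[k][l] + sum(
                pair[(i, k)][partial[i]][l] for i in range(k))
            child = partial + (l,)
            f_new = g_new + lower_bound(unary, pair, child)
            heapq.heappush(heap, (f_new, next(tie), g_new, child))
    raise ValueError("a node has no candidate labels")
```

In this sketch each graph node stands for an object part and each label for a candidate image location of that part. The search expands parts in a fixed order, and the tightness of the bound determines how many partial configurations are expanded before the optimum is confirmed.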

Author information

Authors and Affiliations

Dept. Mathematics and Computer Science, University of Heidelberg, Speyerer Strasse 4-6, 69115, Heidelberg, Germany
Martin Bergtholdt, Jörg Kappes, Stefan Schmidt & Christoph Schnörr

Corresponding author

Correspondence to Martin Bergtholdt.

About this article

Cite this article

Bergtholdt, M., Kappes, J., Schmidt, S. et al. A Study of Parts-Based Object Class Detection Using Complete Graphs. Int J Comput Vis 87, 93–117 (2010). https://doi.org/10.1007/s11263-009-0209-1

Received: 08 April 2008
Accepted: 06 January 2009
Published: 28 January 2009
Issue Date: March 2010
DOI: https://doi.org/10.1007/s11263-009-0209-1
