A Study of Parts-Based Object Class Detection Using Complete Graphs

International Journal of Computer Vision

Abstract

Object detection is one of the key components of modern computer vision systems. While detecting a specific rigid object under changing viewpoints was considered hard just a few years ago, current research strives to detect and recognize classes of non-rigid, articulated objects. Because clutter and occlusion introduce pervasive confusing information, the focus has shifted from holistic approaches to object detection towards representations of individual object parts linked by structural information, along with richer contextual descriptions of object configurations. Along this line of research, we present a practical and extensible probabilistic framework for parts-based object class representation, enabling the detection of rigid and articulated object classes in arbitrary views. We investigate learning of this representation from labelled training images and infer globally optimal solutions to the contextual MAP-detection problem, using A*-search with a novel lower bound as an admissible heuristic. An assessment of the inference performance of Belief Propagation and Tree-Reweighted Belief Propagation is obtained as a by-product. The generality of our approach is demonstrated on four different datasets utilizing domain-dependent information cues.
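
To make the inference step more concrete, the following is a minimal Python sketch of A*-search for MAP inference on a fully connected parts model, written as minimization of an energy (the negative log-posterior). It is an illustration under stated assumptions, not the method of the article: the function name `map_astar`, the cost-table layout (`unary`, `pair`) and the fixed part ordering are hypothetical, and the lower bound used here is a simple sum of per-part and per-edge minima, not the novel bound the abstract refers to.

```python
import heapq
import itertools


def map_astar(unary, pair):
    """Exact MAP labelling by A*-search (illustrative sketch).

    unary[i][a]       -- cost of assigning label a to part i
    pair[(i, j)][a][b] -- cost of labels (a, b) on parts i < j (complete graph)
    Returns (minimal energy, tuple of labels), parts assigned in order 0..n-1.
    """
    n = len(unary)
    # Smallest entry of every edge table; bounds edges between unassigned parts.
    edge_min = {e: min(min(row) for row in table) for e, table in pair.items()}

    def bound(assigned):
        # Admissible heuristic (assumption: NOT the bound from the article).
        # Each remaining term is replaced by its own minimum, so the sum can
        # never exceed the true cost of completing the partial assignment.
        k = len(assigned)
        total = 0.0
        for j in range(k, n):
            total += min(
                unary[j][b] + sum(pair[(i, j)][assigned[i]][b] for i in range(k))
                for b in range(len(unary[j]))
            )
        total += sum(edge_min[(j, l)] for j in range(k, n) for l in range(j + 1, n))
        return total

    tie = itertools.count()  # tie-breaker so the heap never compares tuples of labels
    heap = [(bound(()), next(tie), 0.0, ())]
    while heap:
        _, _, g, assigned = heapq.heappop(heap)
        k = len(assigned)
        if k == n:
            # First complete assignment popped is optimal because the
            # heuristic never overestimates.
            return g, assigned
        for b in range(len(unary[k])):
            g_new = g + unary[k][b] + sum(
                pair[(i, k)][assigned[i]][b] for i in range(k)
            )
            child = assigned + (b,)
            heapq.heappush(heap, (g_new + bound(child), next(tie), g_new, child))
    raise ValueError("no complete assignment found")
```

With an admissible bound the search returns a globally optimal labelling; a tighter bound only reduces the number of partial assignments that have to be expanded, which is where a problem-specific bound would matter.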

Author information

Corresponding author

Correspondence to Martin Bergtholdt.

About this article

Cite this article

Bergtholdt, M., Kappes, J., Schmidt, S. et al. A Study of Parts-Based Object Class Detection Using Complete Graphs. Int J Comput Vis 87, 93–117 (2010). https://doi.org/10.1007/s11263-009-0209-1

  • DOI: https://doi.org/10.1007/s11263-009-0209-1
