Abstract
Object detection is one of the key components in modern computer vision systems. While the detection of a specific rigid object under changing viewpoints was considered hard just a few years ago, current research strives to detect and recognize classes of non-rigid, articulated objects. Because clutter and occlusion introduce pervasive confusing information, the focus has shifted from holistic approaches for object detection to representations of individual object parts linked by structural information, along with richer contextual descriptions of object configurations. Along this line of research, we present a practicable and expandable probabilistic framework for parts-based object class representation, enabling the detection of rigid and articulated object classes in arbitrary views. We investigate learning of this representation from labelled training images and infer globally optimal solutions to the contextual MAP-detection problem, using A*-search with a novel lower bound as admissible heuristic. An assessment of the inference performance of Belief Propagation and Tree-Reweighted Belief Propagation is obtained as a by-product. The generality of our approach is demonstrated on four different datasets utilizing domain-dependent information cues.
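To make the inference step concrete, the following is a minimal sketch of A*-search for MAP inference on a small, fully connected model, written as energy minimization over discrete part labels. It is an illustration under stated assumptions, not the method of the article: the lower bound used here is a simple term-wise minimum, chosen only because it is easy to verify as admissible, and it is not the novel bound the abstract refers to. The names `unary`, `pair`, `lower_bound`, and `astar_map` are hypothetical.

```python
import heapq
import itertools


def energy(unary, pair, labels):
    """Total energy of a complete labelling on a fully connected graph."""
    n = len(labels)
    total = sum(unary[i][labels[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += pair[(i, j)][labels[i]][labels[j]]
    return total


def lower_bound(unary, pair, prefix):
    """Admissible heuristic (an assumption of this sketch, not the article's bound).

    Nodes 0..k-1 are fixed by `prefix`. Every remaining term is replaced by
    its minimum, so the result never exceeds the true remaining cost.
    """
    n, k = len(unary), len(prefix)
    bound = 0.0
    for j in range(k, n):
        # Unary term of node j plus its edges to the already assigned nodes.
        bound += min(
            unary[j][l] + sum(pair[(i, j)][prefix[i]][l] for i in range(k))
            for l in range(len(unary[j]))
        )
        # Edges between two unassigned nodes: smallest table entry.
        for m in range(j + 1, n):
            bound += min(min(row) for row in pair[(j, m)])
    return bound


def astar_map(unary, pair):
    """Return (labels, energy) of a minimum-energy labelling via A*-search.

    States are partial labellings of nodes 0..k-1 in a fixed order, so the
    search space is a tree and the first complete state popped is optimal.
    """
    n = len(unary)
    tie = itertools.count()  # tie-breaker so the heap never compares tuples of labels
    heap = [(lower_bound(unary, pair, ()), next(tie), 0.0, ())]
    while heap:
        _, _, g, prefix = heapq.heappop(heap)
        k = len(prefix)
        if k == n:
            return list(prefix), g
        for l in range(len(unary[k])):
            # Exact cost added by assigning label l to node k.
            g_new = g + unary[k][l] + sum(
                pair[(i, k)][prefix[i]][l] for i in range(k)
            )
            new_prefix = prefix + (l,)
            f_new = g_new + lower_bound(unary, pair, new_prefix)
            heapq.heappush(heap, (f_new, next(tie), g_new, new_prefix))
    return None, float("inf")


if __name__ == "__main__":
    # Toy model: 3 parts, 2 candidate labels each (values are made up).
    unary = [[0.2, 1.0], [0.5, 0.1], [0.3, 0.4]]
    pair = {
        (0, 1): [[0.0, 0.8], [0.6, 0.0]],
        (0, 2): [[0.1, 0.9], [0.7, 0.2]],
        (1, 2): [[0.3, 0.0], [0.5, 0.4]],
    }
    print(astar_map(unary, pair))
```

Because the heuristic never overestimates the remaining cost, the first complete labelling removed from the priority queue is globally optimal. A tighter bound would not change the result, only the number of partial labellings that must be expanded before it is found.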
Cite this article
Bergtholdt, M., Kappes, J., Schmidt, S. et al. A Study of Parts-Based Object Class Detection Using Complete Graphs. Int J Comput Vis 87, 93–117 (2010). https://doi.org/10.1007/s11263-009-0209-1
DOI: https://doi.org/10.1007/s11263-009-0209-1