Abstract
This contribution proposes a compositionality architecture for visual object categorization, i.e., learning and recognizing multiple visual object classes in unsegmented, cluttered real-world scenes. We introduce a sparse image representation based on localized feature histograms of salient regions. Category-specific information is then aggregated by using relations from perceptual organization to form compositions of these descriptors. The underlying concept of aggregating image regions to condense semantic information calls for a statistical representation founded on graphical models. On the basis of this structure, objects and their constituent parts are localized.
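To make the notion of a localized feature histogram concrete, the following is a minimal sketch, not the authors' implementation. It assumes raw gray-value intensities as the feature and a 2 x 2 grid of subwindows per region; the abstract specifies neither, and the function name `localized_histogram` and all of its parameters are hypothetical.

```python
def localized_histogram(image, cx, cy, half, cells=2, bins=4, max_val=256):
    """Concatenate per-cell intensity histograms of a square patch.

    image: list of rows of integer intensities in [0, max_val).
    (cx, cy): patch centre; half: half of the patch side length.
    The patch is split into cells x cells subwindows (an assumed layout),
    and each subwindow contributes its own block of `bins` counts.
    """
    side = 2 * half
    step = side // cells
    x0, y0 = cx - half, cy - half
    hist = [0.0] * (cells * cells * bins)
    for dy in range(side):
        for dx in range(side):
            # Index of the subwindow this pixel falls into.
            row_cell = min(dy // step, cells - 1)
            col_cell = min(dx // step, cells - 1)
            cell = row_cell * cells + col_cell
            # Quantize the intensity into one of `bins` bins.
            b = image[y0 + dy][x0 + dx] * bins // max_val
            hist[cell * bins + b] += 1.0
    total = sum(hist)
    # Normalize so that the descriptor sums to one.
    return [h / total for h in hist]
```

Keeping one histogram per subwindow, rather than one for the whole patch, is what makes the descriptor localized: it retains coarse spatial layout while staying robust to small shifts within each cell.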
To complement the learned dependencies between compositions and categories, a global shape model of all compositions that form an object is trained. During inference, belief propagation reconciles bottom-up feature-driven categorization with top-down category models. The system achieves competitive recognition performance on the standard CalTech database.
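How belief propagation can reconcile bottom-up and top-down information may be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the model from the paper: it uses a star-shaped graph with one category node and several composition nodes, on which the sum-product rule is exact. The names `composition_message` and `category_belief` and the probability tables are hypothetical.

```python
def normalize(p):
    """Scale a list of non-negative values so that it sums to one."""
    z = sum(p)
    return [v / z for v in p]


def composition_message(likelihood, evidence):
    """Sum-product message from a composition node to the category node.

    likelihood[k][c]: assumed P(composition type k | category c).
    evidence[k]: local (bottom-up) evidence for composition type k.
    The composition type is summed out, leaving one value per category.
    """
    n_types = len(likelihood)
    n_cat = len(likelihood[0])
    return [
        sum(evidence[k] * likelihood[k][c] for k in range(n_types))
        for c in range(n_cat)
    ]


def category_belief(top_down, bottom_up_messages):
    """Belief at the category node: normalized product of incoming messages.

    top_down: message from a category-level model (here simply a prior).
    bottom_up_messages: one message per observed composition.
    """
    belief = list(top_down)
    for msg in bottom_up_messages:
        belief = [b * m for b, m in zip(belief, msg)]
    return normalize(belief)
```

For instance, with two categories, a uniform top-down message, and two compositions that each favour the first category by 0.8 to 0.2, the product of messages is proportional to 0.32 and 0.02, so the belief in the first category is 16/17.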
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ommer, B., Buhmann, J.M. (2005). Object Categorization by Compositional Graphical Models. In: Rangarajan, A., Vemuri, B., Yuille, A.L. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2005. Lecture Notes in Computer Science, vol 3757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11585978_16
DOI: https://doi.org/10.1007/11585978_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30287-2
Online ISBN: 978-3-540-32098-2
eBook Packages: Computer Science, Computer Science (R0)