Abstract
In this paper we propose an object recognition approach based on shape masks, which generalize segmentation masks. Because shape masks carry information about the extent (outline) of objects, they provide a convenient tool for exploiting object geometry. We apply our ideas to two common object class recognition tasks: classification and localization. For classification, we extend the orderless bag-of-features image representation. In the proposed setup, shape masks can be seen as weak geometrical constraints over the bag-of-features. These constraints can be used to reduce background clutter and help recognition. For localization, we propose a new recognition scheme based on high-dimensional hypothesis clustering. Shape masks make it possible to go beyond bounding boxes and determine the outline (approximate segmentation) of the object during localization. Furthermore, the method easily learns and detects possible object viewpoints and articulations, which are often well characterized by the object outline. Our experiments reveal that shape masks can improve the recognition accuracy of state-of-the-art methods while also returning richer recognition answers. We evaluate the proposed approach on the challenging natural-scene Graz-02 object classes dataset.
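As an illustration only, the idea of a shape mask acting as a weak geometrical constraint over a bag-of-features can be sketched as below. This is a minimal sketch under assumptions that the abstract does not state: the mask is taken to be a per-pixel weight in [0, 1], each local feature is reduced to a pixel position and a visual-word index, and the function name `masked_bof_histogram` is hypothetical. It is not the formulation used in the paper, only one simple way a mask could down-weight background clutter in a histogram.

```python
def masked_bof_histogram(features, mask, vocab_size):
    """Bag-of-features histogram with votes weighted by a shape mask.

    Assumptions (illustrative, not taken from the paper):
      features: list of (x, y, word) tuples with integer pixel
                coordinates and a visual-word index in [0, vocab_size).
      mask:     2-D list of floats in [0, 1], indexed as mask[y][x];
                1.0 means "likely object", 0.0 means "likely background".
    """
    hist = [0.0] * vocab_size
    for x, y, word in features:
        # A feature on the background contributes little or nothing.
        hist[word] += mask[y][x]
    total = sum(hist)
    # L1-normalise so that images with different feature counts compare.
    return [h / total for h in hist] if total > 0 else hist


# Toy 2x3 image: the object covers the two left-hand columns.
mask = [
    [1.0, 1.0, 0.0],
    [1.0, 0.5, 0.0],
]
# Two features on the object (words 0 and 1), two on the background (word 2).
features = [(0, 0, 0), (1, 1, 1), (2, 0, 2), (2, 1, 2)]
hist = masked_bof_histogram(features, mask, vocab_size=3)
print(hist)
```

In this toy case the two background features fall where the mask is zero, so visual word 2 receives no weight and the histogram describes only the masked region.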
Cite this article
Marszałek, M., Schmid, C. Accurate Object Recognition with Shape Masks. Int J Comput Vis 97, 191–209 (2012). https://doi.org/10.1007/s11263-011-0479-2
DOI: https://doi.org/10.1007/s11263-011-0479-2