Accurate Object Recognition with Shape Masks

Marszałek, Marcin; Schmid, Cordelia

doi:10.1007/s11263-011-0479-2

Published: 01 July 2011

Volume 97, pages 191–209 (2012)

International Journal of Computer Vision

Marcin Marszałek¹ & Cordelia Schmid¹

Abstract

In this paper we propose an object recognition approach based on shape masks, which are generalizations of segmentation masks. Because shape masks carry information about the extent (outline) of objects, they provide a convenient tool for exploiting object geometry. We apply our ideas to two common object class recognition tasks: classification and localization. For classification, we extend the orderless bag-of-features image representation. In the proposed setup, shape masks can be seen as weak geometrical constraints over bag-of-features; these constraints can be used to reduce background clutter and help recognition. For localization, we propose a new recognition scheme based on high-dimensional hypothesis clustering. Shape masks make it possible to go beyond bounding boxes and determine the outline (approximate segmentation) of the object during localization. Furthermore, the method easily learns and detects possible object viewpoints and articulations, which are often well characterized by the object outline. Our experiments show that shape masks can improve the recognition accuracy of state-of-the-art methods while also returning richer recognition answers. We evaluate the proposed approach on the challenging natural-scene Graz-02 object classes dataset.
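As an illustration of how a shape mask might act as a weak geometric constraint over a bag-of-features, the following minimal sketch weights each local feature's vote by the mask value at its position before building the visual-word histogram. It is not the authors' implementation: the function name `masked_bof_histogram`, the soft mask with values in [0, 1], and the L1 normalization are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical types for this sketch: a feature is (x, y, visual_word_id),
# and a mask is a row-major grid of weights in [0, 1], indexed as mask[y][x].
Feature = Tuple[int, int, int]
Mask = List[List[float]]


def masked_bof_histogram(features: List[Feature], mask: Mask,
                         vocab_size: int) -> List[float]:
    """Build an L1-normalized visual-word histogram in which each feature
    contributes the mask weight found at its image position."""
    hist = [0.0] * vocab_size
    for x, y, word in features:
        # Features on the background (weight near 0) contribute little.
        hist[word] += mask[y][x]
    total = sum(hist)
    if total > 0:
        hist = [h / total for h in hist]
    return hist


# Toy example: a 3x2 image whose left column is object and right column is background.
mask = [[1.0, 0.5, 0.0],
        [1.0, 0.0, 0.0]]
features = [(0, 0, 0), (1, 0, 1), (2, 0, 1), (0, 1, 2)]

weighted = masked_bof_histogram(features, mask, vocab_size=3)

# A mask of all ones reduces to the plain orderless bag-of-features.
uniform = [[1.0] * 3 for _ in range(2)]
plain = masked_bof_histogram(features, uniform, vocab_size=3)
```

In this toy setup the background feature at (2, 0) is ignored entirely and the boundary feature at (1, 0) counts for half a vote, which is one simple way to reduce the influence of background clutter on the histogram.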

Author information

Authors and Affiliations

INRIA Grenoble, LEAR - LJK, 665 av de l’Europe, 38330, Montbonnot, France
Marcin Marszałek & Cordelia Schmid

Corresponding author

Correspondence to Marcin Marszałek.

About this article

Cite this article

Marszałek, M., Schmid, C. Accurate Object Recognition with Shape Masks. Int J Comput Vis 97, 191–209 (2012). https://doi.org/10.1007/s11263-011-0479-2

Received: 06 August 2009
Accepted: 13 June 2011
Published: 01 July 2011
Issue Date: April 2012
DOI: https://doi.org/10.1007/s11263-011-0479-2
