Object Recognition by Sequential Figure-Ground Ranking

International Journal of Computer Vision

Abstract

We present an approach to visual object-class segmentation and recognition based on a pipeline that combines multiple figure-ground hypotheses with large object spatial support, generated by bottom-up computational processes that do not exploit knowledge of specific categories, and sequential categorization based on continuous estimates of the spatial overlap between the image segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption that good object-level segments can be obtained in a feed-forward fashion, but also in formulating recognition as a regression problem. Instead of focusing on a one-vs.-all winning margin that may not preserve the ordering of segment qualities inside the non-maximum (non-winning) set, our learning method produces a globally consistent ranking with close ties to segment quality, and hence to the extent to which entire object or part hypotheses are likely to spatially overlap the ground truth. We demonstrate results beyond the current state of the art for image classification, object detection and semantic segmentation on several challenging datasets, including Caltech-101, ETHZ-Shape, and PASCAL VOC 2009 and 2010.
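The idea of treating recognition as regression on segment overlap can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the overlap measure, the features, or the regressor, so the code below assumes intersection-over-union as the overlap score, a plain linear ridge regressor, and hypothetical helper names (`overlap`, `fit_ridge`, `rank_segments`). It shows how continuous overlap targets can be computed for candidate segments and how predicted scores induce a ranking over all of them, not only a single winner.

```python
import numpy as np


def overlap(segment, ground_truth):
    """Intersection-over-union of two boolean masks (assumed overlap measure)."""
    segment = np.asarray(segment, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    union = np.logical_or(segment, ground_truth).sum()
    if union == 0:
        return 0.0  # both masks empty: define the overlap as zero
    return float(np.logical_and(segment, ground_truth).sum() / union)


def fit_ridge(features, targets, lam=1e-3):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(targets, dtype=float)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def rank_segments(features, weights):
    """Return (indices sorted by predicted overlap, best first; raw scores)."""
    scores = np.asarray(features, dtype=float) @ weights
    return np.argsort(-scores), scores


if __name__ == "__main__":
    # Toy ground-truth object: a 2x2 square inside a 4x4 image.
    gt = np.zeros((4, 4), dtype=bool)
    gt[1:3, 1:3] = True

    # Three hypothetical figure-ground hypotheses of varying quality.
    exact = gt.copy()
    loose = np.zeros((4, 4), dtype=bool)
    loose[1:3, 1:4] = True
    miss = np.zeros((4, 4), dtype=bool)
    miss[0, 0] = True

    # Continuous regression targets, one per segment.
    targets = [overlap(s, gt) for s in (exact, loose, miss)]

    # Placeholder 2-D features (illustrative values, not real descriptors).
    feats = np.array([[1.0, 0.1], [0.7, 0.3], [0.0, 1.0]])
    w = fit_ridge(feats, targets)
    order, scores = rank_segments(feats, w)
    print("targets:", targets)
    print("ranking (best first):", order.tolist())
```

Because the regressor is trained on overlap values instead of binary labels, the ordering of lower-quality segments is also meaningful, which is the property the abstract contrasts with a one-vs.-all margin.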



Author information

Corresponding author

Correspondence to Cristian Sminchisescu.

Additional information

The first two authors contributed equally.

About this article

Cite this article

Carreira, J., Li, F. & Sminchisescu, C. Object Recognition by Sequential Figure-Ground Ranking. Int J Comput Vis 98, 243–262 (2012). https://doi.org/10.1007/s11263-011-0507-2

  • DOI: https://doi.org/10.1007/s11263-011-0507-2
