
Categorization of Multiple Objects in a Scene Using a Biased Sampling Strategy

International Journal of Computer Vision

Abstract

Recently, various bag-of-features (BoF) methods have shown good resistance to within-class variations and occlusions in object categorization. In this paper, we present a novel approach for multi-object categorization within the BoF framework. The approach simultaneously addresses two issues in BoF-related methods: how to avoid scene modeling, and how to predict the labels of an image when objects of multiple categories co-exist. We employ a biased sampling strategy that combines bottom-up, biologically inspired saliency information with loose, top-down class prior information for object class modeling. This biased sampling component is then integrated with a multi-instance multi-label learning and classification algorithm. With the proposed biased sampling strategy, we can perform multi-object categorization within an image without semantic segmentation. Experimental results on PASCAL VOC2007 and SUN09 show that the proposed method significantly improves the discriminative ability of BoF methods and achieves good performance in multi-object categorization tasks.
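The general idea of biasing where BoF patches are sampled can be illustrated with a minimal sketch. It assumes that a bottom-up saliency map and a top-down class-prior map are already available as non-negative arrays of the same size as the image. The linear blend of the two normalised maps, the `weight` parameter, and the helper names `biased_sample_locations` and `bof_histogram` are hypothetical choices made for illustration; they are not the authors' formulation, and the multi-instance multi-label classifier is not shown.

```python
import numpy as np


def biased_sample_locations(saliency, class_prior, n_samples, weight=0.5,
                            eps=1e-8, rng=None):
    """Draw distinct patch centres with probability proportional to a blend of two maps.

    Hypothetical sketch: the linear blend and `weight` are illustrative assumptions.
    saliency:    H x W bottom-up saliency map (non-negative).
    class_prior: H x W top-down class-prior map (non-negative).
    weight:      1.0 uses saliency only, 0.0 uses the class prior only.
    Returns an (n_samples, 2) integer array of (row, col) coordinates.
    """
    saliency = np.asarray(saliency, dtype=float)
    class_prior = np.asarray(class_prior, dtype=float)
    if saliency.shape != class_prior.shape:
        raise ValueError("saliency and class_prior must have the same shape")

    def normalise(m):
        # Rescale each map to sum to 1 so neither cue dominates by scale alone;
        # eps keeps every pixel at a small non-zero probability.
        m = np.clip(m, 0.0, None) + eps
        return m / m.sum()

    combined = weight * normalise(saliency) + (1.0 - weight) * normalise(class_prior)
    probs = combined.ravel() / combined.sum()

    rng = np.random.default_rng() if rng is None else rng
    # Sample flat pixel indices without replacement, favouring high-probability pixels.
    flat = rng.choice(probs.size, size=n_samples, replace=False, p=probs)
    rows, cols = np.unravel_index(flat, saliency.shape)
    return np.stack([rows, cols], axis=1)


def bof_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return an L1-normalised histogram."""
    d = np.asarray(descriptors, dtype=float)
    c = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance between every descriptor and every codeword.
    dists = ((d[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=c.shape[0]).astype(float)
    return hist / max(hist.sum(), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    saliency = np.zeros((32, 32))
    saliency[8:16, 8:16] = 1.0      # toy bottom-up salient blob
    prior = np.zeros((32, 32))
    prior[8:24, 8:24] = 1.0         # toy top-down class-prior region
    pts = biased_sample_locations(saliency, prior, n_samples=50, weight=0.7, rng=rng)
    # Toy 2-D "descriptors" (here just the coordinates) quantised against 2 codewords.
    codebook = np.array([[12.0, 12.0], [20.0, 20.0]])
    print(bof_histogram(pts, codebook))
```

In this sketch almost all samples land where the blended map is high, so background pixels contribute little to the histogram. In a real pipeline the coordinates would index local descriptors (for example, SIFT) extracted around each sampled point, and the codebook would come from clustering training descriptors.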

Acknowledgments

This research was supported by the State Key Program of the National Natural Science Foundation of China (Grant No. 60635050). The last author was partially supported by the US National Science Foundation.

Author information

Corresponding author

Correspondence to Lei Yang.

Additional information

This paper was originally submitted to the special issue featuring extended papers from the Asian Conference on Computer Vision (ACCV).

About this article

Cite this article

Yang, L., Zheng, N., Chen, M. et al. Categorization of Multiple Objects in a Scene Using a Biased Sampling Strategy. Int J Comput Vis 105, 1–18 (2013). https://doi.org/10.1007/s11263-013-0629-9

  • DOI: https://doi.org/10.1007/s11263-013-0629-9
