A comparative study of irregular pyramid matching in bag-of-bags of words model for image retrieval

  • Original Paper
  • Signal, Image and Video Processing

Abstract

In this paper, we assess three standard approaches to building irregular pyramid partitions for image retrieval in the bag-of-bags of words model that we recently proposed. The three approaches are kernel \(k\)-means to optimize multilevel weighted graph cuts, normalized cuts, and graph cuts. The bag-of-bags of words (BBoW) model is based on irregular pyramid partitions over the image. An image is first represented as a connected graph of local features on a regular grid of pixels. Irregular partitions (subgraphs) of the image are then built using graph partitioning methods, and each subgraph in the partition is represented by its own signature. With the aid of graphs, the BBoW model extends the classical bag-of-words model by embedding color homogeneity and limited spatial information through irregular partitions of an image. Unlike existing image retrieval methods such as spatial pyramid matching (SPM), the BBoW model does not assume that similar parts of a scene always appear at the same location in images of the same category. Extending the proposed model to a pyramid gives rise to a method we name irregular pyramid matching. Experiments on the Caltech-101 benchmark demonstrate that applying kernel \(k\)-means to the graph clustering process produces better retrieval results for BBoW than the other graph partitioning methods, graph cuts and normalized cuts. Moreover, the proposed method achieves results comparable to SPM on the whole Caltech-101 dataset and outperforms it in 19 object categories.
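The pipeline summarized in the abstract (grid graph, irregular partition, one signature per subgraph) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names are hypothetical, the Gaussian colour-similarity edge weight is one plausible choice, and the partitioner is a simple stand-in (connected components after dropping weak edges), not kernel \(k\)-means, normalized cuts, or graph cuts. Visual-word indices are assumed to be precomputed for each grid node.

```python
from collections import Counter
from math import exp


def grid_graph(colors, sigma=0.1):
    """Build a 4-connected graph over a regular grid of nodes.

    Edge weights use a Gaussian of the colour difference (an assumed choice).
    """
    rows, cols = len(colors), len(colors[0])
    edges = {}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols:
                    diff = colors[r][c] - colors[r2][c2]
                    edges[((r, c), (r2, c2))] = exp(-diff * diff / (2 * sigma ** 2))
    return edges


def partition(edges, nodes, threshold=0.5):
    """Stand-in partitioner: connected components of edges with weight >= threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (a, b), weight in edges.items():
        if weight >= threshold:
            parent[find(a)] = find(b)
    return {n: find(n) for n in nodes}


def signatures(labels, words, vocab_size):
    """Return one L1-normalised visual-word histogram per subgraph."""
    counts = {}
    for (r, c), label in labels.items():
        counts.setdefault(label, Counter())[words[r][c]] += 1
    result = []
    for counter in counts.values():
        total = sum(counter.values())
        result.append([counter[w] / total for w in range(vocab_size)])
    return result


# Toy 2x4 "image": two colour-homogeneous halves, three visual words.
colors = [[0.0, 0.0, 1.0, 1.0],
          [0.0, 0.0, 1.0, 1.0]]
words = [[0, 0, 1, 2],
         [0, 1, 2, 2]]
nodes = [(r, c) for r in range(2) for c in range(4)]
labels = partition(grid_graph(colors), nodes)
bags = signatures(labels, words, vocab_size=3)
print(bags)
```

In this toy case the two colour-homogeneous halves become two subgraphs, each with its own histogram, so the image is described by a bag of bags of words instead of a single global histogram.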

Notes

  1. http://trec.nist.gov/.

  2. http://perso-etis.ensea.fr/yren/thesis/thesis.pdf.

  3. For more detail, see supplementary material online.

Acknowledgments

This work was conducted as part of the author's Ph.D. research and was supported by CNRS (Centre national de la recherche scientifique) and a Region of Aquitaine grant.

Author information

Corresponding author

Correspondence to Yi Ren.

Cite this article

Ren, Y. A comparative study of irregular pyramid matching in bag-of-bags of words model for image retrieval. SIViP 10, 471–478 (2016). https://doi.org/10.1007/s11760-015-0763-7

  • DOI: https://doi.org/10.1007/s11760-015-0763-7
