Abstract
In this paper, we assess three standard approaches for building irregular pyramid partitions for image retrieval in the bag-of-bags of words (BBoW) model that we recently proposed: kernel \(k\)-means optimizing multilevel weighted graph cuts, normalized cuts, and graph cuts. The BBoW model is based on irregular pyramid partitions over the image. An image is first represented as a connected graph of local features on a regular grid of pixels. Irregular partitions (subgraphs) of the image are then built using graph partitioning methods, and each subgraph in the partition is represented by its own signature. With the aid of this graph, the BBoW model extends the classical bag-of-words model by embedding color homogeneity and limited spatial information through irregular partitions of an image. Unlike existing image retrieval methods such as spatial pyramid matching (SPM), the BBoW model does not assume that similar parts of a scene always appear at the same location in images of the same category. Extending the proposed model to a pyramid gives rise to a method we name irregular pyramid matching. Experiments on the Caltech-101 benchmark demonstrate that applying kernel \(k\)-means to the graph clustering process produces better retrieval results for BBoW than the other two graph partitioning methods, graph cuts and normalized cuts. Moreover, the proposed method achieves results comparable to SPM on the whole Caltech-101 dataset and outperforms it in 19 object categories.
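The pipeline described above (grid graph, partition into subgraphs, one signature per subgraph) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `grid_edges` and `bag_of_bags` are hypothetical, the partition is written by hand instead of being produced by kernel \(k\)-means, normalized cuts, or graph cuts, and the L1-normalised visual-word histogram is only an assumed form of the per-subgraph signature.

```python
from collections import Counter


def grid_edges(height, width):
    """Return the 4-connected edges of a regular height x width grid.

    Nodes are (row, col) tuples; each edge is a pair of neighbouring nodes.
    """
    edges = []
    for r in range(height):
        for c in range(width):
            if r + 1 < height:
                edges.append(((r, c), (r + 1, c)))  # vertical neighbour
            if c + 1 < width:
                edges.append(((r, c), (r, c + 1)))  # horizontal neighbour
    return edges


def bag_of_bags(words, labels, vocab_size):
    """Build one L1-normalised visual-word histogram per subgraph.

    words[r][c]  : visual-word index of the local feature at node (r, c)
    labels[r][c] : subgraph id assigned to that node by some partitioner
                   (assumed given; any graph partitioning method could fill it)
    """
    counts = {}
    for word_row, label_row in zip(words, labels):
        for w, k in zip(word_row, label_row):
            counts.setdefault(k, Counter())[w] += 1
    bags = {}
    for k, counter in counts.items():
        total = sum(counter.values())
        bags[k] = [counter[w] / total for w in range(vocab_size)]
    return bags


# Toy 2x4 grid, vocabulary of 3 visual words, hand-made 2-way partition.
words = [[0, 0, 1, 2],
         [0, 1, 2, 2]]
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1]]

edges = grid_edges(2, 4)
bags = bag_of_bags(words, labels, vocab_size=3)
for subgraph_id in sorted(bags):
    print(subgraph_id, bags[subgraph_id])
```

In this toy case the image is described by two histograms, one per subgraph, rather than by a single global bag of words; a real partitioner would assign `labels` by cutting the weighted edges returned by `grid_edges`.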






Notes
For more detail, see supplementary material online.
Acknowledgments
This work was conducted as part of the author's Ph.D. work, supported by CNRS (Centre national de la recherche scientifique) and a Region of Aquitaine grant.
Cite this article
Ren, Y. A comparative study of irregular pyramid matching in bag-of-bags of words model for image retrieval. SIViP 10, 471–478 (2016). https://doi.org/10.1007/s11760-015-0763-7