Abstract
We investigate visual-query-based retrieval from large image datasets when the visual queries comprise arbitrary regions of interest (ROI) rather than entire images. We propose a compact image descriptor, termed Voronoi VLAD (VVLAD), that combines the vector of locally aggregated descriptors (VLAD) of Jégou et al. with a multi-level, Voronoi-based spatial partitioning of each dataset image. The multi-level Voronoi partitioning applies a spatial hierarchical K-means to the interest-point locations and computes a VLAD over each resulting cell. To reduce the matching complexity on very large datasets, we propose three modifications. First, we use the tree structure of the spatial hierarchical K-means to perform top-to-bottom pruning for local similarity maxima, rather than exhaustively matching against all cells (Fast-VVLAD). Second, we aggregate the VLADs of adjacent Voronoi cells in order to reduce the overall VVLAD storage requirement per image. Finally, we propose a new image similarity score for Fast-VVLAD that combines relevant information from all partition levels into a single similarity measure. For a range of ROI queries on two standard datasets, Fast-VVLAD achieves mean Average Precision comparable to or higher than that of the state-of-the-art Multi-VLAD framework, while offering more than two-fold acceleration.
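As a rough illustration of the pipeline described above, the following Python sketch builds a per-image tree of Voronoi cells with a recursive K-means over interest-point locations, stores one VLAD per cell, and scores a query by descending the tree greedily. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`vlad`, `kmeans_labels`, `build_tree`, `top_down_score`), the branching factor, the number of levels, the dot-product similarity, and the "descend only while similarity improves" stopping rule are all hypothetical choices, since the abstract does not specify them. The aggregation of adjacent cells and the multi-level similarity score are not shown.

```python
import numpy as np

def vlad(desc, codebook):
    """L2-normalised VLAD: per-centroid sums of residuals, flattened."""
    k, d = codebook.shape
    out = np.zeros((k, d))
    if len(desc):
        # Squared distance from every local descriptor to every centroid.
        d2 = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # np.add.at accumulates correctly when centroid indices repeat.
        np.add.at(out, nearest, desc - codebook[nearest])
    out = out.ravel()
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

def kmeans_labels(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on 2-D locations; returns one cell label per point."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = points[labels == j].mean(axis=0)
    # Final assignment: each point goes to its nearest centre (its Voronoi cell).
    d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_tree(points, desc, codebook, k=2, levels=2):
    """Recursive spatial partitioning (assumed k and depth); one VLAD per cell."""
    node = {"vlad": vlad(desc, codebook), "children": []}
    if levels > 0 and len(points) >= k:
        labels = kmeans_labels(points, k)
        for j in range(k):
            mask = labels == j
            if mask.any():
                node["children"].append(
                    build_tree(points[mask], desc[mask], codebook, k, levels - 1))
    return node

def top_down_score(query_vlad, node):
    """Greedy descent (assumed rule): follow the best child while similarity improves."""
    best = float(query_vlad @ node["vlad"])
    while node["children"]:
        sims = [float(query_vlad @ c["vlad"]) for c in node["children"]]
        j = int(np.argmax(sims))
        if sims[j] <= best:
            break  # local similarity maximum reached; the remaining subtree is pruned
        best, node = sims[j], node["children"][j]
    return best
```

In use, `build_tree` would be called once per dataset image with that image's interest-point locations and local descriptors (for example, SIFT), and `top_down_score` once per query-image pair with the VLAD of the query ROI. The greedy descent compares the query against at most `k` cells per level instead of against every cell in the tree.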
This work was funded in part by Innovate UK, project REVQUAL (101855), and EPSRC (Industrial PhD CASE award, co-sponsored by BAFTA).
References
Arandjelovic, R., Zisserman, A.: Three things everyone should know to improve object retrieval. In: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2911–2918 (2012)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. of Comput. Vis. 60(2), 91–110 (2004)
Lazebnik, S., et al.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE International Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2169–2178 (2006)
Philbin, J., et al.: Lost in quantization: improving particular object retrieval in large scale image databases. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)
Chum, O., et al.: Total recall: automatic query expansion with a generative feature model for object retrieval. In: IEEE International Conference on Computer Vision, pp. 1–8 (2007)
Jégou, H., Douze, M., Schmid, C., Pérez, P.: Aggregating local descriptors into a compact image representation. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 3304–3311 (2010)
Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
Arandjelovic, R., Zisserman, A.: All about VLAD. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1578–1585 (2013)
Jégou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large scale image search. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 304–317. Springer, Heidelberg (2008)
Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: IEEE International Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-264–II-271 (2003)
Perronnin, F., Liu, Y., Sánchez, J., Poirier, H.: Large-scale image retrieval with compressed Fisher vectors. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 3384–3391 (2010)
Jégou, H., Perronnin, F., Douze, M., Sánchez, J., Pérez, P., Schmid, C.: Aggregating local image descriptors into compact codes. IEEE Trans. Pattern Anal. Mach. Intell. 34(9), 1704–1716 (2012)
Jégou, H., Chum, O.: Negative evidences and co-occurences in image retrieval: the benefit of PCA and whitening. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 774–787. Springer, Heidelberg (2012)
Chum, O., Matas, J.: Unsupervised discovery of co-occurrence in sparse high dimensional data. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 3416–3423 (2010)
Mikolajczyk, K., et al.: A comparison of affine region detectors. Int. J. of Comput. Vis. 65(1–2), 43–72 (2005)
Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part I. LNCS, vol. 2350, pp. 128–142. Springer, Heidelberg (2002)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chadha, A., Andreopoulos, Y. (2015). Region-of-Interest Retrieval in Large Image Datasets with Voronoi VLAD. In: Nalpantidis, L., Krüger, V., Eklundh, JO., Gasteratos, A. (eds) Computer Vision Systems. ICVS 2015. Lecture Notes in Computer Science, vol 9163. Springer, Cham. https://doi.org/10.1007/978-3-319-20904-3_21
DOI: https://doi.org/10.1007/978-3-319-20904-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20903-6
Online ISBN: 978-3-319-20904-3
eBook Packages: Computer Science, Computer Science (R0)