ABSTRACT
One of the most successful methods for linking all similar images within a large collection is min-Hash, which significantly speeds up the comparison of images when the underlying image representation is a bag-of-words. However, the quantization step of min-Hash introduces significant information loss. In this paper, we propose a generalization of min-Hash, called Sim-min-Hash, to compare sets of real-valued vectors. We demonstrate the effectiveness of our approach when combined with the Hamming embedding similarity. Experiments on popular large-scale benchmarks demonstrate that Sim-min-Hash is more accurate and faster than min-Hash for similar image search. Linking a collection of one million images described by 2 billion local descriptors takes 7 minutes on a single-core machine.
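For readers unfamiliar with the baseline, the following is a minimal sketch of standard min-Hash over bag-of-words image representations. It is not Sim-min-Hash and not the authors' implementation. It assumes each image is a set of integer visual-word ids and uses a simple linear hash family as a stand-in for random permutations; the function names, the prime modulus, and the number of hash functions are all illustrative choices.

```python
import random

PRIME = 2_147_483_647  # assumed modulus; word ids must be smaller than this


def make_hash_functions(n, rng):
    """Draw n linear hash functions h(w) = (a * w + b) mod PRIME.

    With a != 0 each function is a permutation of the word ids,
    which approximates the random permutations min-Hash relies on.
    """
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(words, hash_functions):
    """Return the minimum hash value of the word set under each hash function."""
    return [min((a * w + b) % PRIME for w in words) for a, b in hash_functions]


def estimated_similarity(sig_a, sig_b):
    """Fraction of agreeing min-hash values, an estimate of set overlap (Jaccard)."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


rng = random.Random(0)
hash_functions = make_hash_functions(512, rng)

# Hypothetical images, each described by a set of visual-word ids.
image_a = set(range(0, 150))
image_b = set(range(50, 200))    # shares 100 words with image_a
image_c = set(range(1000, 1100))  # shares no word with image_a

sig_a = minhash_signature(image_a, hash_functions)
sig_b = minhash_signature(image_b, hash_functions)
sig_c = minhash_signature(image_c, hash_functions)

print("a vs b:", estimated_similarity(sig_a, sig_b))
print("a vs c:", estimated_similarity(sig_a, sig_c))
```

Two images are compared only through their short signatures, which is the source of the speed-up. The sketch also makes the limitation named in the abstract visible: descriptors are reduced to discrete word ids before hashing, so two descriptors either match exactly or not at all.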
Index Terms
- Sim-min-hash: an efficient matching technique for linking large image collections