ABSTRACT
Locality Sensitive Hashing (LSH) and its variants are generally believed to be the most effective radius search methods in high-dimensional spaces. However, many applications involve finding the k nearest neighbors (k-NN), where the k-NN distances of different query points may differ greatly, and the performance of LSH suffers as a result. We propose a novel indexing scheme called Selective Hashing, in which a disjoint set of indices is built with different granularities and each point is stored only in the most effective index. Theoretically, we show that k-NN search using selective hashing can achieve the same recall as a fixed-radius LSH search that uses a radius equal to the distance of the (c1·k)-th nearest neighbor, with at most c2 times overhead, where c1 and c2 are small constants. Selective hashing is also easy to build and update, and it outperforms all the state-of-the-art algorithms, such as DSH and IsoHash.
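To make the indexing idea concrete, here is a minimal Python sketch under stated assumptions, not the authors' implementation. It uses p-stable (Gaussian) LSH functions h(v) = floor((a·v + b) / w) in the style of Datar et al., builds one index per bucket width w (the "granularity"), and stores each point in exactly one index. The selection rule (the finest width that is at least the point's own k-NN distance, computed by brute force), the choice to probe every index at query time, and all names and parameters (`LSHIndex`, `SelectiveHashing`, `widths`, `num_tables`, `hashes_per_table`) are hypothetical illustrations; the paper's actual criterion and parameters may differ.

```python
import math
import random
from collections import defaultdict


def euclidean(u, v):
    # Plain Euclidean (L2) distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


class LSHIndex:
    """One LSH index: several hash tables, each keyed by a tuple of p-stable hashes."""

    def __init__(self, dim, w, num_tables, hashes_per_table, rng):
        self.w = w
        self.size = 0
        self.tables = []
        for _ in range(num_tables):
            # Each hash is h(v) = floor((a . v + b) / w) with a ~ N(0, I), b ~ U[0, w).
            fns = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                   for _ in range(hashes_per_table)]
            self.tables.append((fns, defaultdict(list)))

    def _key(self, fns, v):
        return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
                     for a, b in fns)

    def insert(self, pid, v):
        for fns, buckets in self.tables:
            buckets[self._key(fns, v)].append(pid)
        self.size += 1

    def candidates(self, v):
        # Union of the buckets that v falls into, one bucket per table.
        found = set()
        for fns, buckets in self.tables:
            found.update(buckets.get(self._key(fns, v), ()))
        return found


class SelectiveHashing:
    """Hypothetical sketch: one index per width, and each point is stored in exactly one."""

    def __init__(self, points, k, widths, num_tables=8, hashes_per_table=4, seed=0):
        rng = random.Random(seed)
        self.points = points
        self.k = k
        self.widths = sorted(widths)
        self.indices = [LSHIndex(len(points[0]), w, num_tables, hashes_per_table, rng)
                        for w in self.widths]
        for pid, p in enumerate(points):
            self.indices[self._choose(pid)].insert(pid, p)

    def _choose(self, pid):
        # Assumed rule: the finest width that is >= the point's own k-NN distance
        # (brute force here, O(n) per point); otherwise fall back to the coarsest.
        dists = sorted(euclidean(self.points[pid], q)
                       for j, q in enumerate(self.points) if j != pid)
        if not dists:
            return 0
        knn_dist = dists[min(self.k, len(dists)) - 1]
        for i, w in enumerate(self.widths):
            if w >= knn_dist:
                return i
        return len(self.widths) - 1

    def query(self, q, k):
        # Probe every index, then rank the union of candidates by exact distance.
        cands = set()
        for index in self.indices:
            cands |= index.candidates(q)
        return sorted(cands, key=lambda pid: euclidean(q, self.points[pid]))[:k]


if __name__ == "__main__":
    gen = random.Random(1)
    pts = [[gen.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
    sh = SelectiveHashing(pts, k=5, widths=[0.5, 1.0, 2.0, 4.0])
    print([ix.size for ix in sh.indices])  # how many points each index holds
    print(sh.query(pts[0], 5))             # ids of the approximate 5-NN of point 0
```

Because every point is inserted into only one index, the total number of bucket entries equals that of a single index with the same number of tables. Points in dense regions land in fine-grained buckets and points in sparse regions land in coarse ones, so a single query can reach neighbors at very different k-NN distances without a global radius.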
References
- A. Andoni, P. Indyk, H. L. Nguyen, and I. Razenshteyn. Beyond locality-sensitive hashing. In SODA, 2014.
- V. Athitsos, J. Alon, S. Sclaroff, and G. Kollios. BoostMap: A method for efficient approximate similarity rankings. In CVPR, 2004.
- K. P. Bennett, U. Fayyad, and D. Geiger. Density-based indexing for approximate nearest-neighbor queries. In SIGKDD, 1999.
- T. Bertin-Mahieux, D. P. Ellis, B. Whitman, and P. Lamere. The million song dataset. In ISMIR, 2011.
- E. Chávez, G. Navarro, R. Baeza-Yates, and J. L. Marroquín. Searching in metric spaces. ACM Computing Surveys, 2001.
- A. Dasgupta, R. Kumar, and T. Sarlós. Fast locality-sensitive hashing. In SIGKDD, 2011.
- M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SoCG, 2004.
- J. Gan, J. Feng, Q. Fang, and W. Ng. Locality-sensitive hashing scheme based on dynamic collision counting. In SIGMOD, 2012.
- J. Gao, H. V. Jagadish, W. Lu, and B. C. Ooi. DSH: data sensitive hashing for high-dimensional k-NN search. In SIGMOD, 2014.
- A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, 1999.
- G. R. Hjaltason and H. Samet. Index-driven similarity search in metric spaces (survey article). TODS, 2003.
- Y. Hwang, B. Han, and H.-K. Ahn. A fast nearest neighbor search algorithm by nonlinear embedding. In CVPR, 2012.
- H. Jégou, L. Amsaleg, C. Schmid, and P. Gros. Query adaptative locality sensitive hashing. In ICASSP, 2008.
- J. F. C. Kingman. Poisson processes, volume 3. Oxford University Press, 1992.
- W. Kong and W.-J. Li. Isotropic hashing. In NIPS, 2012.
- Y. Lin, R. Jin, D. Cai, S. Yan, and X. Li. Compressed hashing. In CVPR, 2013.
- Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In VLDB, 2007.
- R. Motwani, A. Naor, and R. Panigrahy. Lower bounds on locality sensitive hashing. Discrete Mathematics, 2007.
- Y. Mu, J. Shen, and S. Yan. Weakly-supervised hashing in kernel space. In CVPR, 2010.
- R. Panigrahy. Entropy based nearest neighbor search in high dimensions. In SODA, 2006.
- D. W. Scott. Multivariate density estimation: theory, practice, and visualization, volume 383. John Wiley & Sons, 2009.
- N. Srivastava and R. Salakhutdinov. Multimodal learning with deep boltzmann machines. In NIPS, 2012.
- M. Stonebraker. The case for partial indexes. SIGMOD Record, 1989.
- Y. Tao, K. Yi, C. Sheng, and P. Kalnis. Quality and efficiency in high dimensional nearest neighbor search. In SIGMOD, 2009.
- J. Wang, S. Kumar, and S.-F. Chang. Semi-supervised hashing for large-scale search. TPAMI, 2012.
- Q. Wang, S. R. Kulkarni, and S. Verdú. Divergence estimation for multidimensional densities via k-nearest-neighbor distances. IEEE Transactions on Information Theory, 2009.
- R. Weber, H.-J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, 1998.
- Y. Weiss, R. Fergus, and A. Torralba. Multidimensional spectral hashing. In ECCV, 2012.
- Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, 2008.
Index Terms
- Selective Hashing: Closing the Gap between Radius Search and k-NN Search