DOI: 10.1145/2783258.2783284
research-article

Selective Hashing: Closing the Gap between Radius Search and k-NN Search

Published: 10 August 2015

Abstract

Locality Sensitive Hashing (LSH) and its variants are generally believed to be the most effective radius search methods in high-dimensional spaces. However, many applications involve finding the k nearest neighbors (k-NN), where the k-NN distances of different query points may differ greatly and the performance of LSH suffers. We propose a novel indexing scheme called Selective Hashing, in which a set of disjoint indices is built with different granularities and each point is stored only in the most effective index. Theoretically, we show that k-NN search using selective hashing can achieve the same recall as a fixed-radius LSH search, using a radius equal to the distance of the (c1·k)-th nearest neighbor, with at most c2 times overhead, where c1 and c2 are small constants. Selective hashing is also easy to build and update, and outperforms state-of-the-art algorithms such as DSH and IsoHash.
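
The idea of storing each point in exactly one of several indices of different granularity can be illustrated with a small Python sketch. This is not the paper's algorithm: the abstract does not say how the "most effective" index is chosen, so the sketch assumes a simple occupancy rule (a point goes to the finest level whose bucket holds at least `min_bucket` points, otherwise to the coarsest level). The hash family is the standard p-stable form floor((a·x + b) / w), and the names `SelectiveIndex`, `make_hash`, `widths`, and `min_bucket` are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, n_proj, rng):
    """Build a p-stable style hash: a tuple of floor((a.x + b) / width) values."""
    projs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(n_proj)]

    def h(x):
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)
            for a, b in projs
        )

    return h


class SelectiveIndex:
    """Illustrative multi-granularity index; each point lives in one level only."""

    def __init__(self, points, widths, n_proj=2, min_bucket=3, seed=0):
        rng = random.Random(seed)
        self.points = points
        # One hash function per level, ordered from finest to coarsest width.
        self.hashes = [make_hash(len(points[0]), w, n_proj, rng)
                       for w in sorted(widths)]

        # Bucket occupancy at every level, counted over all points.
        counts = []
        for h in self.hashes:
            c = defaultdict(int)
            for p in points:
                c[h(p)] += 1
            counts.append(c)

        # ASSUMED selection rule (not from the paper): use the finest level
        # whose bucket is populated enough, falling back to the coarsest.
        self.tables = [defaultdict(list) for _ in self.hashes]
        self.level_of = []
        for idx, p in enumerate(points):
            level = len(self.hashes) - 1
            for i, h in enumerate(self.hashes):
                if counts[i][h(p)] >= min_bucket:
                    level = i
                    break
            self.level_of.append(level)
            self.tables[level][self.hashes[level](p)].append(idx)

    def query(self, q, k):
        """Probe the query's bucket at every level, then rank by true distance."""
        cand = set()
        for h, table in zip(self.hashes, self.tables):
            cand.update(table.get(h(q), ()))
        return sorted(cand, key=lambda i: math.dist(q, self.points[i]))[:k]
```

Under this rule, points in dense regions tend to land in fine-grained buckets and isolated points in coarse ones, so a query's candidate set roughly adapts to its local k-NN distance. A call such as `SelectiveIndex(points, widths=[0.5, 4.0, 32.0]).query(q, k=5)` returns the indices of up to five candidates, nearest first.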

Supplementary Material

MP4 File (p349.mp4)

Published In

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN: 9781450336642
DOI: 10.1145/2783258
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptive hashing
  2. data sensitive hashing
  3. high-dimensional indexing
  4. locality sensitive hashing
  5. lsh
  6. selective hashing

Qualifiers

  • Research-article

Funding Sources

  • National Research Foundation Prime Minister's Office Singapore
  • NSF

Conference

KDD '15
Acceptance Rates

KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 3
Reflects downloads up to 02 Mar 2025

Cited By

  • (2022) Effective community search over large star-schema heterogeneous information networks. Proceedings of the VLDB Endowment 15:11 (2307-2320). DOI: 10.14778/3551793.3551795. Online publication date: 1-Jul-2022
  • (2022) EGM: Enhanced Graph-based Model for Large-scale Video Advertisement Search. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (4443-4451). DOI: 10.1145/3534678.3539061. Online publication date: 14-Aug-2022
  • (2022) Secure Similarity Search Over Encrypted Non-Uniform Datasets. IEEE Transactions on Cloud Computing 10:3 (2102-2117). DOI: 10.1109/TCC.2020.3000233. Online publication date: 1-Jul-2022
  • (2022) LayerLSH: Rebuilding Locality-Sensitive Hashing Indices by Exploring Density of Hash Values. IEEE Access 10 (69851-69865). DOI: 10.1109/ACCESS.2022.3182802. Online publication date: 2022
  • (2020) IDAR. Proceedings of the VLDB Endowment 13:9 (1456-1468). DOI: 10.14778/3397230.3397241. Online publication date: 26-Jun-2020
  • (2020) VHP. Proceedings of the VLDB Endowment 13:9 (1443-1455). DOI: 10.14778/3397230.3397240. Online publication date: 1-May-2020
  • (2020) Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (2539-2554). DOI: 10.1145/3318464.3380600. Online publication date: 11-Jun-2020
  • (2020) Online Nearest Neighbor Search Using Hamming Weight Trees. IEEE Transactions on Pattern Analysis and Machine Intelligence 42:7 (1729-1740). DOI: 10.1109/TPAMI.2019.2902391. Online publication date: 1-Jul-2020
  • (2020) Approximate Nearest Neighbor Search on High Dimensional Data — Experiments, Analyses, and Improvement. IEEE Transactions on Knowledge and Data Engineering 32:8 (1475-1488). DOI: 10.1109/TKDE.2019.2909204. Online publication date: 1-Aug-2020
  • (2020) A machine learning approach for image retrieval tasks. 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ) (1-5). DOI: 10.1109/IVCNZ51579.2020.9290617. Online publication date: 25-Nov-2020
