DOI: 10.1145/2783258.2783284
research-article

Selective Hashing: Closing the Gap between Radius Search and k-NN Search

Published: 10 August 2015

ABSTRACT

Locality Sensitive Hashing (LSH) and its variants are generally believed to be the most effective radius search methods in high-dimensional spaces. However, many applications involve finding the k nearest neighbors (k-NN), where the k-NN distances of different query points may differ greatly, and the performance of LSH suffers as a result. We propose a novel indexing scheme called Selective Hashing, in which a set of disjoint indices is built at different granularities and each point is stored only in the most effective index. Theoretically, we show that k-NN search using selective hashing can achieve the same recall as a fixed-radius LSH search, using a radius equal to the distance of the c₁k-th nearest neighbor, with at most c₂ times the overhead, where c₁ and c₂ are small constants. Selective hashing is also easy to build and update, and outperforms all the state-of-the-art algorithms such as DSH and IsoHash.
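
The idea of several indices at different granularities, with each point stored in exactly one of them, can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract does not say how the "most effective" index is chosen, so the rule below (pick the finest bucket width that is at least the point's brute-force k-NN distance) is purely an assumption, as are the names `PStableLSH`, `build_selective`, and `query`. The hash itself is the standard p-stable form h(x) = floor((a·x + b) / w).

```python
import numpy as np
from collections import defaultdict


class PStableLSH:
    """A single hash table using p-stable (Gaussian) projections."""

    def __init__(self, dim, width, n_proj=4, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(n_proj, dim))       # random projection directions
        self.b = rng.uniform(0.0, width, size=n_proj)  # random offsets in [0, w)
        self.width = width                             # bucket width = granularity
        self.buckets = defaultdict(list)

    def key(self, x):
        # h(x) = floor((a.x + b) / w), concatenated over n_proj projections
        return tuple(np.floor((self.a @ x + self.b) / self.width).astype(int))

    def add(self, idx, x):
        self.buckets[self.key(x)].append(idx)

    def candidates(self, q):
        return self.buckets.get(self.key(q), [])


def build_selective(points, widths, k=3):
    """Store each point in exactly one of several indices (widths ascending).

    ASSUMPTION: the selection rule is hypothetical. Each point goes to the
    finest index whose bucket width is >= its k-NN distance, else the coarsest.
    """
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    knn_dist = np.sort(dists, axis=1)[:, k]  # column 0 is the point itself
    indices = [PStableLSH(points.shape[1], w, seed=i) for i, w in enumerate(widths)]
    owner = np.minimum(np.searchsorted(widths, knn_dist), len(widths) - 1)
    for i, x in enumerate(points):
        indices[owner[i]].add(i, x)
    return indices, owner


def query(indices, q):
    """Collect candidate ids from the matching bucket of every index."""
    found = []
    for index in indices:
        found.extend(index.candidates(q))
    return found


# Toy data: one dense cluster and one sparse cluster.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (20, 5)), rng.normal(5.0, 2.0, (20, 5))])
indices, owner = build_selective(data, widths=[0.5, 2.0, 8.0], k=3)
candidates = query(indices, data[0])
```

Because the indices are disjoint, total storage stays at one entry per point, while a query still probes every granularity. A practical index would use several tables per granularity and rank the candidates by true distance; both are omitted here for brevity.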

      • Published in

        KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
        August 2015
        2378 pages
        ISBN:9781450336642
        DOI:10.1145/2783258

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        KDD '15 Paper Acceptance Rate: 160 of 819 submissions, 20%
        Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
