DOI: 10.1145/2745754.2745761

Smooth Tradeoffs between Insert and Query Complexity in Nearest Neighbor Search

Published: 20 May 2015

Abstract

Locality Sensitive Hashing (LSH) has emerged as the method of choice for high-dimensional similarity search, a classical problem of interest in numerous applications. LSH-based solutions require that each data point be inserted into a number A of hash tables, after which a query can be answered by performing B lookups. The original LSH solution of [IM98] showed for the first time that both A and B can be made sublinear in the number of data points. Unfortunately, the classical LSH solution does not provide any tradeoff between insert and query complexity, whereas for data-intensive (respectively, query-intensive) applications one would like to minimize insert time by choosing a smaller A (respectively, minimize query time by choosing a smaller B). A partial remedy is provided by Entropy LSH [Pan06], which makes either inserts or queries essentially constant time at the expense of a loss in the other parameter, but no algorithm that achieves a smooth tradeoff is known.
In this paper, we present an algorithm for performing similarity search under the Euclidean metric that resolves the problem above. Our solution is inspired by Entropy LSH, but uses a very different analysis to achieve a smooth tradeoff between insert and query complexity. Our results improve upon or match, up to lower order terms in the exponent, the best known data-oblivious algorithms for the Euclidean metric.
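
To make the roles of A and B concrete, here is a minimal Python sketch of a toy LSH index for Euclidean distance. It is not the algorithm of this paper. It assumes the standard p-stable hash h(x) = floor((a·x + b) / w) from the scheme of Datar et al. listed in the references, and it imitates the Entropy LSH idea by probing the buckets of a few randomly perturbed copies of the query. The class name `EuclideanLSH` and the parameters `hashes_per_table`, `bucket_width`, `probes`, and `radius` are hypothetical names chosen for illustration.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Toy index: each insert touches A = num_tables tables,
    each query performs B = num_tables * probes bucket lookups."""

    def __init__(self, dim, num_tables, hashes_per_table, bucket_width, seed=0):
        self.rng = random.Random(seed)
        self.w = bucket_width
        # Each table concatenates several hashes h(x) = floor((a.x + b) / w),
        # with a drawn from a standard Gaussian and b uniform in [0, w).
        self.funcs = [
            [
                ([self.rng.gauss(0.0, 1.0) for _ in range(dim)],
                 self.rng.uniform(0.0, bucket_width))
                for _ in range(hashes_per_table)
            ]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _key(self, table, x):
        # Bucket key of x in the given table: one integer per hash function.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.w)
            for a, b in self.funcs[table]
        )

    def insert(self, x):
        """Store x in every table and return its index."""
        idx = len(self.points)
        self.points.append(tuple(x))
        for t in range(len(self.tables)):
            self.tables[t][self._key(t, x)].append(idx)
        return idx

    def _perturb(self, q, radius):
        # Random point at Euclidean distance `radius` from q.
        g = [self.rng.gauss(0.0, 1.0) for _ in q]
        norm = math.sqrt(sum(v * v for v in g)) or 1.0
        return [qi + radius * gi / norm for qi, gi in zip(q, g)]

    def query(self, q, probes=1, radius=0.0):
        """Return (index of the closest candidate or None, lookups performed)."""
        candidates = set()
        lookups = 0
        for t in range(len(self.tables)):
            for p in range(probes):
                # The first probe uses q itself; later probes use perturbed copies.
                probe = q if p == 0 else self._perturb(q, radius)
                candidates.update(self.tables[t].get(self._key(t, probe), ()))
                lookups += 1
        best = min(candidates,
                   key=lambda i: math.dist(self.points[i], q),
                   default=None)
        return best, lookups
```

In this sketch, lowering `num_tables` makes inserts cheaper, while raising `probes` spends more lookups per query to compensate. That is the kind of insert-versus-query trade the abstract describes, although the sketch makes no claim about the exponents achieved by any of the cited algorithms.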

References

[1]
A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In FOCS'06.
[2]
Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. Beyond locality-sensitive hashing. SODA, 2014.
[3]
Alexandr Andoni. Nearest Neighbor Search: the Old, the New, and the Impossible. Ph.D. Thesis, MIT, 2009.
[4]
J. Bentley. Multidimensional binary search trees used for associative searching. In Comm. ACM, 1975.
[5]
P. Berkhin. A survey of clustering data mining techniques. Springer, 2002.
[6]
Franck Barthe, Olivier Guédon, Shahar Mendelson, and Assaf Naor. A probabilistic approach to the geometry of the ℓ_p^n-ball. The Annals of Probability, 33:480--513, 2005.
[7]
A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbors. In ICML, 2006.
[8]
T. Cover and P. Hart. Nearest neighbour pattern classification. In IEEE Trans. on Inf. Theory, 1967.
[9]
A. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: Scalable online collaborative filtering. In WWW, 2007.
[10]
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on Computational Geometry, pages 253--262, 2004.
[11]
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, 1999.
[12]
A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD, 1984.
[13]
P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, 1998.
[14]
B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In ICCV, 2009.
[15]
R. Krauthgamer and J. Lee. Navigating nets: simple algorithms for proximity search. In SODA, 2004.
[16]
E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search of approximate nearest neighbor in high dimensional spaces. In STOC, 1998.
[17]
N. Katayama and S. Satoh. The SR-tree: an index structure for high-dimensional nearest neighbor queries. In SIGMOD, 1997.
[18]
Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: Efficient indexing for high-dimensional similarity search. In VLDB, 2007.
[19]
R. Motwani, A. Naor, and R. Panigrahy. Lower bounds on locality sensitive hashing. In SCG '06: Proceedings of the twenty-second annual symposium on Computational geometry, pages 154--157, 2006.
[20]
Ryan O'Donnell, Yi Wu, and Yuan Zhou. Optimal lower bounds for locality sensitive hashing (except when q is tiny). ITCS, 2011.
[21]
Rina Panigrahy. Entropy based nearest neighbor search in high dimensions. In SODA'06.
[22]
R. Panigrahy, K. Talwar, and U. Wieder. Lower bounds on near neighbor search via metric expansion. FOCS'10.
[23]
Rina Panigrahy, Kunal Talwar, and Udi Wieder. A geometric approach to lower bounds for approximate near-neighbor search and partial match. FOCS, pages 414--423, 2008.
[24]
S. T. Rachev and L. Rüschendorf. Approximate independence of distributions on spheres and their stability properties. The Annals of Probability, 19(3):1311--1337, 1991.
[25]
G. Schechtman and J. Zinn. On the volume of the intersection of two L_p^n balls. Proc. Amer. Math. Soc., 110:217--224, 1990.
[26]
R. Weber, H. Schek, and S. Blott. A quantitative analysis and performance study for similarity search methods in high dimensional spaces. In VLDB, 1998.

    Published In

    PODS '15: Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
    May 2015
    358 pages
    ISBN:9781450327572
    DOI:10.1145/2745754
    General Chair: Tova Milo
    Program Chair: Diego Calvanese
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. locality sensitive hashing
    2. nearest neighbor search

    Qualifiers

    • Research-article

    Funding Sources

    • U.S. Air Force Office of Scientific Research (AFOSR)
    • NSF

    Conference

    SIGMOD/PODS'15: International Conference on Management of Data
    May 31 - June 4, 2015
    Melbourne, Victoria, Australia

    Acceptance Rates

    PODS '15 Paper Acceptance Rate 25 of 80 submissions, 31%;
    Overall Acceptance Rate 642 of 2,707 submissions, 24%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 7
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 15 Feb 2025

    Citations

    Cited By

    • (2020) On the I/O Complexity of the k-Nearest Neighbors Problem. Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 205-212. DOI: 10.1145/3375395.3387649. Online publication date: 14-Jun-2020
    • (2020) Subsets and Supermajorities: Optimal Hashing-based Set Similarity Search. 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 728-739. DOI: 10.1109/FOCS46700.2020.00073. Online publication date: Nov-2020
    • (2020) Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time. Algorithmica 82:11, pages 3306-3337. DOI: 10.1007/s00453-020-00727-1. Online publication date: 1-Nov-2020
    • (2018) Set Similarity Search for Skewed Data. Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 63-74. DOI: 10.1145/3196959.3196985. Online publication date: 27-May-2018
    • (2018) Distance-Sensitive Hashing. Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 89-104. DOI: 10.1145/3196959.3196976. Online publication date: 27-May-2018
    • (2018) Randomized Embeddings with Slack and High-Dimensional Approximate Nearest Neighbor. ACM Transactions on Algorithms 14:2, pages 1-21. DOI: 10.1145/3178540. Online publication date: 16-Apr-2018
    • (2018) A Faster Subquadratic Algorithm for Finding Outlier Correlations. ACM Transactions on Algorithms 14:3, pages 1-26. DOI: 10.1145/3174804. Online publication date: 16-Jun-2018
    • (2018) CoveringLSH. ACM Transactions on Algorithms 14:3, pages 1-17. DOI: 10.1145/3155300. Online publication date: 16-Jun-2018
    • (2018) Index Structures for Fast Similarity Search for Real-Valued Vectors. I. Cybernetics and Systems Analysis 54:1, pages 152-164. DOI: 10.1007/s10559-018-0016-1. Online publication date: 1-Feb-2018
    • (2017) Parameter-free locality sensitive hashing for spherical range reporting. Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 239-256. DOI: 10.5555/3039686.3039702. Online publication date: 16-Jan-2017
