DOI: 10.1145/3219819.3219971
research-article

Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search

Published: 19 July 2018

Abstract

The problem of Approximate Maximum Inner Product (AMIP) search has received increasing attention due to its wide range of applications. Interestingly, an asymmetric transformation reduces the problem to Approximate Nearest Neighbor (ANN) search, so Locality-Sensitive Hashing (LSH) can be leveraged to find a solution. However, existing asymmetric transformations, such as L2-ALSH and XBOX, suffer from large distortion error when reducing AMIP search to ANN search, so that the results of AMIP search can be arbitrarily bad. In this paper, we propose a novel Asymmetric LSH scheme based on Homocentric Hypersphere partition (H2-ALSH) for high-dimensional AMIP search. On the one hand, we propose a novel Query Normalized First (QNF) transformation that significantly reduces the distortion error. On the other hand, by adopting the homocentric hypersphere partition strategy, we not only improve search efficiency through early-stop pruning, but also obtain higher search accuracy by further reducing the distortion error with a limited data range. Our theoretical studies show that H2-ALSH enjoys a guarantee on search accuracy. Experimental results over four real datasets demonstrate that H2-ALSH significantly outperforms the state-of-the-art schemes.
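
The reduction from inner-product search to nearest-neighbor search can be illustrated with a small sketch. The data-side padding below (appending sqrt(M^2 - ||x||^2), where M is the largest data norm) is the usual construction behind XBOX-style transformations. The query-side rescaling to norm M is only an assumption about what "query normalized" means, since the abstract does not state the formula. All function names are hypothetical, and brute-force scans stand in for an LSH index.

```python
import math

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(inner(a, a))

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def transform_data(x, M):
    # Pad with sqrt(M^2 - ||x||^2) so every transformed point has norm M.
    return list(x) + [math.sqrt(max(M * M - inner(x, x), 0.0))]

def transform_query_unscaled(q):
    # XBOX-style query transform: pad with zero, keep the query as is.
    return list(q) + [0.0]

def transform_query_scaled(q, M):
    # ASSUMPTION: rescale the query to norm M before padding with zero.
    lam = M / norm(q)
    return [lam * v for v in q] + [0.0]

def mip_by_scan(data, q):
    # Exact maximum inner product by linear scan.
    return max(range(len(data)), key=lambda i: inner(data[i], q))

def nn_by_scan(points, target):
    # Exact nearest neighbor by linear scan (stand-in for an LSH index).
    return min(range(len(points)), key=lambda i: dist(points[i], target))

def spread(points, target):
    # Farthest / nearest distance. Values near 1 mean the nearest
    # neighbor is hard to tell apart from the other points.
    d = [dist(p, target) for p in points]
    return max(d) / min(d)

# Toy data (illustrative only); the query norm is much smaller than M.
data = [[3.0, 4.0], [1.0, 0.5], [-2.0, 2.0], [0.5, 4.5]]
query = [0.1, 0.2]

M = max(norm(x) for x in data)
transformed = [transform_data(x, M) for x in data]
q_unscaled = transform_query_unscaled(query)
q_scaled = transform_query_scaled(query, M)

best_mip = mip_by_scan(data, query)
best_unscaled = nn_by_scan(transformed, q_unscaled)
best_scaled = nn_by_scan(transformed, q_scaled)
spread_unscaled = spread(transformed, q_unscaled)
spread_scaled = spread(transformed, q_scaled)

print(best_mip, best_unscaled, best_scaled)
print(spread_unscaled, spread_scaled)
```

With the unscaled query, the squared distance is M^2 + ||q||^2 - 2<x, q>; with the rescaled query it is 2M^2 - 2(M/||q||)<x, q>. Both decrease as the inner product grows, so the nearest neighbor in the transformed space is the maximum inner product point. When ||q|| is much smaller than M, the unscaled distances are dominated by M^2 and cluster together, which is one way to picture the distortion the abstract refers to.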



    Published In

    KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2018
    2925 pages
    ISBN:9781450355520
    DOI:10.1145/3219819

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. locality-sensitive hashing
    2. maximum inner product search
    3. nearest neighbor search

    Acceptance Rates

    KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

