DOI: 10.1145/3488560.3498425
research-article

GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning

Published: 15 February 2022

Abstract

Nearest Neighbor Search (NNS) has drawn rapidly growing interest because of its core role in managing high-dimensional vector data in data science and AI applications. This interest is fueled by the success of neural embeddings, where deep learning models transform unstructured data into semantically correlated feature vectors for downstream analysis, e.g., recommending popular items. Among the several categories of methods for fast NNS, graph-based approximate nearest neighbor search algorithms have achieved best-in-class search performance on a wide range of real-world datasets. Prior work improves the efficiency of graph-based NNS mainly by exploiting the structure of the graph through sophisticated heuristic rules. In this work, we instead show that the frequency distribution of edge visits during graph-based NNS can be highly skewed. This finding motivates pruning unnecessary edges, guided by the query distribution, to avoid redundant computation during graph traversal; the query distribution is an important yet under-explored aspect of graph-based NNS. Specifically, we formulate graph pruning as a discrete optimization problem and introduce GraSP, a graph optimization algorithm that improves the search efficiency of similarity graphs by learning to prune redundant edges. GraSP augments an existing similarity graph with a probabilistic model. It then performs a novel subgraph sampling and iterative refinement optimization that explicitly maximizes search efficiency, in expectation over the graph and for a large set of training queries, when a subset of edges is removed. The evaluation shows that GraSP consistently improves search efficiency on real-world datasets, providing up to 2.24× faster search than state-of-the-art methods without losing accuracy.
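The edge-visit skew that motivates GraSP can be illustrated with a small experiment. The sketch below is a hypothetical toy, not the authors' implementation: it builds a brute-force k-nearest-neighbor graph over random 2-D points, runs plain greedy routing (rather than the beam search used by production graph indexes) for a set of training queries while counting how often each edge is traversed, and then applies a naive frequency-threshold pruning rule. GraSP itself learns which edges to prune with a probabilistic model and subgraph sampling; the threshold rule, the definition of an "edge visit" as evaluating a neighbor through that edge, the function names (`build_knn_graph`, `greedy_search`, `prune_rarely_visited`, `evaluate`) and all parameter values are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def build_knn_graph(points, k):
    """Directed k-NN graph: node i -> its k closest other points (brute force)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]
    return graph


def greedy_search(graph, points, query, entry, edge_visits=None):
    """Greedy routing: hop to the closest neighbor until no neighbor is closer.

    Each out-edge (u, v) followed to evaluate v counts as one edge visit
    (an assumed definition). Returns (stopping node, distance computations).
    """
    current = entry
    best = math.dist(points[current], query)
    n_dist = 1
    while True:
        next_node = None
        for v in graph[current]:
            if edge_visits is not None:
                edge_visits[(current, v)] += 1
            d = math.dist(points[v], query)
            n_dist += 1
            if d < best:
                best, next_node = d, v
        if next_node is None:  # local minimum: no neighbor is closer
            return current, n_dist
        current = next_node


def prune_rarely_visited(graph, edge_visits, min_visits=1):
    """Naive frequency-threshold pruning (NOT GraSP's learned pruning)."""
    return {u: [v for v in nbrs if edge_visits[(u, v)] >= min_visits]
            for u, nbrs in graph.items()}


def evaluate(graph, points, queries, entry=0):
    """Return (fraction of queries whose exact NN is found, mean distance computations)."""
    hits, cost = 0, 0
    for q in queries:
        found, n = greedy_search(graph, points, q, entry)
        exact = min(range(len(points)), key=lambda i: math.dist(points[i], q))
        hits += found == exact
        cost += n
    return hits / len(queries), cost / len(queries)


random.seed(0)
points = [(random.random(), random.random()) for _ in range(300)]
train_queries = [(random.random(), random.random()) for _ in range(200)]
test_queries = [(random.random(), random.random()) for _ in range(100)]
graph = build_knn_graph(points, k=8)

# Count edge visits over the training queries.
visits = Counter()
for q in train_queries:
    greedy_search(graph, points, q, entry=0, edge_visits=visits)

# Measure skew: share of edges never used, share of traffic on the busiest 10%.
counts = sorted((visits[(u, v)] for u in graph for v in graph[u]), reverse=True)
top_share = sum(counts[: len(counts) // 10]) / sum(counts)
unused = sum(c == 0 for c in counts) / len(counts)
print(f"edges never visited: {unused:.0%}; visits on top 10% of edges: {top_share:.0%}")

# Compare the original and naively pruned graphs on held-out queries.
pruned = prune_rarely_visited(graph, visits)
for name, g in (("original", graph), ("pruned", pruned)):
    acc, cost = evaluate(g, points, test_queries)
    print(f"{name}: recall@1 = {acc:.2f}, mean distance computations = {cost:.1f}")
```

The printed numbers depend on the random seed and are not results from the paper. A fixed frequency threshold ignores held-out queries that route differently from the training set, so it may reduce recall; GraSP instead treats pruning as a discrete optimization over sampled subgraphs, which this sketch does not attempt.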

Supplementary Material

MP4 File (WSDM22-fp304.mp4)
This video contains a presentation of the WSDM'22 accepted paper "GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning".

Cited By

  • (2024) Tree and Graph Based Two-Stages Routing for Approximate Nearest Neighbor Search. Web and Big Data, 376-390. DOI: 10.1007/978-981-97-7238-4_24. Online publication date: 28-Aug-2024.
  • (2023) TANGO: re-thinking quantization for graph neural network training on GPUs. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. DOI: 10.1145/3581784.3607037. Online publication date: 12-Nov-2023.
  • (2022) Signaling repurposable drug combinations against COVID-19 by developing the heterogeneous deep herb-graph method. Briefings in Bioinformatics 23(5). DOI: 10.1093/bib/bbac124. Online publication date: 4-May-2022.

      Published In

      WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
      February 2022
      1690 pages
      ISBN:9781450391320
      DOI:10.1145/3488560
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. graph sampling
      2. search efficiency
      3. vector management and search

      Qualifiers

      • Research-article

      Conference

      WSDM '22

      Acceptance Rates

      Overall Acceptance Rate 498 of 2,863 submissions, 17%

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)149
      • Downloads (Last 6 weeks)14
      Reflects downloads up to 13 Feb 2025
