GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning

Authors:

Minjia Zhang,

Wenhan Wang,

Yuxiong He

WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining

Pages 1395 - 1405

https://doi.org/10.1145/3488560.3498425

Published: 15 February 2022

Abstract

Nearest Neighbor Search (NNS) has recently attracted rapidly growing interest because of its core role in high-dimensional vector data management for data science and AI applications. This interest is fueled by the success of neural embeddings, where deep learning models transform unstructured data into semantically correlated feature vectors for data analysis, e.g., recommending popular items. Among the several categories of methods for fast NNS, graph-based approximate nearest neighbor search algorithms have delivered best-in-class search performance on a wide range of real-world datasets. Prior works improve the efficiency of graph-based NNS mainly by exploiting the structure of the graph with sophisticated heuristic rules. In this work, we show that the frequency distributions of edge visits in graph-based NNS can be highly skewed. This finding motivates the study of pruning unnecessary edges, using the query distribution, to avoid redundant computation during graph traversal; the query distribution is an important yet under-explored aspect of graph-based NNS. In particular, we formulate graph pruning as a discrete optimization problem and introduce GraSP, a graph optimization algorithm that improves the search efficiency of similarity graphs by learning to prune redundant edges. GraSP augments an existing similarity graph with a probabilistic model. It then performs a novel subgraph sampling and iterative refinement optimization that explicitly maximizes the expected search efficiency over a large set of training queries when a subset of edges is removed from the graph. The evaluation shows that GraSP consistently improves search efficiency on real-world datasets, providing up to 2.24X faster search than state-of-the-art methods without losing accuracy.
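To make the observation about skewed edge visits concrete, the sketch below counts how often each edge of a small similarity graph is followed during best-first search over a set of training queries, then drops the edges that were never followed. This is not GraSP: the abstract describes a probabilistic model with subgraph sampling and iterative refinement, whereas the sketch substitutes a naive visit-count threshold purely for illustration. The function names (`build_knn_graph`, `search`, `prune_by_frequency`), the brute-force k-NN graph, the fixed entry node, and all parameter values are assumptions, not details from the paper.

```python
import heapq
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(points, k):
    """Brute-force directed k-NN graph: node id -> ids of its k closest nodes."""
    ids = range(len(points))
    return {
        i: sorted((j for j in ids if j != i),
                  key=lambda j: dist2(points[i], points[j]))[:k]
        for i in ids
    }


def search(graph, points, query, entry=0, ef=4, edge_visits=None):
    """Best-first search keeping the ef closest nodes seen so far.

    Returns the id of the closest node found. If edge_visits is a Counter,
    an edge (u, v) is tallied each time following it triggers a distance
    computation (that is, v had not been visited yet for this query).
    """
    d0 = dist2(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: next node to expand
    best = [(-d0, entry)]        # max-heap via negated distances: current top-ef
    while candidates:
        d, u = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break                # closest candidate cannot improve the top-ef set
        for v in graph[u]:
            if v in visited:
                continue
            visited.add(v)
            if edge_visits is not None:
                edge_visits[(u, v)] += 1
            dv = dist2(query, points[v])
            if len(best) < ef or dv < -best[0][0]:
                heapq.heappush(candidates, (dv, v))
                heapq.heappush(best, (-dv, v))
                if len(best) > ef:
                    heapq.heappop(best)
    return max(best)[1]          # largest negated distance = closest node


def prune_by_frequency(graph, edge_visits, min_visits=1):
    """Naive stand-in for learned pruning: keep edges followed >= min_visits times."""
    return {u: [v for v in nbrs if edge_visits[(u, v)] >= min_visits]
            for u, nbrs in graph.items()}


# Toy data (hypothetical sizes): 200 random 8-dimensional points, 100 queries.
rng = random.Random(0)
dim = 8
points = [[rng.random() for _ in range(dim)] for _ in range(200)]
train_queries = [[rng.random() for _ in range(dim)] for _ in range(100)]

graph = build_knn_graph(points, k=8)
edge_visits = Counter()
baseline = [search(graph, points, q, edge_visits=edge_visits) for q in train_queries]

pruned = prune_by_frequency(graph, edge_visits, min_visits=1)
total_edges = sum(len(nbrs) for nbrs in graph.values())
kept_edges = sum(len(nbrs) for nbrs in pruned.values())
print(f"edges kept: {kept_edges}/{total_edges}")
print("most followed edges:", edge_visits.most_common(3))
```

With `min_visits=1`, only edges that no training query ever followed are removed, so replaying the training queries on the pruned graph returns the same results as before. Raising the threshold removes more edges and saves more distance computations, at the risk of changing results; trading these off in a principled way over the query distribution is the problem the paper formulates as discrete optimization.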

Supplementary Material

MP4 File (WSDM22-fp304.mp4)

This video contains a presentation of the WSDM'22 accepted paper "GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning".
