Abstract
Approximate nearest neighbor (ANN) search on high-dimensional data is a fundamental operation in many applications. In this paper, we study massive queries of ANN (MQ-ANN) search, which deals with a large number of queries simultaneously. To improve throughput, we combine the parallel capacity of multi-core CPUs with the filtering power of the state-of-the-art index methods, i.e., proximity graphs. However, the only existing solution that exploits proximity graphs to handle MQ-ANN in parallel is the one called query view, which simply assigns each query to a hardware thread and suffers from numerous cache misses. As a first attempt, we design efficient methods for MQ-ANN with proximity graphs and propose a novel scheduling mechanism called bridge view, which shares the same data access across multiple queries in order to reduce cache misses. Moreover, we extend our method to deal with MQ-ANN on large-scale data sets (e.g., \(10^8\) points). Finally, we conduct extensive experiments on real data sets to demonstrate the advantages of our method. According to our experimental results, bridge view significantly outperforms query view in various settings. In particular, bridge view with 8 hardware threads even outperforms query view with 24 hardware threads.
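To make the query view baseline concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes a proximity graph stored as an adjacency dictionary and uses a generic best-first search; the names `sq_dist`, `greedy_search`, `query_view`, and the parameters `entry`, `k`, and `ef` are hypothetical and chosen only for illustration. Python threads are used here to show the one-query-per-thread scheduling, not to obtain real parallel speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_search(graph, vectors, query, entry, k=1, ef=4):
    """Best-first search on a proximity graph (illustrative, hypothetical helper).

    Returns the k closest vertex ids found and the set of vertices whose
    vectors were accessed during the search.
    """
    visited = {entry}
    d0 = sq_dist(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: frontier ordered by distance
    results = [(-d0, entry)]     # max-heap (negated): best ef vertices so far
    while candidates:
        d, v = heapq.heappop(candidates)
        # Stop once the closest frontier vertex is worse than the worst result.
        if len(results) >= ef and d > -results[0][0]:
            break
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            du = sq_dist(vectors[u], query)
            if len(results) < ef or du < -results[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(results, (-du, u))
                if len(results) > ef:
                    heapq.heappop(results)
    top = sorted((-nd, u) for nd, u in results)[:k]
    return [u for _, u in top], visited


def query_view(graph, vectors, queries, entry=0, threads=4):
    """Query view as described above: each query is handled by one thread."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(
            lambda q: greedy_search(graph, vectors, q, entry), queries))


# Toy data (assumption): 8 one-dimensional points linked as a chain.
vectors = [[float(i)] for i in range(8)]
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
queries = [[6.2], [5.9], [0.1]]

out = query_view(graph, vectors, queries)
total_accesses = sum(len(seen) for _, seen in out)
distinct_vertices = len(set().union(*(seen for _, seen in out)))
print([ids for ids, _ in out], total_accesses, distinct_vertices)
```

In this sketch every query walks the graph independently, so the total number of vector accesses exceeds the number of distinct vertices touched. That gap is the redundancy that a scheduler sharing data accesses across queries, such as bridge view, aims to remove.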
Notes
We interchangeably use point, vector and vertex in this work.
We discuss the difference between our solution and MS-BFS in Sect. 6.4.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 62002274, 61902299 and 61976168, and in part by the China Postdoctoral Science Foundation under Grants 2019TQ0239 and 2019M663636.
About this article
Cite this article
Liu, Y., Song, C., Cheng, H. et al. Accelerating massive queries of approximate nearest neighbor search on high-dimensional data. Knowl Inf Syst 65, 4185–4212 (2023). https://doi.org/10.1007/s10115-023-01899-2
DOI: https://doi.org/10.1007/s10115-023-01899-2