Accelerating massive queries of approximate nearest neighbor search on high-dimensional data

Liu, Yingfan; Song, Chaowei; Cheng, Hong; Xia, Xiaofang; Cui, Jiangtao

doi:10.1007/s10115-023-01899-2

Regular Paper
Published: 19 May 2023

Volume 65, pages 4185–4212, (2023)
Knowledge and Information Systems

Yingfan Liu¹ (ORCID: orcid.org/0000-0002-3743-5249), Chaowei Song¹, Hong Cheng², Xiaofang Xia¹ & Jiangtao Cui¹

Abstract

Approximate nearest neighbor (ANN) search on high-dimensional data is a fundamental operation in many applications. In this paper, we study massive queries of ANN (MQ-ANN) search, which processes a large number of queries simultaneously. To improve throughput, we combine the parallel capacity of multi-core CPUs with the filtering power of state-of-the-art index methods, i.e., proximity graphs. However, the only existing solution that exploits proximity graphs to handle MQ-ANN in parallel is query view, which simply assigns each query to a hardware thread and suffers from numerous cache misses. As the first attempt to address this, we design efficient methods for MQ-ANN with proximity graphs and propose a novel scheduling mechanism called bridge view, which shares the same data access across multiple queries in order to reduce cache misses. Moreover, we extend our method to handle MQ-ANN on large-scale data sets (e.g., \(10^8\) points). Finally, we conduct extensive experiments on real data sets to demonstrate the advantages of our method. According to our experimental results, bridge view significantly outperforms query view in various settings. In particular, bridge view with 8 hardware threads even outperforms query view with 24 hardware threads.
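To make the query view baseline concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a brute-force k-nearest-neighbor graph as a stand-in for a proximity graph and a generic best-first search; the names build_knn_graph, graph_search and query_view, as well as the parameters k, ef and entry, are hypothetical and chosen only for illustration.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(points, k):
    """Brute-force k-NN graph, used here as a toy proximity graph (assumption)."""
    graph = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist2(p, points[j]))
        graph.append(order[:k])
    return graph


def graph_search(points, graph, query, entry, ef):
    """Best-first search keeping the ef closest vertices seen so far."""
    d0 = dist2(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded vertex first
    best = [(-d0, entry)]       # max-heap via negation: current result set
    while candidates:
        d, v = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # the closest candidate cannot improve the result set
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            du = dist2(query, points[u])
            if len(best) < ef or du < -best[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(best, (-du, u))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest result
    # Return the id of the closest vertex found.
    return min((-nd, u) for nd, u in best)[1]


def query_view(points, graph, queries, entry=0, ef=8, threads=4):
    """Query view: hand each query to a worker thread; queries share no work.

    Note: CPython threads do not run this pure-Python code in parallel
    because of the global interpreter lock, so this shows the scheduling
    pattern only, not a speedup.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(
            lambda q: graph_search(points, graph, q, entry, ef), queries))


random.seed(0)
points = [[random.random() for _ in range(8)] for _ in range(200)]
graph = build_knn_graph(points, k=8)
queries = [[random.random() for _ in range(8)] for _ in range(20)]
answers = query_view(points, graph, queries)
```

In this pattern every query walks the graph independently, so two queries that visit the same vertex each fetch its vector and neighbor list separately. According to the abstract, bridge view instead shares such data accesses across queries; that scheduling mechanism is not reproduced in this sketch.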

Notes

We interchangeably use point, vector and vertex in this work.
We discuss the difference between our solution and MS-BFS in Sect. 6.4.
http://corpus-texmex.irisa.fr/.
https://yadi.sk/d/11eDCm7Dsn9GA.
http://horatio.cs.nyu.edu/mit/tiny/data/.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 62002274, 61902299 and 61976168, and in part by the China Postdoctoral Science Foundation under Grants 2019TQ0239 and 2019M663636.

Author information

Authors and Affiliations

School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, China
Yingfan Liu, Chaowei Song, Xiaofang Xia & Jiangtao Cui
The Chinese University of Hong Kong, Hong Kong, China
Hong Cheng

Corresponding author

Correspondence to Yingfan Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Song, C., Cheng, H. et al. Accelerating massive queries of approximate nearest neighbor search on high-dimensional data. Knowl Inf Syst 65, 4185–4212 (2023). https://doi.org/10.1007/s10115-023-01899-2

Received: 05 January 2022
Revised: 20 April 2023
Accepted: 26 April 2023
Published: 19 May 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10115-023-01899-2
