Abstract
Several efficient top-k algorithms, such as Fagin's Algorithm (FA), the threshold algorithm (TA), and the best position algorithm (BPA), can be used to answer k nearest neighbor (kNN) queries. However, applying these algorithms without modification is inefficient because kNN queries and top-k queries have different characteristics. First, kNN queries are sensitive to distance rather than to the positions of data points. Second, kNN queries need novel parallel heuristics and pruning policies. Third, FA, TA, and BPA still perform many redundant random accesses. In this paper, we address these problems and adapt these algorithms to answer parallel kNN (PkNN) queries in spatial databases. We combine the advantages of the B+-tree and OpenMP programming and propose three efficient parallel kNN query algorithms, namely distance priority-based PkNN, optimized PkNN, and partition-based PkNN. Our performance evaluation shows that the proposed algorithms achieve significant improvement over existing algorithms, i.e., BPA and BPA2. In addition, our approaches can return kNN results incrementally, which greatly shortens the query response time and enhances the user experience.
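To make the link between top-k processing and kNN queries concrete, the following is a minimal sequential sketch of a threshold-algorithm-style kNN search. It is not one of the PkNN algorithms proposed in the paper. It assumes one sorted list per dimension (a stand-in for a B+-tree leaf level), squared Euclidean distance, and a hypothetical function name `ta_knn`. Sorted access expands outward from the query coordinate in each dimension; the search stops once the k-th best distance found so far is no larger than the lower bound implied by the last 1-D gaps seen.

```python
import heapq
from bisect import bisect_left


def ta_knn(points, query, k):
    """Return the k nearest points as (distance, index) pairs, nearest first."""
    if k <= 0 or not points:
        return []
    dims, n = len(query), len(points)
    # One list per dimension, sorted by coordinate: (coordinate, point index).
    lists = [sorted((p[d], i) for i, p in enumerate(points)) for d in range(dims)]
    # Two cursors per dimension, moving away from the query coordinate.
    right = [bisect_left(lists[d], (query[d], -1)) for d in range(dims)]
    left = [r - 1 for r in right]
    last = [0.0] * dims  # last 1-D gap seen under sorted access, per dimension
    seen = set()
    best = []  # max-heap of size <= k, stored as (-squared distance, index)

    while len(seen) < n:
        for d in range(dims):
            lo, hi = left[d], right[d]
            gap_lo = query[d] - lists[d][lo][0] if lo >= 0 else float("inf")
            gap_hi = lists[d][hi][0] - query[d] if hi < n else float("inf")
            if gap_lo == gap_hi == float("inf"):
                continue  # this list is exhausted
            # Sorted access: take the entry closest to the query in dimension d.
            if gap_lo <= gap_hi:
                last[d], idx = gap_lo, lists[d][lo][1]
                left[d] -= 1
            else:
                last[d], idx = gap_hi, lists[d][hi][1]
                right[d] += 1
            if idx not in seen:
                # Random access: fetch the full point and compute its distance.
                seen.add(idx)
                d2 = sum((a - b) ** 2 for a, b in zip(points[idx], query))
                heapq.heappush(best, (-d2, idx))
                if len(best) > k:
                    heapq.heappop(best)
        # Any unseen point is at least `last[d]` away in every dimension d.
        threshold = sum(g * g for g in last)
        if len(best) == k and -best[0][0] <= threshold:
            break
    return sorted(((-neg) ** 0.5, idx) for neg, idx in best)
```

In this sketch every newly seen point costs one random access, which is the kind of overhead the abstract refers to; the parallel heuristics and pruning policies of the paper are not modeled here.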
Data availability
The datasets generated during and/or analyzed during the current study are not publicly available but are available from the corresponding author on reasonable request.
References
Akbarinia R, Pacitti E, Valduriez P (2011) Best position algorithms for efficient top-k query processing. Inf Syst 36(6):973–989
Ali MH, Saad AA, Ismail MA (2005) The PN-tree: a parallel and distributed multidimensional index. Distrib Parallel Databases 17(2):111–133
Berchtold S, Bohm C, Braunmuller B, Keim DA, Kriegel HP (1997) Fast parallel similarity search in multimedia databases. SIGMOD Rec 26(2):1–12
Cao M, Jia W, Lv Z et al (2018) Two-pass k nearest neighbor search for feature tracking. IEEE Access 6:72939–72951
Cao M, Li L, Xie W et al (2019) Parallel k nearest neighbor matching for 3D reconstruction. IEEE Access 7:55248–55260
Challa JS, Goyal P, Nikhil S, Balasubramaniam S, Goyal N (2015) A concurrent k-NN search algorithm for R-tree. In: Proceedings of the annual ACM India conference, pp. 123–128
Chester S, Sidlauskas D, Assent I et al. (2015) Scalable parallelization of skyline computation for multi-core processors. In: ICDE, pp. 1083–1094
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Ouyang D, Wen D, Qin L et al. (2020) Progressive top-K nearest neighbors search in large road networks. In: SIGMOD, pp. 1781–1795
Fagin R (1999) Combining fuzzy information from multiple systems. J Comput Syst Sci 58(1):83–99
Fagin R, Lotem J, Naor M (2003) Optimal aggregation algorithms for middleware. J Comput Syst Sci 66(4):614–656
Feng X, Gao Y, Jiang T et al (2013) Parallel k-Skyband computation on multicore architecture. LNCS 7808:827–837
Gao Y, Chen L, Chen G, Chen C (2006) Efficient parallel processing for k-nearest-neighbor search in spatial databases. LNCS 3984:39–48
Gieseke F, Heinermann J, Oancea C, Igel C (2014) Buffer k-d trees: processing massive nearest neighbor queries on GPUs. In: Proceedings of the international conference on machine learning, pp. 1–9
Gowanlock M (2021) Hybrid KNN-join: parallel nearest neighbor searches exploiting CPU and GPU architectural features. J Parallel Distrib Comput 149:119–137
Guzun G, Tosado J, Canahuate G (2014) Slicing the dimensionality: top-k query processing for high-dimensional spaces. Trans Large Scale Data Knowl Centered Syst 14:26–50
Han X, Li J, Gao H (2015) Efficient top-k retrieval on massive data. IEEE Trans Knowl Data Eng 27(10):2687–2699
Hjaltason GR, Samet H (1999) Distance browsing in spatial databases. ACM Trans Database Syst 24(2):265–318
Jagadish HV, Ooi BC, Tan K-L et al (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst 30(2):364–397
Jiang T, Gao Y, Zhang B, Lin D, Li Q (2014) Monochromatic and bichromatic mutual skyline queries. Expert Syst Appl 41(4):1885–1900
Jiang T, Zhang B, Yu F (2017) Efficient parallel processing for kNN queries. In: ICIDE, pp. 88–94
Jin W, Patel JM (2011) Efficient and generic evaluation of ranked queries. In: SIGMOD, pp. 601–612
Jo J, Seo J, Fekete JD (2020) PANENE: a progressive algorithm for indexing and querying approximate k-nearest neighbors. IEEE Trans Vis Comput Graph 26(2):1347–1360
Lee J, Cho H, Hwang S et al (2014) Toward scalable indexing for top-k queries. IEEE Trans Knowl Data Eng 26(12):3103–3116
Li W, Zhang Y, Sun Y et al (2020c) Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement. IEEE Trans Knowl Data Eng 32(8):1475–1488
Li M, Zhang Y, Sun Y et al. (2020a) I/O efficient approximate nearest neighbor search based on learned functions. In: ICDE, pp. 289–300
Li C, Zhang M, Andersen DG et al. (2020b) Improving approximate nearest neighbor search through learned adaptive early termination. In: SIGMOD, pp. 2539–2554
Lu K, Wang H, Wang W, Kudo M (2020) VHP: approximate nearest neighbor search via virtual hypersphere partitioning. PVLDB 13(9):1443–1455
Lu K, Kudo M, Xiao C et al (2021) HVS: hierarchical graph structure based on voronoi diagrams for solving approximate nearest neighbor search. Proc VLDB Endow 15(2):246–258
Maillo J, Triguero I, Herrera F (2015) A MapReduce-based k-nearest neighbor approach for big data classification. In: Proceedings of IEEE BigDataSE, pp. 167–172
Malkov YuA, Yashunin DA (2020) Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans Pattern Anal Mach Intell 42:824–836
Muñoz JV, Gonçalves MA, Dias Z et al (2019) Hierarchical clustering-based graphs for large scale approximate nearest neighbor search. Pattern Recognit 96:106970
Nam M, Kim J, Nam B (2016) Parallel tree traversal for nearest neighbor query on the GPU. In: ICPP, pp. 113–122
Pan J, Manocha D (2011) Fast GPU-based locality sensitive hashing for k-nearest neighbor computation. In: ACM SIGSPATIAL, pp. 211–220
Papadopoulos AN, Manolopoulos Y (1996) Parallel processing of nearest neighbor queries in declustered spatial data. In: Proceedings of the ACM-GIS conference, pp. 35–43
Papadopoulos AN, Manolopoulos Y (1998) Similarity query processing using disk arrays. In: SIGMOD, pp. 225–236
Patwary MMA, Satish NR, Sundaram N et al. (2016) PANDA: extreme scale parallel k-nearest neighbor on distributed architectures. In: IPDPS, pp. 494–503
Ram P, Sinha K (2019) Revisiting kd-tree for nearest neighbor search. In: SIGKDD, pp. 1378–1388
Roussopoulos N, Kelley S, Vincent F (1995) Nearest neighbor queries. ACM SIGMOD Rec 24(2):71–79
Shahvarani A, Jacobsen HA (2021) Distributed stream KNN join. In: SIGMOD conference, pp. 1597–1609
Tao J, Zhang B, Lin D, Gao Y, Li Q (2020) Efficient column-oriented processing for mutual subspace skyline queries. Soft Comput 24:15427–15445
Tao Y, Yi K, Sheng C et al. (2009) Quality and efficiency in high dimensional nearest neighbor search. In: SIGMOD, pp. 563–576
Wang M, Xu X, Yue Q et al (2021) A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search. Proc VLDB Endow 14(11):1964–1978
Zhang B, Jiang T, Bao Z, Wong R, Chen L (2016a) Monochromatic and bichromatic reverse top-k group nearest neighbor queries. Expert Syst Appl 53(1):57–74
Zhang S, Sun C, He Z (2016b) ListMerge: accelerating top-k aggregation queries over large number of lists. In: Proceedings of the 21st International Conference on DASFAA, pp. 67–81
Funding
This work was supported in part by ZJNSF Grant LY16F020026 and NSFC Grants 61522208 and 61379033.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Tao Jiang and Bin Zhang. The first draft of the manuscript was written by Tao Jiang, and all authors commented on previous versions of the manuscript. The revision was mainly carried out by Dan Lin. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
TJ declares that he/she has no conflict of interest. BZ declares that he/she has no conflict of interest. DL declares that he/she has no conflict of interest. YG declares that he/she has no conflict of interest. QL declares that he/she has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This manuscript is the authors' original work and has not been published, except for a preliminary version [19], nor has it been submitted elsewhere simultaneously. All authors have checked the manuscript and have agreed to the submission.
About this article
Cite this article
Jiang, T., Zhang, B., Lin, D. et al. Efficient parallel processing of high-dimensional spatial kNN queries. Soft Comput 26, 12291–12316 (2022). https://doi.org/10.1007/s00500-022-07081-0
DOI: https://doi.org/10.1007/s00500-022-07081-0