Efficient parallel processing of high-dimensional spatial kNN queries

Jiang, Tao; Zhang, Bin; Lin, Dan; Gao, Yunjun; Li, Qing

doi:10.1007/s00500-022-07081-0

Application of soft computing
Published: 02 May 2022

Volume 26, pages 12291–12316, (2022)
Soft Computing

Tao Jiang (ORCID: orcid.org/0000-0002-0740-5962)¹, Bin Zhang¹, Dan Lin², Yunjun Gao³ & Qing Li⁴

364 Accesses
1 Citation
Abstract

Efficient top-k algorithms such as Fagin's algorithm (FA), the threshold algorithm (TA), and the best position algorithm (BPA) can be used to answer k nearest neighbor (kNN) queries. However, applying these algorithms unchanged is inefficient because kNN queries and top-k queries have different characteristics. First, kNN queries are sensitive to the distance of data points rather than to their position. Second, kNN queries call for novel parallel heuristics and pruning policies. Third, FA, TA, and BPA still perform many redundant random accesses. In this paper, we address these problems and adapt these algorithms to answer parallel kNN (PkNN) queries in spatial databases. We combine the advantages of the B+-tree and OpenMP programming and propose three efficient parallel kNN query algorithms: distance priority-based PkNN, optimized PkNN, and partition-based PkNN. Our performance evaluation shows that the proposed algorithms achieve a significant improvement over the existing algorithms BPA and BPA2. In addition, our approaches can return kNN results incrementally, which greatly shortens the query response time and improves the user experience.
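
To make the connection between top-k processing and kNN concrete, the following is a minimal sequential sketch of a TA-style kNN search. It is not the authors' PkNN algorithms: the function name `ta_knn`, the in-memory sorted lists (standing in for per-dimension B+-tree scans), and the Euclidean metric are all assumptions made for illustration, and no parallelism is shown.

```python
import heapq
import math


def ta_knn(points, query, k):
    """Return the k nearest points as (distance, index) pairs, nearest first.

    Hypothetical TA-style sketch: one list per dimension, sorted by the
    per-dimension gap |p[d] - query[d]|, is scanned row by row.
    """
    if k <= 0 or not points:
        return []
    dims = len(query)
    # Assumption: an in-memory sorted list replaces a per-dimension index scan.
    lists = [
        sorted((abs(p[d] - query[d]), i) for i, p in enumerate(points))
        for d in range(dims)
    ]
    seen = set()
    best = []  # max-heap of at most k candidates, stored as (-distance, index)
    for depth in range(len(points)):
        for d in range(dims):
            _, idx = lists[d][depth]  # sorted access
            if idx not in seen:
                seen.add(idx)
                dist = math.dist(points[idx], query)  # random access
                heapq.heappush(best, (-dist, idx))
                if len(best) > k:
                    heapq.heappop(best)  # drop the farthest candidate
        # Every unseen point is at least this far away in each dimension,
        # so this value lower-bounds its Euclidean distance to the query.
        threshold = math.sqrt(sum(lists[d][depth][0] ** 2 for d in range(dims)))
        if len(best) >= k and -best[0][0] <= threshold:
            break
    return sorted((-neg, i) for neg, i in best)
```

The scan stops as soon as the k-th best distance found so far is no larger than the threshold, which is the usual TA stopping rule applied to a distance aggregate. Each newly seen point costs one random access, which is the kind of cost the abstract identifies as redundant in FA, TA, and BPA.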

Data availability

The datasets generated during and/or analyzed during the current study are not publicly available but are available from the corresponding author on reasonable request.

References

Akbarinia R, Pacitti E, Valduriez P (2011) Best position algorithms for efficient top-k query processing. Inf Syst 36(6):973–989
Ali MH, Saad AA, Ismail MA (2005) The PN-tree: a parallel and distributed multidimensional index. Distrib Parallel Databases 17(2):111–133
Berchtold S, Bohm C, Braunmuller B, Keim DA, Kriegel HP (1997) Fast parallel similarity search in multimedia databases. SIGMOD Rec 26(2):1–12
Cao M, Jia W, Lv Z et al (2018) Two-pass k nearest neighbor search for feature tracking. IEEE Access 6:72939–72951
Cao M, Li L, Xie W et al (2019) Parallel k nearest neighbor matching for 3D reconstruction. IEEE Access 7:55248–55260
Challa JS, Goyal P, Nikhil S, Balasubramaniam S, Goyal N (2015) A concurrent k-NN search algorithm for R-tree. In: Proceedings of the annual ACM India conference, pp. 123–128
Chester S, Sidlauskas D, Assent I et al (2015) Scalable parallelization of skyline computation for multi-core processors. In: ICDE, pp. 1083–1094
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Ouyang D, Wen D, Qin L et al (2020) Progressive top-K nearest neighbors search in large road networks. In: SIGMOD, pp. 1781–1795
Fagin R (1999) Combining fuzzy information from multiple systems. J Comput Syst Sci 58(1):83–99
Fagin R, Lotem J, Naor M (2003) Optimal aggregation algorithms for middleware. J Comput Syst Sci 66(4):614–656
Feng X, Gao Y, Jiang T et al (2013) Parallel k-Skyband computation on multicore architecture. LNCS 7808:827–837
Gao Y, Chen L, Chen G, Chen C (2006) Efficient parallel processing for k-nearest-neighbor search in spatial databases. LNCS 3984:39–48
Gieseke F, Heinermann J, Oancea C, Igel C (2014) Buffer k-d trees: processing massive nearest neighbor queries on GPUs. In: Proceedings of the international conference on machine learning, pp. 1–9
Gowanlock M (2021) Hybrid KNN-join: parallel nearest neighbor searches exploiting CPU and GPU architectural features. J Parallel Distrib Comput 149:119–137
Guzun G, Tosado J, Canahuate G (2014) Slicing the dimensionality: top-k query processing for high-dimensional spaces. Trans Large Scale Data Knowl Centered Syst 14:26–50
Han X, Li J, Gao H (2015) Efficient top-k retrieval on massive data. IEEE Trans Knowl Data Eng 27(10):2687–2699
Hjaltason GR, Samet H (1999) Distance browsing in spatial databases. ACM Trans Database Syst 24(2):265–318
Jagadish HV, Ooi BC, Tan K-L et al (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst 30(2):364–397
Jiang T, Gao Y, Zhang B, Lin D, Li Q (2014) Monochromatic and bichromatic mutual skyline queries. Expert Syst Appl 41(4):1885–1900
Jiang T, Zhang B, Yu F (2017) Efficient parallel processing for kNN queries. In: ICIDE, pp. 88–94
Jin W, Patel JM (2011) Efficient and generic evaluation of ranked queries. In: SIGMOD, pp. 601–612
Jo J, Seo J, Fekete JD (2020) PANENE: a progressive algorithm for indexing and querying approximate k-nearest neighbors. IEEE Trans Vis Comput Graph 26(2):1347–1360
Lee J, Cho H, Hwang S et al (2014) Toward scalable indexing for top-k queries. IEEE Trans Knowl Data Eng 26(12):3103–3116
Li W, Zhang Y, Sun Y et al (2020c) Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement. IEEE Trans Knowl Data Eng 32(8):1475–1488
Li M, Zhang Y, Sun Y et al (2020a) I/O efficient approximate nearest neighbor search based on learned functions. In: ICDE, pp. 289–300
Li C, Zhang M, Andersen DG et al (2020b) Improving approximate nearest neighbor search through learned adaptive early termination. In: SIGMOD, pp. 2539–2554
Lu K, Wang H, Wang W, Kudo M (2020) VHP: approximate nearest neighbor search via virtual hypersphere partitioning. PVLDB 13(9):1443–1455
Lu K, Kudo M, Xiao C et al (2021) HVS: hierarchical graph structure based on Voronoi diagrams for solving approximate nearest neighbor search. Proc VLDB Endow 15(2):246–258
Maillo J, Triguero I, Herrera F (2015) A MapReduce-based k-nearest neighbor approach for big data classification. In: Proceedings of IEEE BigDataSE, pp. 167–172
Malkov YuA, Yashunin DA (2020) Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans Pattern Anal Mach Intell 42:824–836
Muñoz JV, Gonçalves MA, Dias Z et al (2019) Hierarchical clustering-based graphs for large scale approximate nearest neighbor search. Pattern Recognit 96:106970
Nam M, Kim J, Nam B (2016) Parallel tree traversal for nearest neighbor query on the GPU. In: ICPP, pp. 113–122
Pan J, Manocha D (2011) Fast GPU-based locality sensitive hashing for k-nearest neighbor computation. In: ACM SIGSPATIAL, pp. 211–220
Papadopoulos AN, Manolopoulos Y (1996) Parallel processing of nearest neighbor queries in declustered spatial data. In: Proceedings of the ACM-GIS conference, pp. 35–43
Papadopoulos AN, Manolopoulos Y (1998) Similarity query processing using disk arrays. In: SIGMOD, pp. 225–236
Patwary MMA, Satish NR, Sundaram N et al (2016) PANDA: extreme scale parallel k-nearest neighbor on distributed architectures. In: IPDPS, pp. 494–503
Ram P, Sinha K (2019) Revisiting kd-tree for nearest neighbor search. In: SIGKDD, pp. 1378–1388
Roussopoulos N, Kelley S, Vincent F (1995) Nearest neighbor queries. ACM SIGMOD Rec 24(2):71–79
Shahvarani A, Jacobsen HA (2021) Distributed stream KNN join. In: SIGMOD conference, pp. 1597–1609
Tao J, Zhang B, Lin D, Gao Y, Li Q (2020) Efficient column-oriented processing for mutual subspace skyline queries. Soft Comput 24:15427–15445
Tao Y, Yi K, Sheng C et al (2009) Quality and efficiency in high dimensional nearest neighbor search. In: SIGMOD, pp. 563–576
Wang M, Xu X, Yue Q et al (2021) A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search. Proc VLDB Endow 14(11):1964–1978
Zhang B, Jiang T, Bao Z, Wong R, Chen L (2016a) Monochromatic and bichromatic reverse top-k group nearest neighbor queries. Expert Syst Appl 53(1):57–74
Zhang S, Sun C, He Z (2016b) ListMerge: accelerating top-k aggregation queries over large number of lists. In: Proceedings of the 21st International Conference on DASFAA, pp. 67–81

Funding

This work was supported in part by the ZJNSF Grant LY16F020026 and NSFC Grants 61522208 and 61379033.

Author information

Authors and Affiliations

College of Information Science and Engineering, Jiaxing University, 56 Yuexiu Road (South), 118 Jiahang Road, Jiaxing, 314001, People’s Republic of China
Tao Jiang & Bin Zhang
Department of Computer Science, Missouri University of Science and Technology, 500 West 15th Street, Rolla, MO, 65409, USA
Dan Lin
College of Computer Science, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, China
Yunjun Gao
Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
Qing Li

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Tao Jiang and Bin Zhang. The first draft of the manuscript was written by Tao Jiang, and all authors commented on previous versions of the manuscript. The revision was mainly completed by Dan Lin. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tao Jiang.

Ethics declarations

Conflict of interest

TJ declares that he/she has no conflict of interest. BZ declares that he/she has no conflict of interest. DL declares that he/she has no conflict of interest. YG declares that he/she has no conflict of interest. QL declares that he/she has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This manuscript is the authors' original work. It has not been published, except for a preliminary version [19], and it has not been submitted simultaneously elsewhere. All authors have checked the manuscript and have agreed to the submission.

About this article

Cite this article

Jiang, T., Zhang, B., Lin, D. et al. Efficient parallel processing of high-dimensional spatial kNN queries. Soft Comput 26, 12291–12316 (2022). https://doi.org/10.1007/s00500-022-07081-0

Accepted: 28 March 2022
Published: 02 May 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s00500-022-07081-0
