Efficient parallel processing of high-dimensional spatial kNN queries

  • Application of soft computing
Soft Computing

Abstract

Efficient top-k algorithms such as Fagin’s Algorithm (FA), the threshold algorithm (TA), and the best position algorithm (BPA) can be used to answer k nearest neighbor (kNN) queries. However, applying these algorithms without modification is inefficient because kNN queries and top-k queries have different characteristics. First, kNN queries are sensitive to distances rather than to the positions of data points. Second, kNN queries call for novel parallel heuristics and pruning policies. Third, FA, TA, and BPA still perform many redundant random accesses. In this paper, we address these problems and adapt these algorithms to answer parallel kNN (PkNN) queries in spatial databases. We combine the advantages of the B+-tree and OpenMP programming and propose three efficient parallel kNN query algorithms, namely distance priority-based PkNN, optimized PkNN, and partition-based PkNN. Our performance evaluation shows that the proposed algorithms achieve significant improvement over the existing algorithms BPA and BPA2. In addition, our approaches can return kNN results incrementally, which greatly shortens the query response time and enhances the user experience.
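
To make the contrast between top-k and kNN processing concrete, here is a minimal Python sketch of a TA-style exact kNN search. It is an illustration under stated assumptions, not the distance priority-based, optimized, or partition-based PkNN algorithms of the paper: it is sequential, it uses plain sorted Python lists where the paper uses B+-trees, and the function name `ta_knn` and all variable names are hypothetical. Each dimension is scanned outward from the query coordinate, so sorted access is ordered by distance rather than by list position, and the search stops once the k-th best distance is no larger than a lower bound on the distance of every unseen point.

```python
import bisect
import heapq
import math


def ta_knn(points, query, k):
    """TA-style exact kNN sketch (illustrative; not the paper's algorithm).

    Returns (neighbors, accesses): neighbors is a list of (distance, point_id)
    sorted by distance; accesses counts full distance computations.
    """
    if k <= 0:
        return [], 0
    dims = len(query)
    # One sorted list per dimension, standing in for a per-dimension index.
    lists = [sorted((p[d], i) for i, p in enumerate(points)) for d in range(dims)]
    # Two cursors per dimension that move outward from the query coordinate.
    right = [bisect.bisect_left(lists[d], (query[d], -1)) for d in range(dims)]
    left = [r - 1 for r in right]
    seen = set()
    best = []  # max-heap of the k best so far, stored as (-distance, point_id)
    accesses = 0

    while len(seen) < len(points):
        frontier = []  # per-dimension gap of the entry read in this round
        for d in range(dims):
            lst = lists[d]
            gap_l = query[d] - lst[left[d]][0] if left[d] >= 0 else math.inf
            gap_r = lst[right[d]][0] - query[d] if right[d] < len(lst) else math.inf
            if math.isinf(gap_l) and math.isinf(gap_r):
                frontier.append(math.inf)  # this list is exhausted
                continue
            if gap_l <= gap_r:
                gap, pid = gap_l, lst[left[d]][1]
                left[d] -= 1
            else:
                gap, pid = gap_r, lst[right[d]][1]
                right[d] += 1
            frontier.append(gap)
            if pid not in seen:
                seen.add(pid)
                accesses += 1  # "random access": fetch all coordinates
                heapq.heappush(best, (-math.dist(points[pid], query), pid))
                if len(best) > k:
                    heapq.heappop(best)  # drop the current farthest candidate
        # Any unseen point is at least this far from the query.
        threshold = math.sqrt(sum(g * g for g in frontier))
        if len(best) == k and -best[0][0] <= threshold:
            break

    return sorted((-neg, pid) for neg, pid in best), accesses


points = [(0, 0), (1, 1), (5, 5), (2, 2), (9, 9), (1, 0)]
neighbors, accesses = ta_knn(points, (1, 1), 2)
print(neighbors, accesses)
```

The `accesses` counter gives a rough feel for how many full distance computations the threshold test avoids compared with a linear scan. The rounds over dimensions are independent until the threshold check, which is the kind of loop one might hand to parallel workers, although how the paper actually parallelizes it with OpenMP is not described in this excerpt.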

Data availability

The datasets generated during and/or analyzed during the current study are not publicly available but are available from the corresponding author on reasonable request.

References

  • Akbarinia R, Pacitti E, Valduriez P (2011) Best position algorithms for efficient top-k query processing. Inf Syst 36(6):973–989

  • Ali MH, Saad AA, Ismail MA (2005) The PN-tree: a parallel and distributed multidimensional index. Distrib Parallel Databases 17(2):111–133

  • Berchtold S, Bohm C, Braunmuller B, Keim DA, Kriegel HP (1997) Fast parallel similarity search in multimedia databases. SIGMOD Rec 26(2):1–12

  • Cao M, Jia W, Lv Z et al (2018) Two-pass k nearest neighbor search for feature tracking. IEEE Access 6:72939–72951

  • Cao M, Li L, Xie W et al (2019) Parallel k nearest neighbor matching for 3D reconstruction. IEEE Access 7:55248–55260

  • Challa JS, Goyal P, Nikhil S, Balasubramaniam S, Goyal N (2015) A concurrent k-NN search algorithm for R-tree. In: Proceedings of the annual ACM India conference, pp. 123–128

  • Chester S, Sidlauskas D, Assent I et al. (2015) Scalable parallelization of skyline computation for multi-core processors. In: ICDE, pp. 1083–1094

  • Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113

  • Ouyang D, Wen D, Qin L et al. (2020) Progressive top-K nearest neighbors search in large road networks. In: SIGMOD, pp. 1781–1795

  • Fagin R (1999) Combining fuzzy information from multiple systems. J Comput Syst Sci 58(1):83–99

  • Fagin R, Lotem J, Naor M (2003) Optimal aggregation algorithms for middleware. J Comput Syst Sci 66(4):614–656

  • Feng X, Gao Y, Jiang T et al (2013) Parallel k-Skyband computation on multicore architecture. LNCS 7808:827–837

  • Gao Y, Chen L, Chen G, Chen C (2006) Efficient parallel processing for k-nearest-neighbor search in spatial databases. LNCS 3984:39–48

  • Gieseke F, Heinermann J, Oancea C, Igel C (2014) Buffer k-d trees: processing massive nearest neighbor queries on GPUs. In: Proceedings of the international conference on machine learning, pp. 1–9

  • Gowanlock M (2021) Hybrid KNN-join: parallel nearest neighbor searches exploiting CPU and GPU architectural features. J Parallel Distrib Comput 149:119–137

  • Guzun G, Tosado J, Canahuate G (2014) Slicing the dimensionality: top-k query processing for high-dimensional spaces. Trans Large Scale Data Knowl Centered Syst 14:26–50

  • Han X, Li J, Gao H (2015) Efficient top-k retrieval on massive data. IEEE Trans Knowl Data Eng 27(10):2687–2699

  • Hjaltason GR, Samet H (1999) Distance browsing in spatial databases. ACM Trans Database Syst 24(2):265–318

  • Jagadish HV, Ooi BC, Tan K-L et al (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst 30(2):364–397

  • Jiang T, Gao Y, Zhang B, Lin D, Li Q (2014) Monochromatic and bichromatic mutual skyline queries. Expert Syst Appl 41(4):1885–1900

  • Jiang T, Zhang B, Yu F (2017) Efficient parallel processing for kNN queries. In: ICIDE, pp. 88–94

  • Jin W, Patel JM (2011) Efficient and generic evaluation of ranked queries. In: SIGMOD, pp. 601–612

  • Jo J, Seo J, Fekete JD (2020) PANENE: a progressive algorithm for indexing and querying approximate k-nearest neighbors. IEEE Trans Vis Comput Graph 26(2):1347–1360

  • Lee J, Cho H, Hwang S et al (2014) Toward scalable indexing for top-k queries. IEEE Trans Knowl Data Eng 26(12):3103–3116

  • Li W, Zhang Y, Sun Y et al (2020c) Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement. IEEE Trans Knowl Data Eng 32(8):1475–1488

  • Li M, Zhang Y, Sun Y et al. (2020a) I/O efficient approximate nearest neighbor search based on learned functions. In: ICDE, pp. 289–300

  • Li C, Zhang M, Andersen DG et al. (2020b) Improving approximate nearest neighbor search through learned adaptive early termination. In: SIGMOD, pp. 2539–2554

  • Lu K, Wang H, Wang W, Kudo M (2020) VHP: approximate nearest neighbor search via virtual hypersphere partitioning. PVLDB 13(9):1443–1455

  • Lu K, Kudo M, Xiao C et al (2021) HVS: hierarchical graph structure based on voronoi diagrams for solving approximate nearest neighbor search. Proc VLDB Endow 15(2):246–258

  • Maillo J, Triguero I, Herrera F (2015) A MapReduce-based k-nearest neighbor approach for big data classification. In: Proceedings of IEEE BigDataSE, pp. 167–172

  • Malkov YuA, Yashunin DA (2020) Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans Pattern Anal Mach Intell 42:824–836

  • Muñoz JV, Gonçalves MA, Dias Z et al (2019) Hierarchical clustering-based graphs for large scale approximate nearest neighbor search. Pattern Recognit 96:106970

  • Nam M, Kim J, Nam B (2016) Parallel tree traversal for nearest neighbor query on the GPU. In: ICPP, pp. 113–122

  • Pan J, Manocha D (2011) Fast GPU-based locality sensitive hashing for k-nearest neighbor computation. In: ACM SIGSPATIAL, pp. 211–220

  • Papadopoulos AN, Manolopoulos Y (1996) Parallel processing of nearest neighbor queries in declustered spatial data. In: Proceedings of the ACM-GIS conference, pp. 35–43

  • Papadopoulos AN, Manolopoulos Y (1998) Similarity query processing using disk arrays. In: SIGMOD, pp. 225–236

  • Patwary MMA, Satish NR, Sundaram N et al. (2016) PANDA: extreme scale parallel k-nearest neighbor on distributed architectures. In: IPDPS, pp. 494–503

  • Ram P, Sinha K (2019) Revisiting kd-tree for nearest neighbor search. In: SIGKDD, pp. 1378–1388

  • Roussopoulos N, Kelley S, Vincent F (1995) Nearest neighbor queries. ACM SIGMOD Rec 24(2):71–79

  • Shahvarani A, Jacobsen HA (2021) Distributed stream KNN join. In: SIGMOD conference, pp. 1597–1609

  • Tao J, Zhang B, Lin D, Gao Y, Li Q (2020) Efficient column-oriented processing for mutual subspace skyline queries. Soft Comput 24:15427–15445

  • Tao Y, Yi K, Sheng C et al. (2009) Quality and efficiency in high dimensional nearest neighbor search. In: SIGMOD, pp. 563–576

  • Wang M, Xu X, Yue Q et al (2021) A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search. Proc VLDB Endow 14(11):1964–1978

  • Zhang B, Jiang T, Bao Z, Wong R, Chen L (2016a) Monochromatic and bichromatic reverse top-k group nearest neighbor queries. Expert Syst Appl 53(1):57–74

  • Zhang S, Sun C, He Z (2016b) ListMerge: accelerating top-k aggregation queries over large number of lists. In: Proceedings of the 21st International Conference on DASFAA, pp. 67–81

Funding

This work was supported in part by ZJNSF Grant LY16F020026 and NSFC Grants 61522208 and 61379033.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Tao Jiang and Bin Zhang. The first draft of the manuscript was written by Tao Jiang, and all authors commented on previous versions of the manuscript. The revision was mainly completed by Dan Lin. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tao Jiang.

Ethics declarations

Conflict of interest

TJ declares that he/she has no conflict of interest. BZ declares that he/she has no conflict of interest. TJ declares that he/she has no conflict of interest. DL declares that he/she has no conflict of interest. YG declares that he/she has no conflict of interest. QL declares that he/she has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This manuscript is the authors' original work and has not been published except for a preliminary version [19] nor has it been submitted simultaneously elsewhere. All authors have checked the manuscript and have agreed to the submission.

About this article

Cite this article

Jiang, T., Zhang, B., Lin, D. et al. Efficient parallel processing of high-dimensional spatial kNN queries. Soft Comput 26, 12291–12316 (2022). https://doi.org/10.1007/s00500-022-07081-0

  • DOI: https://doi.org/10.1007/s00500-022-07081-0
