Skip to main content
Log in

High-dimensional kNN joins with incremental updates

  • Published:
GeoInformatica Aims and scope Submit manuscript

Abstract

The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is a very expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this article, we propose a novel kNN join method, named kNNJoin +, which supports efficient incremental computation of kNN join results with updates on high-dimensional data. As a by-product, our method also provides answers for the reverse kNN queries with very little overhead. We have performed an extensive experimental study. The results show the effectiveness of kNNJoin+ for processing high-dimensional kNN joins in dynamic workloads.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22
Fig. 23
Fig. 24

Similar content being viewed by others

Notes

  1. In this paper, an update means an insertion, a deletion or a change to an existing data object.

  2. Here kNN query only needs to find a new k-th NN.

References

  1. The UCI KDD Archive (1999) KDD Cup 1999 Data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (http://kdd.ics.uci.edu)

  2. Achtert E, Böhm C, Kröger P, Kunath P, Pryakhin A, Renz M (2006) Efficient reverse k-nearest neighbor search in arbitrary metric spaces. In: SIGMOD’06: proceedings of the 2006 ACM SIGMOD international conference on management of data, pp 515–526

  3. Berchtold S, Keim DA (1998) High-dimensional index structures database support for next decade’s applications (tutorial). In: SIGMOD ’98: proceedings of the 1998 ACM SIGMOD international conference on management of data, p 501

  4. Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is nearest neighbor meaningful? In: Proceeding of the 7th international conference on database theory (ICDT), pp 217–235

  5. Böhm C, Berchtold S, Keim DA (2001) Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput Surv 33(3):322–373

    Article  Google Scholar 

  6. Böhm C, Krebs F (2004) The k-nearest neighbor join: turbo charging the kdd process. Knowl Inf Syst (KAIS) 6(6):728–749

    Article  Google Scholar 

  7. Böhm C, Kriegel H-P (2000) Dynamically optimizing high-dimensional index structures. In: Proceedings of the 7th international conference on extending database technology (EDBT), pp 36–50

  8. Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: VLDB ’97: proceedings of the 23rd international conference on very large data bases, pp 426–435

  9. Dasarathy BV (1991) Nearest neighbor (nn) norms - nn pattern classification techniques. IEEE Computer Society, Silver Spring

    Google Scholar 

  10. Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: SIGMOD ’84: proceedings of the 1984 ACM SIGMOD international conference on management of data, pp 47–57

  11. Hartigan J, Wong M (1979) A K-means clustering algorithm. Appl Stat 28:100–108

    Article  Google Scholar 

  12. Huang X, Jensen CS, Saltenis S (2006) Multiple k nearest neighbor query processing in spatial network databases. In: ADBIS ’06: proceedings of 10th East European conference of advances in databases and information systems, pp 266–281

  13. Jagadish HV, Ooi BC, Tan K-L, Yu C, Zhang R (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst (TODS) 30(2):364–397

    Article  Google Scholar 

  14. Korn F, Muthukrishnan S (2000) Influence sets based on reverse nearest neighbor queries. In: SIGMOD ’00: proceedings of the 2000 ACM SIGMOD international conference on management of data, pp 201–212

  15. Lin K-I, Jagadish HV, Faloutsos C (1994) The TV-tree: an index structure for high-dimensional data. VLDB J 3:517–542

    Article  Google Scholar 

  16. Rafiei D, Mendelzon A (2000) Querying time series data based on similarity. IEEE Trans Knowl Data Eng 12(5):675–693

    Article  Google Scholar 

  17. Tao Y, Papadias D, Lian X (2004) Reverse knn search in arbitrary dimensionality. In: VLDB ’04: Proceedings of the 30th international conference on very large data bases, pp 744–755

  18. Tao Y, Yiu, ML, Mamoulis N (2006) Reverse nearest neighbor search in metric spaces. IEEE Trans Knowl Data Eng 18(9):1239–1252

    Article  Google Scholar 

  19. Weber R, Schek H, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98: proceedings of the 24rd international conference on very large data bases, pp 194–205

  20. Wong R, Tao Y, Fu A, Xiao X, Pryakhin A, Renz M (2007) On efficient spatial matching. In: VLDB ’07: proceedings of the 33rd international conference on very large data bases, pp 579–590

  21. Xia C, Lu H, Ooi BC, Hu J (2004) Gorder: an efficient method for knn join processing. In: VLDB ’04: proceedings of the 30th international conference on very large data bases, pp 756–767

  22. Yang C, Lin K (2001) An index structure for efficient reverse nearest neighbor queries In: Proceedings of the 17th international conference on data engineering (ICDE), pp 485–492

  23. Yu C, Cui B, Wang S, Su J (2007) Efficient index-based knn join processing for high-dimensional data. Inf Softw Technol 49(4):332–344

    Article  Google Scholar 

  24. Yu C, Ooi BC, Tan K-L, Jagadish HV (2001) Indexing the distance: an efficient method to knn processing. In: VLDB ’01: proceedings of the 27th international conference on very large data bases, pp 166–174

  25. Zhang R, Koudas N, Ooi BC, Srivastava D (2005) Multiple aggregations over data streams. In: SIGMOD ’05: proceedings of the 2005 ACM SIGMOD international conference on management of data, pp 299–310

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Cui Yu.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yu, C., Zhang, R., Huang, Y. et al. High-dimensional kNN joins with incremental updates. Geoinformatica 14, 55–82 (2010). https://doi.org/10.1007/s10707-009-0076-5

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10707-009-0076-5

Keywords

Navigation