kNN Join for Dynamic High-Dimensional Data: A Parallel Approach

  • Conference paper
Databases Theory and Applications (ADC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14386)

Abstract

The k nearest neighbor (kNN) join operation is a fundamental task that combines two high-dimensional databases, enabling data points in the User dataset U to identify their k nearest neighbor points from the Item dataset I. This operation plays a crucial role in various domains, including knowledge discovery, data mining, similarity search applications, and scientific research. However, exact kNN search in high-dimensional spaces is computationally demanding, and existing sequential methods face challenges in handling large datasets. In this paper, we propose an efficient parallel solution for dynamic kNN join over high-dimensional data, leveraging the high-dimensional R tree (HDR Tree) for improved efficiency. Our solution harnesses the power of Simultaneous Multi-Threading (SMT) technologies and Single-Instruction-Multiple-Data (SIMD) instructions in modern CPUs for parallelisation. Importantly, our research is the first to introduce parallel computation for exact kNN join over high-dimensional data. Experimental results demonstrate that our proposed approach outperforms the sequential HDR Tree method by up to 1.2 times with a single thread. Moreover, our solution provides near-linear scalability as the number of threads increases.
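To make the kNN join operation concrete, the following is a minimal brute-force sketch, not the HDR Tree method or the parallel solution proposed in the paper. It assumes Euclidean distance and small in-memory datasets; the function name `knn_join` and the sample points are hypothetical and chosen only for illustration.

```python
import heapq
import math


def knn_join(users, items, k):
    """Map each user index to the indices of its k nearest items.

    Brute-force baseline: every user is compared against every item,
    so the cost grows with len(users) * len(items) * dimensionality.
    """
    result = {}
    for u_id, u in enumerate(users):
        # nsmallest returns the k item indices with the smallest
        # Euclidean distance to u, ordered from nearest to farthest.
        result[u_id] = heapq.nsmallest(
            k,
            range(len(items)),
            key=lambda i_id: math.dist(u, items[i_id]),
        )
    return result


# Hypothetical toy data: User dataset U and Item dataset I.
U = [(0.0, 0.0), (5.0, 5.0)]
I = [(1.0, 0.0), (4.0, 4.0), (0.0, 2.0), (6.0, 5.0)]

joined = knn_join(U, I, k=2)
print(joined)
```

Each user's search is independent of the others in this baseline, which is why a kNN join lends itself to being split across threads. An index-based exact method such as the one described in the abstract aims to avoid most of these pairwise distance computations.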

Corresponding author

Correspondence to Zhengyi Yang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ukey, N., Yang, Z., Yang, W., Li, B., Li, R. (2024). kNN Join for Dynamic High-Dimensional Data: A Parallel Approach. In: Bao, Z., Borovica-Gajic, R., Qiu, R., Choudhury, F., Yang, Z. (eds) Databases Theory and Applications. ADC 2023. Lecture Notes in Computer Science, vol 14386. Springer, Cham. https://doi.org/10.1007/978-3-031-47843-7_1

  • DOI: https://doi.org/10.1007/978-3-031-47843-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47842-0

  • Online ISBN: 978-3-031-47843-7

  • eBook Packages: Computer Science (R0)
