Abstract
The k nearest neighbor (kNN) join operation is a fundamental task that combines two high-dimensional databases, enabling data points in the User dataset U to identify their k nearest neighbor points from the Item dataset I. This operation plays a crucial role in various domains, including knowledge discovery, data mining, similarity search applications, and scientific research. However, exact kNN search in high-dimensional spaces is computationally demanding, and existing sequential methods face challenges in handling large datasets. In this paper, we propose an efficient parallel solution for dynamic kNN join over high-dimensional data, leveraging the high-dimensional R tree (HDR Tree) for improved efficiency. Our solution harnesses the power of Simultaneous Multi-Threading (SMT) technologies and Single-Instruction-Multiple-Data (SIMD) instructions in modern CPUs for parallelisation. Importantly, our research is the first to introduce parallel computation for exact kNN join over high-dimensional data. Experimental results demonstrate that our proposed approach outperforms the sequential HDR Tree method by up to 1.2 times with a single thread. Moreover, our solution provides near-linear scalability as the number of threads increases.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Böhm, C., Krebs, F.: Supporting KDD applications by the k-nearest neighbor join. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds.) DEXA 2003. LNCS, vol. 2736, pp. 504–516. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45227-0_50
Böhm, C., Krebs, F.: The k-nearest neighbour join: turbo charging the KDD process. Knowl. Inf. Syst. 6(6), 728–749 (2004)
Chakrabarti, K., Mehrotra, S.: Local dimensionality reduction: a new approach to indexing high dimensional spaces. In: VLDB Conference (2000)
Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y.: NUS-WIDE: a real-world web image database from national university of Singapore. In: Proceedings of the ACM International Conference on Image and Video Retrieval, pp. 1–9 (2009)
Ferhatosmanoglu, H., Tuncel, E., Agrawal, D., El Abbadi, A.: Approximate nearest neighbor searching in multimedia databases. In: Proceedings 17th International Conference on Data Engineering, pp. 503–511. IEEE (2001)
Giacinto, G.: A nearest-neighbor approach to relevance feedback in content based image retrieval. In: Proceedings of the 6th ACM International Conference on Image and Video Retrieval, pp. 456–463 (2007)
Gowanlock, M.: KNN-joins using a hybrid approach: exploiting CPU/GPU workload characteristics. In: Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, pp. 33–42 (2019)
Gowanlock, M.: Hybrid KNN-join: parallel nearest neighbor searches exploiting CPU and GPU architectural features. J. Parallel Distrib. Comput. 149, 119–137 (2021)
Hu, Y., Yang, C., Zhan, P., Zhao, J., Li, Y., Li, X.: Efficient continuous KNN join processing for real-time recommendation. Pers. Ubiquit. Comput. 25, 1001–1011 (2021)
Jagadish, H.V., Ooi, B.C., Tan, K.L., Yu, C., Zhang, R.: iDistance: an adaptive b+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. (TODS) 30(2), 364–397 (2005)
Kouiroukidis, N., Evangelidis, G.: The effects of dimensionality curse in high dimensional KNN search. In: 2011 15th Panhellenic Conference on Informatics, pp. 41–45. IEEE (2011)
Lu, W., Shen, Y., Chen, S., Ooi, B.C.: Efficient processing of k nearest neighbor joins using MapReduce. arXiv preprint arXiv:1207.0141 (2012)
McSherry, F., Isard, M., Murray, D.G.: Scalability! But at what \(\{\)COST\(\}\)? In: 15th Workshop on Hot Topics in Operating Systems (HotOS XV) (2015)
Shahvarani, A., Jacobsen, H.A.: Distributed stream KNN join. In: Proceedings of the 2021 International Conference on Management of Data, pp. 1597–1609 (2021)
Tanenbaum, A.S.: Distributed systems principles and paradigms (2007)
Ukey, N., Yang, Z., Li, B., Zhang, G., Hu, Y., Zhang, W.: Survey on exact kNN queries over high-dimensional data space. Sensors 23(2), 629 (2023)
Ukey, N., Yang, Z., Zhang, G., Liu, B., Li, B., Zhang, W.: Efficient kNN join over dynamic high-dimensional data. In: Hua, W., Wang, H., Li, L. (eds.) ADC 2022. LNCS, vol. 13459, pp. 63–75. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-15512-3_5
Wang, J., Lin, L., Huang, T., Wang, J., He, Z.: Efficient k-nearest neighbor join algorithms for high dimensional sparse data. arXiv preprint arXiv:1011.2807 (2010)
Weber, R., Schek, H.J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB, vol. 98, pp. 194–205 (1998)
Xia, C., Lu, H., Ooi, B.C., Hu, J.: GORDER: an efficient method for KNN join processing. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, vol. 30, pp. 756–767 (2004)
Yang, C., Yu, X., Liu, Y.: Continuous KNN join processing for real-time recommendation. In: 2014 IEEE International Conference on Data Mining, pp. 640–649. IEEE (2014)
Yao, B., Li, F., Kumar, P.: K nearest neighbor queries and kNN-joins in large relational databases (almost) for free. In: 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), pp. 4–15. IEEE (2010)
Yu, C., Cui, B., Wang, S., Su, J.: Efficient index-based KNN join processing for high-dimensional data. Inf. Softw. Technol. 49(4), 332–344 (2007)
Yu, C., Ooi, B.C., Tan, K.L., Jagadish, H.: Indexing the distance: an efficient method to KNN processing. In: VLDB, vol. 1, pp. 421–430 (2001)
Yu, C., Zhang, R., Huang, Y., Xiong, H.: High-dimensional kNN joins with incremental updates. GeoInformatica 14(1), 55–82 (2010)
Zhang, C., Li, F., Jestes, J.: Efficient parallel kNN joins for large data in MapReduce. In: Proceedings of the 15th International Conference on Extending Database Technology, pp. 38–49 (2012)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ukey, N., Yang, Z., Yang, W., Li, B., Li, R. (2024). kNN Join for Dynamic High-Dimensional Data: A Parallel Approach. In: Bao, Z., Borovica-Gajic, R., Qiu, R., Choudhury, F., Yang, Z. (eds) Databases Theory and Applications. ADC 2023. Lecture Notes in Computer Science, vol 14386. Springer, Cham. https://doi.org/10.1007/978-3-031-47843-7_1
Download citation
DOI: https://doi.org/10.1007/978-3-031-47843-7_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47842-0
Online ISBN: 978-3-031-47843-7
eBook Packages: Computer ScienceComputer Science (R0)