Abstract
Real-time recommendation is an essential component of current social applications. It suggests relevant, newly published data to users based on their preferences. When the users and the published data are represented in a metric space, each user can be recommended their k nearest neighbors among the published data, i.e., a kNN join is computed. In this work, we address the frequent requirement that only recently published data are subject to recommendation: a sliding time window is defined, and only data published within the window can be recommended. Because both the users and the published data are numerous, continuously updating the results of the kNN join as new data enter and old data leave the sliding window is a challenging task.
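To make the problem setting concrete, the following is a minimal sketch of a naive baseline for a kNN join over a sliding time window. It is not the method proposed in the paper. The class name `SlidingWindowKnnJoin`, its methods, and the choice of the Euclidean distance are assumptions made for illustration only.

```python
import heapq
import math
from collections import deque


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SlidingWindowKnnJoin:
    """Hypothetical naive baseline: recompute every user's k nearest items
    among the items that currently lie inside the time window."""

    def __init__(self, users, k, window):
        self.users = users      # user_id -> preference vector
        self.k = k              # neighbors reported per user
        self.window = window    # window length, in the same unit as timestamps
        self.items = deque()    # (timestamp, item_id, vector), oldest first

    def publish(self, timestamp, item_id, vector):
        """Add a new item and drop items that have left the window."""
        self.items.append((timestamp, item_id, vector))
        while self.items and self.items[0][0] <= timestamp - self.window:
            self.items.popleft()

    def result(self):
        """Return user_id -> list of the k nearest item ids, nearest first."""
        return {
            uid: [
                iid
                for _, iid in heapq.nsmallest(
                    self.k,
                    ((euclidean(uvec, vec), iid) for _, iid, vec in self.items),
                )
            ]
            for uid, uvec in self.users.items()
        }
```

Every call to `result` computes one metric distance per (user, item) pair, which is the cost that grows quickly when both sets are large and the distance is expensive.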
We propose a binary sketch-based approximation technique suited especially to cases where computing the metric distance is expensive (e.g., the Euclidean distance in high-dimensional vector spaces). It uses cheap Hamming distances to skip over 90% of the expensive metric distance computations. In our experiments on 4,096-dimensional vectors, the proposed approach significantly outperforms the existing approaches it is compared with.
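The filtering idea can be illustrated with a short sketch: rank items by the Hamming distance of their binary sketches, keep a small shortlist, and compute the expensive metric distance only for that shortlist. The abstract does not specify how sketches are built, so the random-hyperplane construction below is an assumed stand-in, and all function names and the `candidates` parameter are hypothetical.

```python
import heapq
import math
import random


def euclidean(a, b):
    """The expensive metric distance (Euclidean, as in the abstract's example)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_hyperplanes(dim, bits, seed=0):
    """Assumption: one random hyperplane per sketch bit (not the paper's method)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def sketch(vector, hyperplanes):
    """Pack one bit per hyperplane into an int: 1 if the vector lies on its
    non-negative side, 0 otherwise."""
    bits = 0
    for i, plane in enumerate(hyperplanes):
        if sum(x * w for x, w in zip(vector, plane)) >= 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of bit positions in which two sketches differ."""
    return bin(a ^ b).count("1")


def knn_with_sketch_filter(query, items, hyperplanes, k, candidates):
    """Approximate kNN. `items` is a list of (item_id, vector, sketch) tuples.
    Only `candidates` items reach the metric distance computation."""
    q = sketch(query, hyperplanes)
    shortlist = heapq.nsmallest(candidates, items, key=lambda it: hamming(q, it[2]))
    nearest = heapq.nsmallest(k, shortlist, key=lambda it: euclidean(query, it[1]))
    return [item_id for item_id, _, _ in nearest]
```

With `candidates` set to, say, 10% of the window size, roughly 90% of the metric distance computations are skipped, at the price of an approximate result. Setting `candidates` to the number of items reduces the function to an exact scan.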
Notes
- 1. A Hamming distance of two sketches is computed as the number of positions in which their values differ.
- 2.
Acknowledgements
This work was supported by the Czech national research project GA16-18889S.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Nalepa, F., Batko, M., Zezula, P. (2018). Speeding Up Continuous kNN Join by Binary Sketches. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2018. Lecture Notes in Computer Science, vol 10933. Springer, Cham. https://doi.org/10.1007/978-3-319-95786-9_14
DOI: https://doi.org/10.1007/978-3-319-95786-9_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95785-2
Online ISBN: 978-3-319-95786-9
eBook Packages: Computer Science, Computer Science (R0)