Speeding Up Continuous kNN Join by Binary Sketches

  • Conference paper
Advances in Data Mining. Applications and Theoretical Aspects (ICDM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10933)

Abstract

Real-time recommendation is a necessary component of current social applications. It is responsible for suggesting relevant, newly published data to users based on their preferences. When both the users and the published data are represented in a metric space, each user can be recommended their k nearest neighbors among the published data, i.e., the kNN join is computed. In this work, we address the common requirement that only recently published data are subject to recommendation. A sliding time window is therefore defined, and only data published within the limits of the window can be recommended. Because both the users and the published data are numerous, continuously updating the results of the kNN join as new data enter and old data leave the sliding window becomes a challenging task.
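The continuous join described above can be pictured with a brute-force baseline. The following is a minimal Python sketch, not the paper's method. It assumes a count-based window (the paper's window is time-based), Euclidean distance as the metric, and a hypothetical class `SlidingWindowKnnJoin`. Each newly published item is compared with every user. A user's list is recomputed from the whole window only when an expiring item was among that user's current k nearest neighbors.

```python
import heapq
import math
from collections import deque
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Metric distance; assumed Euclidean for this sketch."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SlidingWindowKnnJoin:
    """Hypothetical brute-force continuous kNN join over a count-based window.

    For every user, keeps the k published items closest to that user
    among the items currently inside the window.
    """

    def __init__(self, users: Dict[str, Vector], k: int, window_size: int):
        self.users = users
        self.k = k
        self.window_size = window_size
        self.window: deque = deque()  # (item_id, vector), oldest first
        # Per user: sorted list of (distance, item_id), at most k entries.
        self.result: Dict[str, List[Tuple[float, str]]] = {u: [] for u in users}

    def _recompute(self, user: str) -> None:
        # Full scan of the current window for one user.
        uvec = self.users[user]
        self.result[user] = heapq.nsmallest(
            self.k, ((euclidean(uvec, vec), iid) for iid, vec in self.window))

    def publish(self, item_id: str, vec: Vector) -> None:
        self.window.append((item_id, vec))
        expired = None
        if len(self.window) > self.window_size:
            expired, _ = self.window.popleft()  # oldest item leaves the window
        for user, uvec in self.users.items():
            current = self.result[user]
            if expired is not None and any(iid == expired for _, iid in current):
                # The expired item was in this user's kNN: rescan the window
                # (which already contains the new item).
                self._recompute(user)
                continue
            d = euclidean(uvec, vec)
            if len(current) < self.k or d < current[-1][0]:
                current.append((d, item_id))
                current.sort()
                del current[self.k:]
```

Every arrival still costs at least one full metric distance per user. For high-dimensional vectors, this cost dominates the join.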

We propose a binary sketch-based approximation technique especially suited to cases where computing the metric distance is expensive (e.g., the Euclidean distance in high-dimensional vector spaces). It uses cheap Hamming distances between binary sketches to skip over 90% of the expensive metric distance computations. As our experiments on 4,096-dimensional vectors reveal, the proposed approach significantly outperforms the existing approaches it was compared with.
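The filter-and-refine idea behind sketch-based skipping can be illustrated with a minimal Python sketch. It is a stand-in, not the authors' sketching technique. The bits come from random hyperplanes, a SimHash-style choice swapped in for illustration. The hypothetical parameter `keep` sets how many candidates survive the cheap Hamming-distance pass; all function names are likewise hypothetical. With `keep` set to 10% of the data, 90% of the exact distance computations are skipped, at the price of possibly missing some true neighbors.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian hyperplanes through the origin (illustrative choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sketch(vec: Vector, planes: List[List[float]]) -> int:
    """Pack one bit per hyperplane: 1 if vec lies on its non-negative side."""
    bits = 0
    for i, p in enumerate(planes):
        if sum(x * w for x, w in zip(vec, p)) >= 0.0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bit positions between two packed sketches."""
    return bin(a ^ b).count("1")


def euclidean(a: Vector, b: Vector) -> float:
    """The expensive metric distance the filter tries to avoid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def approx_knn(query: Vector, data: List[Vector], sketches: List[int],
               planes: List[List[float]], k: int, keep: int) -> Tuple[List[int], int]:
    """Return indices of the approximate kNN and the number of exact distances computed."""
    qs = sketch(query, planes)
    # Cheap pass: rank every item by Hamming distance between sketches.
    by_hamming = sorted(range(len(data)), key=lambda i: hamming(qs, sketches[i]))
    candidates = by_hamming[:keep]
    # Expensive pass: exact metric distance only for the surviving candidates.
    ranked = sorted(candidates, key=lambda i: euclidean(query, data[i]))
    return ranked[:k], len(candidates)
```

Because the Hamming pass only ranks candidates, the result is approximate. Raising `keep` trades speed for recall, and `keep = len(data)` reproduces the exact kNN.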

Notes

  1.

    A Hamming distance of two sketches is computed as the number of positions in which their values differ.

  2.

    https://github.com/chongzi1990/continuouskNNjoin.git.
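The definition in note 1 can be checked with a short Python example. The function names are hypothetical. The second function assumes sketches are packed into integers, one bit per position. In that case, XOR followed by a count of set bits gives the same value as comparing positions one by one.

```python
from typing import Sequence


def hamming_positions(a: Sequence[int], b: Sequence[int]) -> int:
    """Literal definition: count positions where two equal-length sketches differ."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def hamming_packed(a: int, b: int) -> int:
    """Same quantity for integer-packed sketches: XOR marks differing bits, then count them."""
    return bin(a ^ b).count("1")


s1 = [1, 0, 1, 1, 0, 0, 1, 0]
s2 = [1, 0, 0, 1, 1, 0, 1, 0]
print(hamming_positions(s1, s2))              # 2
print(hamming_packed(0b10110010, 0b10011010))  # 2
```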

Acknowledgements

This work was supported by the Czech national research project GA16-18889S.

Author information

Corresponding author: Filip Nalepa.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Nalepa, F., Batko, M., Zezula, P. (2018). Speeding Up Continuous kNN Join by Binary Sketches. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2018. Lecture Notes in Computer Science, vol. 10933. Springer, Cham. https://doi.org/10.1007/978-3-319-95786-9_14

  • DOI: https://doi.org/10.1007/978-3-319-95786-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-95785-2

  • Online ISBN: 978-3-319-95786-9

  • eBook Packages: Computer Science, Computer Science (R0)
