FreshJoin: An Efficient and Adaptive Algorithm for Set Containment Join

Conference paper, Web and Big Data (APWeb-WAIM 2019)

Abstract

This paper revisits set containment join (SCJ), which has many fundamental applications in commercial and scientific fields. To further improve its performance, the paper proposes FreshJoin, a new adaptive, parameter-free, in-memory algorithm for SCJ. FreshJoin relies on two flat indices that record three kinds of signatures: the two least frequent elements and a hash signature. Experiments on 16 real-life datasets show that FreshJoin usually reduces space costs by more than 50% while remaining competitive with state-of-the-art algorithms in running time.
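An SCJ between collections R and S returns every pair (r, s) with r ⊆ s. The abstract only names FreshJoin's ingredients, so the following is a rough, generic filter-and-verify sketch of how least-frequent-element and hash signatures can prune such a join. It is not the authors' FreshJoin. The inverted index over S, the fixed 64-bit bitmap width `SIG_BITS`, computing element frequencies over S only, and the helper names `hash_signature` and `containment_join` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

SIG_BITS = 64  # hypothetical fixed signature width (an assumption, not from the paper)


def hash_signature(items):
    """Bitmap signature: bit hash(e) % SIG_BITS is set for every element e."""
    sig = 0
    for e in items:
        sig |= 1 << (hash(e) % SIG_BITS)
    return sig


def containment_join(R, S):
    """Return all (i, j) with R[i] a subset of S[j], ordered by i, then j."""
    # Assumption: element frequencies over S decide which elements are "least frequent".
    freq = Counter(e for s in S for e in s)
    # Hypothetical inverted index over S: element -> ids of the S-sets containing it.
    postings = defaultdict(list)
    for j, s in enumerate(S):
        for e in s:
            postings[e].append(j)
    s_sigs = [hash_signature(s) for s in S]

    result = []
    for i, r in enumerate(R):
        if not r:
            # The empty set is contained in every set.
            result.extend((i, j) for j in range(len(S)))
            continue
        # Pick the (up to) two least frequent elements of r.
        rarest = sorted(r, key=lambda e: freq[e])[:2]
        # Any superset of r must contain both rare elements: intersect their postings.
        cands = set(postings.get(rarest[0], ()))
        for e in rarest[1:]:
            cands &= set(postings.get(e, ()))
        r_sig = hash_signature(r)
        for j in sorted(cands):
            # Cheap filter: every bit of r's signature must also be set in S[j]'s,
            # then an exact subset test verifies the surviving candidate.
            if (r_sig & ~s_sigs[j]) == 0 and r <= S[j]:
                result.append((i, j))
    return result


R = [{1, 2}, {3}, {2, 5}, set()]
S = [{1, 2, 3}, {2, 3, 4}, {1, 2, 5}]
print(containment_join(R, S))
# [(0, 0), (0, 2), (1, 0), (1, 1), (2, 2), (3, 0), (3, 1), (3, 2)]
```

Probing with the rarest elements keeps candidate lists short, and the bitmap test rejects many candidates before the exact subset check. Note that `SIG_BITS` is a tuning parameter in this sketch, whereas the paper describes FreshJoin as parameter-free and adaptive.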

Author information

Correspondence to Jizhou Luo.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Luo, J. et al. (2019). FreshJoin: An Efficient and Adaptive Algorithm for Set Containment Join. In: Shao, J., Yiu, M., Toyoda, M., Zhang, D., Wang, W., Cui, B. (eds) Web and Big Data. APWeb-WAIM 2019. Lecture Notes in Computer Science, vol 11642. Springer, Cham. https://doi.org/10.1007/978-3-030-26075-0_14

  • DOI: https://doi.org/10.1007/978-3-030-26075-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26074-3

  • Online ISBN: 978-3-030-26075-0

  • eBook Packages: Computer Science, Computer Science (R0)
