CPCF: A Flexible Chunking and Proactive Insertion Cuckoo Filter

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14850))

Abstract

The cuckoo filter is a data structure for approximate membership queries that is widely used across data science. However, inefficient space usage and slow element insertion prevent cuckoo filters from completely replacing Bloom filters. We present CPCF, a new and efficient cuckoo filter variant that improves both space utilization and insertion speed without any sacrifice. CPCF employs flexible chunking to optimize space efficiency: it automatically adjusts chunk sizes to the number of elements while minimizing granularity. A proactive insertion strategy accelerates insertion by reducing the number of elements that must be moved on hash conflicts. CPCF also detects hashing failures, which improves insertion stability. Experiments show that CPCF uses less space than the state-of-the-art cuckoo filter variant in most cases. Additionally, CPCF increases insertion throughput by 21% to 101% under maximum load compared with other variants. Its dynamic thresholds ensure accurate judgment of hashing failures at lower values. These optimizations make CPCF a versatile and high-performance approximate membership query filter.
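For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of a standard cuckoo filter with partial-key cuckoo hashing. It is not CPCF: flexible chunking, proactive insertion, and dynamic thresholds are not reproduced here. The class name `CuckooFilter`, its method names, and the parameter choices (4-slot buckets, 8-bit fingerprints, a fixed `max_kicks` of 500, SHA-256 as the hash) are illustrative assumptions, not details taken from the paper. The fixed `max_kicks` limit plays the role of a static hashing-failure threshold.

```python
import hashlib
import random


class CuckooFilter:
    """Baseline cuckoo filter sketch (hypothetical names; not the CPCF design)."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate index in range.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks  # static threshold for declaring insertion failure
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        # 8-bit fingerprint in the range 1..255.
        return self._hash(b"fp:" + item.encode()) % 255 + 1

    def _index(self, item: str) -> int:
        return self._hash(item.encode()) % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other bucket depends only on the
        # current bucket and the fingerprint, so it can be found after eviction.
        return (index ^ self._hash(fp.to_bytes(1, "big"))) % self.num_buckets

    def insert(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both candidate buckets are full: relocate existing fingerprints.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Threshold reached: report failure. This sketch simply drops the
        # last displaced fingerprint.
        return False

    def contains(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]


cf = CuckooFilter()
cf.insert("alice")
print(cf.contains("alice"))  # True: a successfully inserted item is always found
```

Each relocation in the eviction loop is one moved conflict element, and the loop bound is where an insertion is declared failed, which are the two costs the abstract says CPCF targets.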

Notes

  1. https://github.com/huawendi/CPCF

  2. https://github.com/efficient/cuckoofilter

  3. https://github.com/CGCL-codes/BCF

  4. https://github.com/wuwuz/Vacuum-Filter

  5. https://github.com/google/cityhash

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 62362057 and 61762075, the CCF-Tencent Rhino-Bird Open Research Fund (CCF-Tencent RAGR20230126), Shenzhen University Research Instrument Development and Cultivation (2023YQ017), the Guangdong “Pearl River Talent Recruitment Program” under Grants 2019ZT08X603 and 2019JC01X235, and the Foundation of Shenzhen under Grant No. 20220810142731001. Xiao Qin’s work is supported by the National Aeronautics and Space Administration (Grant 80NSSC20M0044) and the National Highway Traffic Safety Administration (Grant 451861-19158).

Author information

Corresponding author

Correspondence to Ping Xie.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hua, W. et al. (2024). CPCF: A Flexible Chunking and Proactive Insertion Cuckoo Filter. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14850. Springer, Singapore. https://doi.org/10.1007/978-981-97-5552-3_19

  • DOI: https://doi.org/10.1007/978-981-97-5552-3_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5551-6

  • Online ISBN: 978-981-97-5552-3

  • eBook Packages: Computer Science (R0)
