Abstract
The cuckoo filter is a data structure for approximate membership queries that is widely used across data science fields. However, inefficient space usage and slow element insertion prevent cuckoo filters from completely replacing Bloom filters. We present CPCF, a new and efficient cuckoo filter variant that improves space utilization and insertion speed without any sacrifice. CPCF employs flexible chunking to optimize space efficiency: it automatically adjusts chunk sizes to the number of elements while minimizing granularity. A proactive insertion strategy accelerates insertion by reducing the number of hash-conflicting elements that must be moved. CPCF also detects hashing failures astutely, which improves insertion stability. Experiments show that CPCF saves more space than the state-of-the-art cuckoo filter variant in most cases. In addition, CPCF increases insertion throughput by 21% to 101% under maximum load compared with other variants. Its dynamic thresholds allow hashing failures to be judged accurately at lower values. These optimizations make CPCF a versatile, high-performance approximate membership query filter.
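For readers unfamiliar with the baseline, the following is a minimal sketch of a standard cuckoo filter using partial-key cuckoo hashing. It is not CPCF: chunking, proactive insertion, and dynamic thresholds are not shown, because the abstract does not specify them. The class name, parameter names, default values, and the SHA-256-based hash are assumptions made for illustration. The relocation loop in `insert` is the step whose cost grows as the table fills, and the fixed `max_kicks` bound is the conventional way such a filter decides that hashing has failed.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """64-bit hash taken from SHA-256 (an assumption; any good hash works)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    """Baseline cuckoo filter sketch (hypothetical names, not the CPCF design)."""

    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=12,
                 max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-derived index in range.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        assert 1 <= fp_bits <= 32
        self.m = num_buckets
        self.b = bucket_size
        self.fp_bits = fp_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _fingerprint(self, item: bytes) -> int:
        # Fingerprints are short and nonzero.
        return _h(b"fp" + item) % ((1 << self.fp_bits) - 1) + 1

    def _alt(self, i: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other bucket depends only on the
        # current index and the fingerprint, so _alt(_alt(i, fp), fp) == i.
        return (i ^ _h(fp.to_bytes(4, "big"))) & (self.m - 1)

    def _candidates(self, item: bytes):
        fp = self._fingerprint(item)
        i1 = _h(item) & (self.m - 1)
        return fp, i1, self._alt(i1, fp)

    def insert(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a random resident and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.b)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        # Hashing failure: the last evicted fingerprint is dropped here.
        # A production filter would keep it in a victim slot or resize.
        return False

    def contains(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]
```

A lookup inspects at most two buckets, so query cost is constant, while insertion cost depends on how many relocations the loop performs. Inserted items are always reported present as long as no insertion has failed; items never inserted may occasionally be reported present as false positives.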
Acknowledgements
This work is supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 62362057 and 61762075, the CCF-Tencent Rhino-Bird Open Research Fund CCF-Tencent RAGR20230126, Shenzhen University Research Instrument Development and Cultivation 2023YQ017, the Guangdong “Pearl River Talent Recruitment Program” under Grant Nos. 2019ZT08X603 and 2019JC01X235, and the Foundation of Shenzhen under Grant No. 20220810142731001. Xiao Qin’s work is supported by the National Aeronautics and Space Administration (Grant 80NSSC20M0044) and the National Highway Traffic Safety Administration (Grant 451861-19158).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hua, W. et al. (2024). CPCF: A Flexible Chunking and Proactive Insertion Cuckoo Filter. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14850. Springer, Singapore. https://doi.org/10.1007/978-981-97-5552-3_19
DOI: https://doi.org/10.1007/978-981-97-5552-3_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5551-6
Online ISBN: 978-981-97-5552-3
eBook Packages: Computer Science, Computer Science (R0)