Abstract
Multi-key membership testing checks whether a queried element exists in a given set of multi-key elements. It is a fundamental operation in computing systems and networking applications such as web search, mail systems, distributed databases, firewalls, and network routing. Most existing approaches to membership testing build on the Bloom filter, a space-efficient and high-security probabilistic data structure. However, the traditional Bloom filter performs poorly in multi-key scenarios. Recently, the Learned Bloom Filter (LBF), a variant that combines machine learning methods with a Bloom filter, has drawn increasing attention for its significant reductions in space occupation and False Positive Rate (FPR). More importantly, because it introduces a learned model, LBF can address some of the problems the Bloom filter faces in multi-key scenarios. We therefore propose a Multi-key LBF (MLBF) data structure, which consists of a value-interaction-based multi-key classifier and a multi-key Bloom filter. To further reduce FPR, we also propose an Interval-based MLBF, which divides keys into specific intervals according to the data distribution. Extensive experiments on two real datasets confirm the superiority of the proposed data structures in terms of FPR and query efficiency.
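To make the classifier-plus-filter idea concrete, the following is a minimal sketch of a generic learned Bloom filter over multi-key elements. It is not the MLBF proposed in this paper: the abstract does not specify the classifier or the filter layout, so the scoring function `toy_score`, the threshold, the key separator, and the class names are all assumptions made for illustration. Elements the model scores below the threshold are stored in a backup Bloom filter, so no member is ever rejected.

```python
import hashlib
from typing import Callable, Iterable, Tuple

Element = Tuple[str, ...]  # a multi-key element, e.g. (sender, source_ip)


class BloomFilter:
    """Plain Bloom filter over multi-key elements (tuples of strings)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, element: Element) -> Iterable[int]:
        # Assumption: the separator "\x1f" never occurs inside a key.
        data = "\x1f".join(element).encode("utf-8")
        for i in range(self.num_hashes):
            # A different salt per index gives num_hashes hash functions.
            salt = i.to_bytes(8, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, element: Element) -> None:
        for pos in self._positions(element):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, element: Element) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(element)
        )


class LearnedBloomFilter:
    """Classifier in front, backup Bloom filter for members it misses."""

    def __init__(
        self,
        score: Callable[[Element], float],
        threshold: float,
        members: Iterable[Element],
        num_bits: int,
        num_hashes: int,
    ) -> None:
        self.score = score
        self.threshold = threshold
        self.backup = BloomFilter(num_bits, num_hashes)
        for element in members:
            # Only members the model would reject need a backup entry.
            if self.score(element) < self.threshold:
                self.backup.add(element)

    def __contains__(self, element: Element) -> bool:
        return self.score(element) >= self.threshold or element in self.backup


def toy_score(element: Element) -> float:
    # Hypothetical stand-in for a trained multi-key classifier.
    return 0.9 if element[0].endswith("@corp.example") else 0.1


members = [
    ("alice@corp.example", "10.0.0.1"),
    ("bob@other.org", "10.0.0.2"),
    ("carol@corp.example", "10.0.0.3"),
]
lbf = LearnedBloomFilter(toy_score, 0.5, members, num_bits=1024, num_hashes=3)
```

In this sketch every member is reported as present, either by the model or by the backup filter. A non-member such as `("mallory@corp.example", "10.9.9.9")` is also accepted because the toy model scores it above the threshold, which shows one way false positives can arise in a learned filter.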
Y. Li, Z. Wang and R. Yang: These authors contributed equally to this paper.
Acknowledgements
This work is partially supported by NSFC (Nos. 61972069, 61836007 and 61832017), Shenzhen Municipal Science and Technology R&D Funding Basic Research Program (JCYJ20210324133607021), and Municipal Government of Quzhou under Grant No. 2022D037.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Y., Wang, Z., Yang, R., Zhao, Y., Zhou, R., Zheng, K. (2023). Learned Bloom Filter for Multi-key Membership Testing. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_5
DOI: https://doi.org/10.1007/978-3-031-30637-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30636-5
Online ISBN: 978-3-031-30637-2
eBook Packages: Computer Science, Computer Science (R0)