Abstract
It is well known that the performance of a Bloom Filter is essentially independent of the data used to query it, but this no longer holds for Learned Bloom Filters. In this work we analyze how the performance of such learned data structures is affected by the classifier chosen to build the filter and by the complexity of the dataset used in the training phase. This analysis, which has not been carried out in the literature so far, covers the key performance indicators of space efficiency, false positive rate, and reject time. By screening various implementations of Learned Bloom Filters, our experimental study highlights that only one of these implementations is robust to classifier performance and to noisy data, and that only two families of classifiers have desirable properties with respect to the above performance indicators.
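For readers unfamiliar with the structure under study, the sketch below illustrates the basic Learned Bloom Filter construction in Python: a classifier scores each query, items scoring above a threshold are accepted directly, and a backup Bloom filter stores the keys the classifier misclassifies, so the overall structure has no false negatives. This is a minimal illustrative sketch, not the paper's experimental code; the `score` function, the threshold, and the sizing of the backup filter are assumptions made here for demonstration.

```python
import hashlib
import math


class BloomFilter:
    """Classic Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, n_items, fpr):
        # Standard sizing formulas for a target false positive rate.
        self.m = max(1, math.ceil(-n_items * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class LearnedBloomFilter:
    """Learned Bloom filter: classifier plus backup filter for its false negatives."""

    def __init__(self, keys, score, threshold, backup_fpr=0.01):
        self.score = score          # maps an item to a "keyness" score in [0, 1]
        self.threshold = threshold  # scores >= threshold are accepted directly
        # The backup filter stores only the keys the classifier would reject,
        # which guarantees the overall structure has no false negatives.
        missed = [k for k in keys if score(k) < threshold]
        self.backup = BloomFilter(max(1, len(missed)), backup_fpr)
        for k in missed:
            self.backup.add(k)

    def __contains__(self, item):
        if self.score(item) >= self.threshold:
            return True              # classifier accepts (may be a false positive)
        return item in self.backup   # otherwise fall back to the backup filter


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(1000)]
    # Toy classifier: deliberately misclassifies keys ending in "0",
    # so the backup filter is actually exercised.
    score = lambda x: 0.9 if x.startswith("key-") and not x.endswith("0") else 0.1
    lbf = LearnedBloomFilter(keys, score, threshold=0.5)
    assert all(k in lbf for k in keys)  # no false negatives by construction
```

Note that false positives can still arise from two sources, which is exactly why the classifier's quality matters: the classifier may accept a non-key outright, or the backup Bloom filter may report a spurious hit.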
Notes
1. The experiments and data for this preliminary part are available upon request.
Acknowledgements
This work has been supported by the Italian MUR PRIN project 2017WR7SHH "Multicriteria data structures and algorithms: from compressed to learned indexes, and beyond". Additional support to R.G. has been granted by the INdAM-GNCS Project "Analysis and Processing of Big Data based on Graph Models".
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Malchiodi, D., Raimondi, D., Fumagalli, G., Giancarlo, R., Frasca, M. (2023). A Critical Analysis of Classifier Selection in Learned Bloom Filters: The Essentials. In: Iliadis, L., Maglogiannis, I., Alonso, S., Jayne, C., Pimenidis, E. (eds) Engineering Applications of Neural Networks. EANN 2023. Communications in Computer and Information Science, vol 1826. Springer, Cham. https://doi.org/10.1007/978-3-031-34204-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34203-5
Online ISBN: 978-3-031-34204-2
eBook Packages: Computer Science (R0)