A Critical Analysis of Classifier Selection in Learned Bloom Filters: The Essentials

Malchiodi, Dario; Raimondi, Davide; Fumagalli, Giacomo; Giancarlo, Raffaele; Frasca, Marco

doi:10.1007/978-3-031-34204-2_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1826)

Included in the following conference series:

International Conference on Engineering Applications of Neural Networks

Abstract

It is well known that the performance of Bloom Filters is essentially independent of the data used to query them, but this no longer holds for Learned Bloom Filters. In this work we analyze how the performance of such learned data structures is affected by the classifier chosen to build the filter and by the complexity of the dataset used in the training phase. This analysis, which has not been proposed so far in the literature, covers three key performance indicators: space efficiency, false positive rate, and reject time. By screening various implementations of Learned Bloom Filters, our experimental study shows that only one of these implementations exhibits higher robustness to classifier performance and to noisy data, and that only two families of classifiers have desirable properties with respect to these performance indicators.
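
For readers unfamiliar with the data structure, the following is a minimal sketch of the basic Learned Bloom Filter idea: a classifier screens each query, and a small backup Bloom Filter stores the keys the classifier misses, so that no false negatives occur. It is not the implementation evaluated in the paper. The names `BloomFilter`, `LearnedBloomFilter`, and `toy_score`, the threshold value, and the synthetic keys are all assumptions made for illustration; in particular, `toy_score` is a hand-written stand-in for a trained classifier.

```python
import hashlib
import math


class BloomFilter:
    """Classical Bloom filter over strings, using double hashing."""

    def __init__(self, n_items, fpr):
        n_items = max(1, n_items)
        # Standard sizing: m = -n ln(eps) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(1, math.ceil(-n_items * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(key))


class LearnedBloomFilter:
    """Classifier plus backup filter (illustrative sketch, not the paper's code)."""

    def __init__(self, keys, score, threshold, backup_fpr=0.01):
        self.score = score
        self.threshold = threshold
        # Keys scored below the threshold would be false negatives of the
        # classifier, so they are stored in the backup Bloom filter.
        missed = [k for k in keys if score(k) < threshold]
        self.backup = BloomFilter(len(missed), backup_fpr)
        for k in missed:
            self.backup.add(k)

    def __contains__(self, key):
        # Accept if the classifier says "key"; otherwise consult the backup filter.
        return self.score(key) >= self.threshold or key in self.backup


def toy_score(key):
    """Hypothetical stand-in for a trained classifier: fraction of digits in the key."""
    return sum(c.isdigit() for c in key) / max(1, len(key))


# Synthetic data (assumption): half of the keys are "easy" for toy_score, half are not.
keys = [f"{i:06d}" for i in range(500)] + [f"user{i}" for i in range(500)]
lbf = LearnedBloomFilter(keys, toy_score, threshold=0.9)

# Empirical false positive rate on queries that are not keys.
non_keys = [f"guest-{i}" for i in range(2000)]
empirical_fpr = sum(q in lbf for q in non_keys) / len(non_keys)
```

In this sketch the space cost is the backup filter's bit array plus whatever the classifier itself occupies, and the reject time is the cost of one classifier evaluation plus, for low-scoring queries, one backup lookup. A weaker classifier sends more keys to the backup filter and accepts more non-keys outright, which is one way to see why the choice of classifier bears on all three indicators named in the abstract.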

Notes

1. The experiments and data about this preliminary part are available upon request.


Acknowledgements

This work has been supported by the Italian MUR PRIN project 2017WR7SHH “Multicriteria data structures and algorithms: from compressed to learned indexes, and beyond”. Additional support to R.G. has been granted by Project INdAM - GNCS “Analysis and Processing of Big Data based on Graph Models”.

Author information

Authors and Affiliations

Department of Computer Science, University of Milan, Via Celoria 18, 20133, Milan, Italy
Dario Malchiodi, Davide Raimondi, Giacomo Fumagalli & Marco Frasca
Department of Mathematics and CS, University of Palermo, Palermo, Italy
Raffaele Giancarlo

Corresponding author

Correspondence to Dario Malchiodi.

Editor information

Editors and Affiliations

Democritus University of Thrace, Xanthi, Greece
Lazaros Iliadis
University of Piraeus, Piraeus, Greece
Ilias Maglogiannis
University of Leon, León, Spain
Serafin Alonso
Teesside University, Middlesbrough, UK
Chrisina Jayne
University of the West of England, Bristol, UK
Elias Pimenidis

Cite this paper

Malchiodi, D., Raimondi, D., Fumagalli, G., Giancarlo, R., Frasca, M. (2023). A Critical Analysis of Classifier Selection in Learned Bloom Filters: The Essentials. In: Iliadis, L., Maglogiannis, I., Alonso, S., Jayne, C., Pimenidis, E. (eds) Engineering Applications of Neural Networks. EANN 2023. Communications in Computer and Information Science, vol 1826. Springer, Cham. https://doi.org/10.1007/978-3-031-34204-2_5


DOI: https://doi.org/10.1007/978-3-031-34204-2_5
Published: 07 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34203-5
Online ISBN: 978-3-031-34204-2
eBook Packages: Computer Science, Computer Science (R0)
