False-Positive Probability and Compression Optimization for Tree-Structured Bloom Filters

Published: 21 September 2016

Abstract

Bloom filters are frequently used to check the membership of an item in a set. However, Bloom filters face a dilemma: the transmission bandwidth and the accuracy cannot be optimized simultaneously. This dilemma is particularly severe when Bloom filters are transmitted to remote nodes over a network with limited bandwidth. We propose a novel Bloom filter called BloomTree that consists of a tree-structured organization of smaller Bloom filters, each using a set of independent hash functions. BloomTree spreads items across levels that are compressed to reduce the required transmission bandwidth. We show how to find optimal configurations for BloomTree and investigate in detail by how much BloomTree outperforms the standard Bloom filter and the compressed Bloom filter. Finally, we use the intersection of BloomTrees to predict the set intersection, decreasing the false-positive probabilities by several orders of magnitude compared to both the compressed Bloom filter and the standard Bloom filter.
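
To make the bandwidth/accuracy dilemma concrete, the following is a minimal sketch of a standard Bloom filter, not of BloomTree itself, whose construction is not detailed on this page. The names `BloomFilter`, `false_positive_estimate`, and `optimal_k` are hypothetical, the salted SHA-256 hashing is an assumption made for illustration, and the false-positive formula is the widely used approximation (1 - e^(-kn/m))^k for n items stored in m bits with k hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Standard Bloom filter with m bits and k hash functions (illustrative only)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        # One byte per bit: wasteful, but keeps the sketch easy to read.
        self.bits = bytearray(m)

    def _positions(self, item: str):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True for every inserted item; may also be True for others (false positive).
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_estimate(m: int, k: int, n: int) -> float:
    """Common approximation (1 - e^(-k*n/m))^k of the false-positive probability."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> int:
    """Hash count minimizing the approximation above: (m/n) * ln 2, rounded."""
    return max(1, round(m / n * math.log(2)))


if __name__ == "__main__":
    n = 1000
    for bits_per_item in (8, 16):
        m = bits_per_item * n
        k = optimal_k(m, n)
        print(bits_per_item, k, false_positive_estimate(m, k, n))
```

Under this approximation, doubling the bits per item lowers the estimated false-positive probability but also doubles the uncompressed size that has to be transmitted, which is the trade-off the abstract describes.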

Published In

ACM Transactions on Modeling and Performance Evaluation of Computing Systems  Volume 1, Issue 4
September 2016
174 pages
ISSN:2376-3639
EISSN:2376-3647
DOI:10.1145/2982635
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 September 2016
Accepted: 01 May 2016
Revised: 01 May 2016
Received: 01 October 2015
Published in TOMPECS Volume 1, Issue 4

Author Tags

  1. Bloom filter
  2. Set query
  3. compression
  4. genetic algorithm
  5. tree

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
