ABSTRACT
Counting Bloom filters (CBFs) and their variants are data structures that support membership or multiplicity queries with a low probability of error. Yet they incur a significant memory overhead compared both to the lower bounds and to (plain) Bloom filters, which can only represent set membership without removals.
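To make the baseline concrete, the following is a minimal sketch of a textbook counting Bloom filter. It is not TinyTable and not any specific variant evaluated in this work. The class name, the SHA-256 based index derivation, and the parameter names are assumptions chosen for illustration only.

```python
import hashlib


class CountingBloomFilter:
    """Textbook counting Bloom filter (illustrative sketch, not TinyTable)."""

    def __init__(self, num_counters: int, num_hashes: int) -> None:
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, item: str):
        # Assumption: derive k counter positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def count(self, item: str) -> int:
        # The minimum counter never underestimates the true multiplicity,
        # but collisions with other items can make it an overestimate.
        return min(self.counters[idx] for idx in self._indexes(item))

    def contains(self, item: str) -> bool:
        return self.count(item) > 0

    def remove(self, item: str) -> None:
        # Removal is only meaningful for items that were previously added.
        if not self.contains(item):
            raise ValueError("item is not present")
        for idx in self._indexes(item):
            self.counters[idx] -= 1


cbf = CountingBloomFilter(num_counters=1024, num_hashes=4)
cbf.add("flow-a")
cbf.add("flow-a")
cbf.add("flow-b")
print(cbf.contains("flow-a"), cbf.count("flow-a") >= 2)
```

Each item touches several counters, and every counter must be wide enough to hold the largest expected count. That is the source of the memory overhead relative to a plain Bloom filter, which spends a single bit per position.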
This work presents TinyTable, an efficient hash-table-based algorithm that supports membership queries, removals, and multiplicity queries (statistics). TinyTable improves space efficiency by as much as 28% compared to CBF variants, and by as much as 60% when monitoring flow statistics. When the required false positive rate is below 1%, TinyTable is even slightly more space efficient than (plain) Bloom filters. Our performance study shows that TinyTable has acceptable runtime overheads.
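For context on the comparison with plain Bloom filters, the sketch below computes the standard space cost of an optimally configured Bloom filter and the information-theoretic lower bound for approximate membership, both in bits per item. These are well-known textbook formulas, not figures from this work, and the function names are hypothetical. TinyTable's own space usage is not modeled here.

```python
import math


def bloom_bits_per_item(fpr: float) -> float:
    """Bits per item for a plain Bloom filter with the optimal number of hashes."""
    # m/n = ln(1/fpr) / (ln 2)^2, i.e. about 1.44 * log2(1/fpr)
    return math.log(1 / fpr) / (math.log(2) ** 2)


def lower_bound_bits_per_item(fpr: float) -> float:
    """Information-theoretic lower bound for approximate membership."""
    return math.log2(1 / fpr)


for fpr in (0.1, 0.01, 0.001):
    bloom = bloom_bits_per_item(fpr)
    bound = lower_bound_bits_per_item(fpr)
    print(f"fpr={fpr}: bloom={bloom:.2f} bits/item, lower bound={bound:.2f} bits/item")
```

The gap between the two quantities grows in absolute terms as the target false positive rate shrinks, which leaves more room for a compact fingerprint-based table to compete with a plain Bloom filter at low error rates.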