research-article

Noisy Bloom Filters for Multi-Set Membership Testing

Published: 14 June 2016

Abstract

This paper designs a compact data structure for multi-set membership testing that supports fast set queries. Multi-set membership testing is a fundamental operation in computing systems and networking applications. Most existing schemes are built on Bloom filters and fall short in either storage cost or query speed. To address this issue, we propose the Noisy Bloom Filter (NBF) and the Error Corrected Noisy Bloom Filter (NBF-E). In our theoretical analysis, we optimize their classification failure rate and false positive rate, and present criteria for choosing between NBF and NBF-E. The key novelty of both is to store set ID information in a compact but noisy form that allows fast recording and querying, and to apply a denoising method at query time. In particular, NBF-E incorporates an asymmetric error-correcting coding technique into NBF: it identifies the asymmetric nature of errors in query results and exploits it to make those results more resilient to noise. To evaluate NBF and NBF-E against prior art, we conducted experiments using real-world network traces. The results show that NBF and NBF-E significantly advance the state of the art in multi-set membership testing.
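
The abstract does not spell out how NBF lays out its bits, so the following is only a rough sketch of the general idea it describes, not the authors' construction. It assumes, hypothetically, that each set ID is a fixed-weight bit code, that recording an element ORs this code into a few hashed slots, and that a query ANDs those slots. Under that assumption, collisions can only turn 0 bits into 1 bits, which is one way query errors can end up asymmetric. The names `MultiSetFilterSketch`, `add`, `query`, and `decode`, the hash choice, and all parameter values are invented for illustration.

```python
import hashlib


class MultiSetFilterSketch:
    """Illustrative only (not the paper's NBF): OR a set-ID code into
    k hashed slots on insert, AND the same slots on query."""

    def __init__(self, num_slots=1024, num_hashes=3, code_bits=8):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.code_bits = code_bits
        self.slots = [0] * num_slots  # each slot holds code_bits bits

    def _positions(self, item):
        # Assumption: k slot indices derived from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_slots

    def add(self, item, set_code):
        if not 0 < set_code < (1 << self.code_bits):
            raise ValueError("set_code must be non-zero and fit in code_bits")
        for pos in self._positions(item):
            self.slots[pos] |= set_code  # OR never clears a bit

    def query(self, item):
        result = (1 << self.code_bits) - 1  # start with all ones
        for pos in self._positions(item):
            result &= self.slots[pos]
        return result  # recorded code plus possible extra 1 bits (noise)


def decode(result, codes, weight):
    """Toy denoising step: accept the unique code contained in `result`.

    Assumes every code in `codes` has exactly `weight` one-bits.
    Returns None for 'no set' or 'ambiguous'.
    """
    if bin(result).count("1") < weight:
        return None  # too few ones to contain any valid code
    candidates = [c for c in codes if (result & c) == c]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    codes = [0b0011, 0b1100]  # two hypothetical weight-2 set codes
    f = MultiSetFilterSketch(num_slots=64, num_hashes=3, code_bits=4)
    f.add("flow-a", codes[0])
    f.add("flow-b", codes[1])
    print(decode(f.query("flow-a"), codes, weight=2))
```

In this toy version, a query for a recorded element always contains the bits of its set code, and any error shows up as surplus 1 bits. A fixed-weight code makes such surplus bits detectable, which loosely mirrors the abstract's point about exploiting asymmetric errors.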




Published In

SIGMETRICS '16: Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science
June 2016
434 pages
ISBN: 9781450342667
DOI: 10.1145/2896377

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. asymmetric error-correcting code
  2. bloom filter
  3. constant weight code
  4. multi-set membership testing
  5. noise

Qualifiers

  • Research-article

Funding Sources

  • Huawei Innovation Research Program (HIRP)
  • National Natural Science Foundation of China
  • Jiangsu High-level Innovation and Entrepreneurship (Shuangchuang) Program

Conference

SIGMETRICS '16

Acceptance Rates

SIGMETRICS '16 Paper Acceptance Rate 28 of 208 submissions, 13%;
Overall Acceptance Rate 459 of 2,691 submissions, 17%


Cited By

  • (2023) A One-Pass Clustering Based Sketch Method for Network Monitoring. IEEE/ACM Transactions on Networking, 31(6):2604-2613. DOI: 10.1109/TNET.2023.3251981. Online publication date: Dec-2023
  • (2023) Seesaw Counting Filter: A Dynamic Filtering Framework for Vulnerable Negative Keys. IEEE Transactions on Knowledge and Data Engineering, 35(12):12987-13001. DOI: 10.1109/TKDE.2023.3273709. Online publication date: 1-Dec-2023
  • (2022) Computational Estimation by Scientific Data Mining with Classical Methods to Automate Learning Strategies of Scientists. ACM Transactions on Knowledge Discovery from Data, 16(5):1-52. DOI: 10.1145/3502736. Online publication date: 9-Mar-2022
  • (2022) BBF: A Bloom Filter Using B Sequences for Multi-set Membership Query. ACM Transactions on Knowledge Discovery from Data, 16(5):1-26. DOI: 10.1145/3502735. Online publication date: 9-Mar-2022
  • (2022) Bloom Filter with Noisy Coding Framework for Multi-Set Membership Testing. IEEE Transactions on Knowledge and Data Engineering, pages 1-14. DOI: 10.1109/TKDE.2022.3199646. Online publication date: 2022
  • (2022) Coloring Embedder: Towards Multi-Set Membership Queries in Web Cache Sharing. IEEE Transactions on Knowledge and Data Engineering, 34(12):5664-5680. DOI: 10.1109/TKDE.2021.3062182. Online publication date: 1-Dec-2022
  • (2022) Multiset Membership Lookup in Large Datasets. IEEE Transactions on Knowledge and Data Engineering, 34(10):4947-4958. DOI: 10.1109/TKDE.2021.3049624. Online publication date: 1-Oct-2022
  • (2022) LTC: A Fast Algorithm to Accurately Find Significant Items in Data Streams. IEEE Transactions on Knowledge and Data Engineering, 34(9):4342-4356. DOI: 10.1109/TKDE.2020.3038911. Online publication date: 1-Sep-2022
  • (2022) A Pareto optimal Bloom filter family with hash adaptivity. The VLDB Journal, 32(3):525-548. DOI: 10.1007/s00778-022-00755-z. Online publication date: 26-Jul-2022
  • (2021) Machine Learning for Electronic Design Automation: A Survey. ACM Transactions on Design Automation of Electronic Systems, 26(5):1-46. DOI: 10.1145/3451179. Online publication date: 5-Jun-2021
