Research article
DOI: 10.1145/1811039.1811056

Small subset queries and bloom filters using ternary associative memories, with applications

Published: 14 June 2010

Abstract

Associative memories offer high levels of parallelism in matching a query against stored entries. We design and analyze an architecture which uses a single lookup into a Ternary Content Addressable Memory (TCAM) to solve the subset query problem for small sets, i.e., to check whether a given set (the query) contains (or alternately, is contained in) any one of a large collection of sets in a database. We use each TCAM entry as a small Ternary Bloom Filter (each 'bit' of which is one of {0, 1, wildcard}) to store one of the sets in the collection. Like Bloom filters, our architecture is susceptible to false positives. Since each TCAM entry is quite small, asymptotic analyses of Bloom filters do not directly apply. Surprisingly, we are able to show that the asymptotic false positive probability formula can be safely used if we penalize the small Bloom filter by taking away just one bit of storage and adding just half an extra set element before applying the formula. We believe that this analysis is independently interesting. The subset query problem has applications in databases, network intrusion detection, packet classification in Internet routers, and information retrieval. We demonstrate our architecture on one illustrative streaming application: intrusion detection in network traffic. By shingling (i.e., taking consecutive bytes of) the strings in the database, we can perform a single subset query, and hence a single TCAM search, to skip many bytes in the stream. We evaluate our scheme on the open source ClamAV anti-virus database, for worst-case as well as random streams. Our architecture appears to be at least one order of magnitude faster than previous approaches. Since the individual Bloom filters must fit in a single TCAM entry (currently 72 to 576 bits), our solution applies only when each set is of small cardinality. However, this is sufficient for many typical applications. Also, recent algorithms for the subset-query problem use a small-set version as a subroutine.
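
To make the idea in the abstract concrete, the following Python sketch emulates one direction of the subset query (the query contains a stored set) in software. It is a minimal illustration under stated assumptions, not the paper's implementation: the salted SHA-256 hashing, the choice of K = 3 hash functions, the (value, care) encoding of a ternary entry, and all function names are hypothetical. The last two functions show one plausible reading of the "one bit less, half an element more" penalty, applied to the standard asymptotic formula (1 - e^(-kn/m))^k.

```python
import hashlib
import math

M = 72  # bits per TCAM entry; the abstract cites widths of 72 to 576 bits
K = 3   # hash functions per element (illustrative assumption)


def bit_positions(element, m=M, k=K):
    """Map an element (bytes) to k bit positions using salted SHA-256 (assumed hashing)."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + element).digest()[:8], "big") % m
        for i in range(k)
    ]


def bloom(items, m=M, k=K):
    """Bloom filter of a set, packed into an integer bit vector."""
    bits = 0
    for item in items:
        for pos in bit_positions(item, m, k):
            bits |= 1 << pos
    return bits


def tcam_entry(stored_set):
    """Ternary entry as (value, care): '1' where the filter bit is set, wildcard elsewhere."""
    bits = bloom(stored_set)
    return bits, bits


def tcam_search(entries, key):
    """Software stand-in for the parallel lookup: match when all cared bits equal the key."""
    return [i for i, (value, care) in enumerate(entries) if (key ^ value) & care == 0]


def contained_sets(database, query):
    """Indices of stored sets that appear to be subsets of the query (false positives possible)."""
    entries = [tcam_entry(s) for s in database]
    return tcam_search(entries, bloom(query))


def fp_asymptotic(m, n, k):
    """Standard asymptotic Bloom filter false positive estimate for m bits, n elements."""
    return (1.0 - math.exp(-k * n / m)) ** k


def fp_penalized(m, n, k):
    """Same formula after removing one bit and adding half an element (assumed reading)."""
    return fp_asymptotic(m - 1, n + 0.5, k)


database = [{b"a", b"b"}, {b"c"}, {b"x", b"y", b"z"}]
print(contained_sets(database, {b"a", b"b", b"c", b"d"}))
print(fp_asymptotic(72, 8, 3), fp_penalized(72, 8, 3))
```

A stored set that really is a subset of the query always matches, because every bit it sets is also set in the query's filter; a stored set that is not a subset may still match by chance, which is the false positive case the abstract analyzes. The reverse direction (the query is contained in a stored set) could be emulated by caring about the zero bits of the stored filter instead.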

    Published In

    ACM Conferences
    SIGMETRICS '10: Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
    June 2010
    398 pages
    ISBN: 9781450300384
    DOI: 10.1145/1811039
    • ACM SIGMETRICS Performance Evaluation Review, Volume 38, Issue 1
      June 2010
      382 pages
      ISSN: 0163-5999
      DOI: 10.1145/1811099
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. TCAM
    2. bloom filters
    3. subset queries

    Qualifiers

    • Research-article

    Conference

    SIGMETRICS '10

    Acceptance Rates

    Overall Acceptance Rate 459 of 2,691 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 16
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 20 Feb 2025

    Cited By

    • (2024) SoK: Collusion-resistant Multi-party Private Set Intersections in the Semi-honest Model. 2024 IEEE Symposium on Security and Privacy (SP), pages 465-483. DOI: 10.1109/SP54263.2024.00079. Online publication date: 19-May-2024
    • (2024) Jacobian sparsity detection using Bloom filters. Optimization Methods and Software, pages 1-13. DOI: 10.1080/10556788.2023.2285486. Online publication date: 10-Oct-2024
    • (2023) Optimizing 0-RTT Key Exchange with Full Forward Security. Proceedings of the 2023 on Cloud Computing Security Workshop, pages 55-68. DOI: 10.1145/3605763.3625246. Online publication date: 26-Nov-2023
    • (2023) Scalably Detecting Third-Party Android Libraries With Two-Stage Bloom Filtering. IEEE Transactions on Software Engineering, 49:4, pages 2272-2284. DOI: 10.1109/TSE.2022.3215628. Online publication date: 1-Apr-2023
    • (2023) Libra: A Space-Efficient, High-Performance Inline Deduplication for Emerging Hybrid Storage System. 2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pages 221-228. DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom59178.2023.00062. Online publication date: 21-Dec-2023
    • (2023) Lightweight certificate revocation for low-power IoT with end-to-end security. Journal of Information Security and Applications, 73:C. DOI: 10.1016/j.jisa.2023.103424. Online publication date: 1-Mar-2023
    • (2022) Adversarial Correctness and Privacy for Probabilistic Data Structures. Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 1037-1050. DOI: 10.1145/3548606.3560621. Online publication date: 7-Nov-2022
    • (2021) Bloom Filter Encryption and Applications to Efficient Forward-Secret 0-RTT Key Exchange. Journal of Cryptology, 34:2. DOI: 10.1007/s00145-021-09374-3. Online publication date: 9-Mar-2021
    • (2020) A Hybrid SWIM Data Naming Scheme Based on TLC Structure. Future Internet, 12:9, article 142. DOI: 10.3390/fi12090142. Online publication date: 25-Aug-2020
    • (2020) Don't Work on Individual Data Plane Algorithms. Put Them Together! Proceedings of the 19th ACM Workshop on Hot Topics in Networks, pages 60-66. DOI: 10.1145/3422604.3425932. Online publication date: 4-Nov-2020
