FPGA based string matching for network processing applications

https://doi.org/10.1016/j.micpro.2007.11.001

Abstract

String matching is a key problem in many network processing applications. Current implementations of this process using software are time consuming and cannot meet gigabit bandwidth requirements. Implementing this process in hardware improves the search time considerably and has several other advantages. This paper presents an array based hardware implementation of this time consuming process for network intrusion detection and directory lookup applications using reconfigurable hardware. These designs are coded in VHDL targeting a Xilinx Virtex-II Pro FPGA and are evaluated in terms of the speed and resource utilization.

Introduction

Many network processing applications require frequent use of string matching to find the presence of a keyword. There are several software algorithms for string matching, but the slowness of these implementations creates a bottleneck for growing gigabit network configurations. A hardware implementation can potentially speed up the signature matching process considerably, and in this paper, we introduce variations on an array based architecture for string matching with applications as a string lookup cache and for network intrusion detection.

The architecture is flexible enough to handle variable sized keys as well as provide a mechanism to do mapping in addition to string matching. The architecture is based loosely on temporally cascaded content addressable memories (CAM) and is an extension of Motomura’s cellular automata structure. As with Motomura’s design, processor element cells, external to the CAM array, will process character match signals from the CAM and output keyword match signals. This architecture is also compatible with further optimizations like processing characters in parallel, prefix sharing, pattern partitioning, etc. to improve the performance. The compactness of each cellular automata element leads to a highly efficient design in terms of both area and speed.
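The division of labor described above, with a CAM producing per-character match signals and external processing elements turning them into a keyword match signal, can be illustrated in software. The following is a minimal behavioral sketch in Python and not the VHDL design evaluated in the paper; the function names `cam_match` and `find_keyword`, the single-keyword restriction, and the one-character-per-step model are assumptions made for illustration.

```python
def cam_match(keyword, ch):
    """Model one CAM lookup: one match bit per stored keyword character."""
    return [ch == k for k in keyword]


def find_keyword(keyword, stream):
    """Return the end positions in `stream` where `keyword` matches.

    Hypothetical software model: each list element plays the role of one
    processing element cell sitting outside the CAM array.
    """
    if not keyword:
        return []
    # state[i] is True when keyword[:i + 1] matches the input that ends
    # at the character processed most recently.
    state = [False] * len(keyword)
    hits = []
    for pos, ch in enumerate(stream):
        bits = cam_match(keyword, ch)
        # Each cell ANDs its own character match with the previous cell's
        # state from the prior step; the first cell has no predecessor.
        state = [bits[i] and (i == 0 or state[i - 1])
                 for i in range(len(keyword))]
        if state[-1]:  # the last cell raises the keyword match signal
            hits.append(pos)
    return hits
```

For example, `find_keyword("abc", "xabcabc")` reports a match ending at positions 3 and 6. In hardware all cells would update in the same clock cycle; the list comprehension mimics that by reading only the previous step's state.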

The rest of the paper is organized as follows. The next section presents background on string matching in software and hardware. The sections after that explain the architecture and operation of a generic lookup cache and of a signature match processor for network intrusion detection.

Section snippets

Background and related work

String matching is the process of determining whether a given string or pattern is present in a body of data. It is a very common problem in computing: we often search a computer for a specific file, or a text document for a specific keyword. There are different types of string matching. Exact string matching, as the name suggests, looks for the presence of the exact pattern in the search area. String matching with errors or approximate

String lookup cache

Many networking applications require the ability to cache frequently used values. This is particularly true for operations that use tables that map one known value to another value. Examples include IP to Ethernet address mappings and routing table lookups. It is common to use an address resolution protocol (ARP) cache that serves as a translation table to map a layer 3 IP address to the corresponding layer 2 hardware address. If an entry is found in the ARP cache, the network processor
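A lookup cache of the kind this section introduces, one that maps a known key to a value, can be modeled as a key store searched by content plus a value store indexed by the matching slot. The sketch below is a toy Python model under stated assumptions and not the paper's design: the class name `LookupCache`, the fixed capacity, and the round-robin replacement policy are all hypothetical choices made for illustration.

```python
class LookupCache:
    """Toy model of a CAM-backed lookup cache (hypothetical sketch)."""

    def __init__(self, capacity):
        self.keys = [None] * capacity    # CAM side: stored keys
        self.values = [None] * capacity  # RAM side: mapped values
        self.next_slot = 0               # round-robin pointer (an assumption)

    def lookup(self, key):
        """Return the cached value for `key`, or None on a miss."""
        # Hardware would compare every entry in parallel; this loop
        # stands in for that parallel comparison.
        for slot, stored in enumerate(self.keys):
            if stored is not None and stored == key:
                return self.values[slot]
        return None

    def insert(self, key, value):
        """Cache `key -> value`, overwriting the oldest slot when full."""
        # Update in place if the key is already cached.
        for slot, stored in enumerate(self.keys):
            if stored == key:
                self.values[slot] = value
                return
        self.keys[self.next_slot] = key
        self.values[self.next_slot] = value
        self.next_slot = (self.next_slot + 1) % len(self.keys)
```

As a usage example in the spirit of an ARP cache, `cache.insert("10.0.0.1", "aa:bb:cc:00:00:01")` followed by `cache.lookup("10.0.0.1")` returns the stored hardware address, while a lookup for an address that was never inserted returns `None`. The addresses here are made up.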

Network intrusion detection

With the rapid increase in malicious attacks on corporate and government networks, network administrators have become increasingly aware of the need to deploy tools that protect their networks from external attacks. Network intrusion detection systems (NIDS) are one of the primary tools available to help in creating a secure network infrastructure. Network intrusion detection is the process of identifying and analyzing packets that may signify an impending threat to an organization’s network. NIDS can be

Conclusions

In this paper, we have introduced an array based architecture for string matching and demonstrated its use in two applications: network intrusion detection and a string cache. The principle behind the architecture is that processing elements external to the CAM array process character matches from the CAM, yielding keyword matches. This architecture is scalable, and optimizations such as processing characters in parallel improve the throughput considerably. Aligning the keyword on a

Future work

We are investigating the use of SMPs in these applications and developing extensions of the SMPs to support wildcards and approximate word matching. Other research directions include improving the power characteristics of the SMPs and analyzing the operating system requirements for efficiently and seamlessly supporting these FPGA implementations of network processing functions.

Acknowledgements

This work was supported in part by a grant from the University of Connecticut Research Foundation. Special thanks to Long Bu for his contributions to the initial architecture design.

References (31)

  • R.M. Karp et al., Efficient randomized pattern-matching algorithms, IBM Journal of Research and Development (1987)
  • D.E. Knuth et al., Fast pattern matching in strings, SIAM Journal on Computing (1977)
  • R.S. Boyer et al., A fast string searching algorithm, Communications of the ACM (1977)
  • Q. Zhang, R.D. Chamberlain, R.S. Indeck, B.M. West, J. White, Massively parallel data mining using reconfigurable...
  • R. Baeza-Yates et al., A new approach to text searching, Communications of the ACM (1992)
  • S. Wu et al., A new approach to text searching, Communications of the ACM (1992)
  • R. Sidhu, V.K. Prasanna, Fast regular expression matching using FPGAs, in: Proceedings of IEEE Symposium on...
  • B.L. Hutchings, R. Franklin, D. Carver, Assisting network intrusion detection with reconfigurable hardware, in:...
  • J. Moscola, J. Lockwood, R.P. Loui, M. Pachos, Implementation of a content-scanning module for an internet firewall,...
  • C. Clark, D. Schimmel, Scalable multi-pattern matching on high-speed networks, in: Proceedings of the IEEE Symposium on...
  • A. Mukhopadhyay, Hardware algorithms for string processing, IEEE Computer (1980)
  • M. Motomura et al., A 1.2-million transistor, 33 MHz, 20-b dictionary search processor (DISP) ULSI with a 160-kb CAM, IEEE Journal for Solid State Circuits (1990)
  • T. Moors et al., Cascading content addressable memories, IEEE Micro (1992)
  • A.J. McAuley, P. Francis, Fast routing table lookup using CAMs, in: Proceedings of IEEE INFOCOM, Vol. 3, 1993, pp....
  • M. Motomura et al., A 2 k-word dictionary search processor (DISP) with an approximate word search capability, IEEE Journal for Solid State Circuits (1992)