Design of embedded TCAM based longest prefix match search engine

https://doi.org/10.1016/j.micpro.2011.08.002

Abstract

Content addressable memory (CAM) based solutions are very useful in network applications because of their high-speed parallel search mechanism. This paper presents a novel ternary CAM (TCAM) based NAND Pseudo CMOS–Longest Prefix Match (NPC–LPM) search engine. The proposed system provides a simple hardware-based solution for network routers, using novel 11T TCAM cell structures and an NPC word line technique. The experiments were performed on a 256 × 128 NPC–LPM system in 0.13 μm technology. The simulation results show that the proposed design provides a low power dissipation of 5.78 mW and a high search speed of 315 MSearches/s under a 1.3 V supply voltage. The presented NPC–LPM system meets the speed requirement of Optical Carrier (OC) 3072, with a line rate of 160 Gb/s, in Ethernet networking and the IPv6 protocol. The experimental results also show that the proposed system improves power-performance by 65%.

Introduction

Content addressable memory (CAM) is a special functional memory. Most conventional memory devices, such as RAM or ROM, store and retrieve data by addressing specific memory locations. As a result, this address-based access path often becomes the limiting factor for systems that rely on fast memory bandwidth [1]. In a CAM, however, stored data is accessed by its content rather than by its address. The time required to find an item stored in memory determines the search speed, and the hardware-based solution through CAM provides a performance advantage over other memory search algorithms (such as binary and tree-based searches or look-aside tag buffers). It can perform an associative search operation in one access cycle, which is not always possible for random access memory (RAM). Earlier research has already investigated different types of CAM circuits [2], [3].

The high-speed parallel search mechanism of CAM is applicable in many fields, such as caches [4], [5], image processing [6], network filters [7], pattern recognition [8], translation look-aside buffers [9], [10], and IP routers [11], [12]. In addition, CAM plays an important role in high-speed communication networks, for instance gigabit Ethernet networks [13], [14], high-speed asynchronous transfer mode switches and high-speed lookup tables (more than 100 MSearches/s) [15], [16]. Readers may refer to [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16] for a brief analysis of different CAM applications. For instance, a CAM can be used for switching in a network switch, where it compares the destination address of received packets with the table of addresses stored within it. A CAM might store Ethernet addresses and switch port numbers. It compares the received data against the stored table and, if the comparison yields a match, outputs the port identification, and the routing control forwards the packet to the correct port or address.
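The switch lookup described above can be sketched in software as follows. This is a minimal illustrative model, not the circuit discussed in this paper: the class name `BinaryCam`, the table depth and the example addresses are all hypothetical, and the Python loop only stands in for the parallel comparison that a real CAM performs across all rows in one access cycle.

```python
from typing import List, Optional, Tuple


class BinaryCam:
    """Toy model of a binary CAM holding (Ethernet address, port) rows."""

    def __init__(self, depth: int) -> None:
        # Each row is either empty (None) or an (address, port) pair.
        self.rows: List[Optional[Tuple[str, int]]] = [None] * depth

    def write(self, row: int, mac: str, port: int) -> None:
        # Writing is address-based, as in an ordinary memory.
        self.rows[row] = (mac.lower(), port)

    def search(self, mac: str) -> Optional[int]:
        # Searching is content-based: the key is compared with every row.
        # Hardware does this in parallel; the loop is only a stand-in.
        key = mac.lower()
        for entry in self.rows:
            if entry is not None and entry[0] == key:
                return entry[1]
        return None  # no match: the packet has no known output port


cam = BinaryCam(depth=4)
cam.write(0, "00:1A:2B:3C:4D:5E", port=3)
cam.write(1, "00:1A:2B:3C:4D:5F", port=7)
```

Calling `cam.search(...)` with a stored address returns the port number, while an unknown address returns `None`, which corresponds to a miss on every match line.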

Recently, high-speed communication networks, high-speed asynchronous transfer mode (ATM) switches and high-speed lookup tables have come to require hardware-based parallel search systems with low power and high performance. The CAM architecture is therefore a suitable solution for today’s high-speed data search requirements in network applications. The performance of conventional software-based routers [17] depends on the speed of the processor and memory. Achieving high-speed routing requires high-performance processors together with large memories, which results in higher cost. Thus, software-based routing is feasible in the low-speed range from 10 megabits per second (Mbps) to 100 Mbps. The processing costs and architectural implications make it difficult to achieve high-speed routing such as OC-768 (line rate 40 Gb/s) or OC-3072 (line rate 160 Gb/s) with software-based processing.
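To give a feel for what these line rates demand of a lookup engine, the sketch below converts a line rate into a worst-case lookup rate. It rests on an assumption that is not stated in the text: every packet is a 64-byte minimum-size frame, with preamble and inter-frame gap ignored. The function name is hypothetical.

```python
# Assumption (not from the paper): worst case is back-to-back 64-byte packets,
# ignoring preamble and inter-frame gap.
MIN_PACKET_BITS = 64 * 8


def required_lookups_per_second(line_rate_bps: float,
                                packet_bits: int = MIN_PACKET_BITS) -> float:
    """One routing lookup is needed per packet, so divide rate by packet size."""
    return line_rate_bps / packet_bits


oc768 = required_lookups_per_second(40e9)    # OC-768, 40 Gb/s
oc3072 = required_lookups_per_second(160e9)  # OC-3072, 160 Gb/s

print(f"OC-768 : {oc768 / 1e6:.3f} M lookups/s")
print(f"OC-3072: {oc3072 / 1e6:.3f} M lookups/s")
```

Under this assumption OC-3072 needs about 312.5 million lookups per second, which is consistent with the 315 MSearches/s reported in the abstract as meeting the OC-3072 requirement.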

High-speed CAM-based hardware solutions have been presented to overcome the difficulties of software-based solutions [18], [19]. In applications where high-speed data search is significant, such as internet routers, IP filters, LPM search engines and network switches, CAM-based hardware solutions require an extra state “X” (don’t care) in addition to the binary states (0 and 1). The extra state is used to improve memory utilization and search performance. To provide it, our proposed ternary CAM (TCAM) cell offers three states, “0”, “1” and “X”, with a reduced number of transistors. The conventional static TCAM cell [2] is usually constructed using seventeen transistors (17T), as shown in Fig. 1. It consists of two ordinary six-transistor (6T) SRAM cells, which store the data bit and the mask bit respectively, plus five transistors (5T) in the comparison circuitry. It requires two write cycles for both of its read and write operations. The comparison circuit operates like that of a binary CAM (BCAM) cell, and its output depends on the status of the mask bit. This cell results in high hardware cost, high power consumption and low search speed. Tsai et al. [19] also proposed a static TCAM cell with a reduced number of transistors, but it suffers from a heavy loading problem when the system size is large. In this paper, we propose a hardware-based solution, a novel TCAM-based NPC–LPM circuit, for the network router application. The proposed TCAM cell structure with the word line technique avoids the drawbacks mentioned above and provides regularity, scalability, low power, low cost and high search speed. In contrast to previous solutions, the proposed system provides good power-performance results.
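The role of the “X” state and of the data/mask bit pair can be illustrated with a small behavioural sketch. It models only the logical match condition, not the transistor-level cell; the function names are hypothetical, and the mask polarity (mask bit 1 meaning don’t care) is an assumption made for the example.

```python
from typing import Tuple


def ternary_match(stored: str, key: str) -> bool:
    """A word matches when every stored symbol is 'X' or equals the key bit."""
    if len(stored) != len(key):
        raise ValueError("stored word and search key must have equal width")
    return all(s == "X" or s == k for s, k in zip(stored, key))


def encode(stored: str) -> Tuple[int, int]:
    """Split a ternary word into a data word and a mask word.

    Assumption: a mask bit of 1 marks a don't-care position.
    """
    data = int(stored.replace("X", "0"), 2)
    mask = int("".join("1" if s == "X" else "0" for s in stored), 2)
    return data, mask


def masked_match(data: int, mask: int, key: int) -> bool:
    """Mismatching bits only count where the mask bit is 0."""
    return ((data ^ key) & ~mask) == 0


data, mask = encode("10XX")
```

Both views agree: `ternary_match("10XX", "1011")` and `masked_match(data, mask, 0b1011)` report a match, while a key of `1111` misses because the second bit differs in an unmasked position.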

The rest of the paper is organized as follows. Section 2 gives an overview of the proposed NPC–LPM system architecture. Section 3 describes our proposed 11T TCAM cell structure and its worst-case delay analysis, and details our proposed NPC word line technique. Section 4 analyses design considerations regarding write failures in the TCAM cell. Section 5 presents the simulation results, the choice of the partial bits value and the functional verification of the proposed design. Finally, Section 6 concludes this paper.

Section snippets

Architecture of the NPC–LPM system

Fig. 2 shows the functional block diagram of the NAND Pseudo CMOS–Longest Prefix Match (NPC–LPM) search system. Longest prefix match (LPM), or maximum prefix length match, is the concept used by routers in Internet Protocol (IP) networking to select an entry from a routing table [21]. Each entry in a routing or forwarding table may specify a network, so one destination address may match more than one routing table entry. The most specific table entry, such as the one with the highest subnet mask, is called the
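As a software illustration of the LPM concept only, and not of the NPC–LPM hardware, the sketch below stores prefixes as ternary words padded with “X” and returns the next hop of the matching entry with the longest prefix. The 8-bit word width, the table contents and all names are hypothetical choices for the example.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, int, str]  # (ternary word, prefix length, next hop)


def make_entry(prefix_bits: str, width: int, next_hop: str) -> Entry:
    """Pad a prefix with 'X' (don't care) up to the full word width."""
    word = prefix_bits + "X" * (width - len(prefix_bits))
    return word, len(prefix_bits), next_hop


def longest_prefix_match(table: List[Entry], key: str) -> Optional[str]:
    """Return the next hop of the matching entry with the longest prefix."""
    best: Optional[Tuple[int, str]] = None
    for word, length, hop in table:
        matches = all(w == "X" or w == k for w, k in zip(word, key))
        if matches and (best is None or length > best[0]):
            best = (length, hop)
    return None if best is None else best[1]


WIDTH = 8  # illustrative width only
table = [
    make_entry("", WIDTH, "default"),       # matches every address
    make_entry("1010", WIDTH, "port A"),
    make_entry("101011", WIDTH, "port B"),
]
```

For the key `10101100` all three entries match, and the 6-bit prefix wins, so the result is `port B`; the key `10100000` falls back to `port A`, and `01000000` to the default entry.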

Proposed TCAM cell structures

A large-capacity TCAM section will be expensive, partly because of the large cell area. A TCAM cell with a minimum number of transistors can reduce the cost of the entire system by improving the layout density. The proposed 9T static ternary CAM cell is much more area efficient than the conventional 17T static cell. Fig. 4 illustrates the proposed TCAM cells, in which efficient cell density is achieved by eliminating the extra access transistor and the maskable 6T SRAM circuit from each cell. The parasitic

Analysis of write failure

Correct operation without write failure requires careful design of the cell structure. A write failure may cause a search failure, which compromises the functionality of the search engine. Here we analyze the equivalent circuit of the proposed cell structure with respect to the transistor sizing factor. In a static design approach, improper transistor sizing results in a conflict between the present data and the existing data, which may cause the write failure. In our

Simulation results

This section shows the simulation results of the NPC–LPM system. To verify correct function and obtain fair power and performance results for our cell structures, worst-case operation is taken into account, because the worst-case (only one bit miss) evaluation determines the critical search speed. This work used SPICE with 0.13 μm technology for its simulation. The simulation results of the NPC–LPM search engine with a size of 128 × 32 are used to decide the p value which is

Conclusions

In this paper, a novel NPC–LPM search engine for the network router application is proposed. This NPC–LPM search engine substantially reduces power dissipation using our static memory architecture. The proposed 11-transistor TCAM cell is simpler than the conventional 17-transistor TCAM cell. Furthermore, the speed of our LPM system can fulfill the requirement of OC 3072, and the word width (n = 128) can satisfy the IPv6 protocol. The simulation results show that the NPC–LPM search engine with

Acknowledgments

Special thanks to Prof. Bin Da Liu, SPIC Lab, Department of EE, NCKU-Taiwan. Many thanks to R.J. Tsai, TSMC, Taiwan for valuable discussions. This work is supported by the NTNU PhD Research Fellowships in Information Technology, Mathematics and Electrical Engineering IME 082-2007.

References (24)

  • K.J. Schultz, Content-addressable memory core cells: a survey, Integration: VLSI J. (1997)
  • C.S. Lin et al., A low-power precomputation-based fully parallel content-addressable memory, IEEE J. Solid-State Circuits (2003)
  • K. Pagiamtzis et al., Content-addressable memory (CAM) circuits and architectures: a tutorial and survey, IEEE J. Solid-State Circuits (2006)
  • B. Mohammad, P. Bassett, J. Abraham, A. Aziz, Cache organization for embedded processors: CAM-vs-SRAM, in: Proc. IEEE...
  • A. Efthymiou et al., A CAM with mixed serial-parallel comparison for use in low energy caches, IEEE Trans. VLSI Syst. (2004)
  • T. Ikenaga et al., A fully-parallel 1 Mb CAM LSI for real-time pixel-parallel image processing, IEEE J. Solid-State Circuits (2000)
  • K.J. Lin et al., A low-power CAM design for LZ data compression, IEEE Trans. Comput. (2000)
  • H. Miyatake et al., A design for high-speed low-power CMOS fully parallel content-addressable memory macros, IEEE J. Solid-State Circuits (2001)
  • M. Sumita, A 800MHz single cycle access 32 entry fully associative TLB with a 240ps access match circuit, in: Symp....
  • J.H. Lee, G.H. Park, S.B. Park, S.D. Kim, A selective filter-bank TLB system (embedded processor MMU for low power),...
  • Z. Liang, J. Wu, K. Xu, A TCAM-based IP lookup scheme for multi-nexthop routing, in: Proc. IEEE Int. Conf. Comp....
  • F. Zane, N. Girija, A. Basu, Coolcams: power-efficient TCAMs for forwarding engines, in: Proc. IEEE INFOCOM, March...
P. Manikandan received the Diploma in electrical and electronics engineering from Alagappa Polytechnic, Tamilnadu, India and the Bachelor’s degree in electronics and communications engineering from Madurai Kamaraj University, Tamilnadu, India, in 2001 and 2004, respectively. He received his Master’s degree in electrical engineering from National Cheng Kung University, Tainan, Taiwan in 2007. At present he is working towards his PhD in electronics and telecommunications engineering at the Norwegian University of Science and Technology, Trondheim, Norway. Mr. Manikandan received an IEEE Best Paper award in 2006 and an Industrial Scholar award in 2007. He has research experience from the department of electrical and computer engineering at The University of Iowa, IA, USA and the Indian Institute of Technology, Kharagpur, India. He also worked as a senior system engineer in Hsinchu Science Park, Taiwan in 2007–2008. He is the chairman of Region 8, IEEE-SB of Norway section. His research interests are built-in self-test, path delay fault testing and content addressable memories.

Bjørn B. Larsen received his Ph.D. degree from the Norwegian Institute of Technology in 1991. Since 1992, he has been working as an associate professor in electronics and telecommunications engineering at the Norwegian University of Science and Technology. His current areas of research include VLSI design, field programmable gate array testing, design for testability, built-in self-test and verification.

Einar J. Aas received his Ph.D. degree from the Norwegian Institute of Technology in 1972. In 1972, he joined SINTEF (The Foundation for Scientific and Industrial Research). Since 1981 he has been a full professor in Electronic Design Methodology at NTH, now the Norwegian University of Science and Technology. Dr. Aas is author and co-author of more than 400 technical and scientific publications. His current areas of research include VLSI design, verification and testing. Prof. Aas is a member of the Norwegian Academy of Technological Sciences and The Royal Norwegian Society of Sciences and Letters.
