ABSTRACT
IP forwarding is one of the main bottlenecks in Internet backbone routers, as it requires performing longest-prefix matching at 10 Gbps or higher. IPv6 forwarding further exacerbates the situation because the address length, and with it the key to be searched, is quadrupled from 32 to 128 bits. We propose a high-performance IPv6 forwarding algorithm, TrieC, and implement it efficiently on the Intel IXP2800 network processor (NPU). Programming a multi-core and multithreaded NPU is a daunting task. We study the interaction between parallel algorithm design and architecture mapping to facilitate efficient algorithm implementation, and we experiment with an architecture-aware design principle to guarantee the high performance of the resulting algorithm. This paper investigates the main software design issues that have a dramatic performance impact on any NPU-based implementation: memory space reduction, instruction selection, data allocation, task partitioning, latency hiding, and thread synchronization. We provide insight into how to design an NPU-aware algorithm for high-performance networking applications and, based on a detailed performance analysis of the TrieC algorithm, offer guidance on developing such applications for multi-core and multithreaded architectures.
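For readers unfamiliar with the longest-prefix match that the abstract refers to, the operation can be sketched with a plain one-bit-per-level binary trie. This is a minimal illustration only and is not the TrieC algorithm: the class names `TrieNode` and `PrefixTrie`, the next-hop strings, and the example prefixes are all hypothetical. A trie of this kind may need up to 128 dependent memory accesses per IPv6 lookup, which hints at why memory space reduction and latency hiding appear among the design issues listed above.

```python
import ipaddress


class TrieNode:
    """One node of a binary (one bit per level) prefix trie."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # child for bit 0, child for bit 1
        self.next_hop = None          # set only if a prefix ends here


class PrefixTrie:
    """Illustrative longest-prefix-match table; NOT the TrieC structure."""
    BITS = 128  # IPv6 address length

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        """Store next_hop under an IPv6 prefix such as '2001:db8::/32'."""
        net = ipaddress.IPv6Network(prefix)
        value = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (value >> (self.BITS - 1 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address):
        """Return the next hop of the longest matching prefix, or None."""
        value = int(ipaddress.IPv6Address(address))
        node = self.root
        best = node.next_hop  # a default route (::/0), if one was inserted
        for i in range(self.BITS):
            bit = (value >> (self.BITS - 1 - i)) & 1
            node = node.children[bit]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the deepest match so far
        return best


# Hypothetical forwarding table with two overlapping prefixes.
table = PrefixTrie()
table.insert("2001:db8::/32", "port-1")
table.insert("2001:db8:abcd::/48", "port-2")

print(table.lookup("2001:db8:abcd::1"))  # matches both; the /48 wins
print(table.lookup("2001:db8:1::1"))     # matches only the /32
print(table.lookup("2001:dead::1"))      # no match
```

The lookup walks the address one bit at a time and keeps the most specific next hop seen so far, so the result for an address covered by both prefixes is the longer one.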
Index Terms
- High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor