ABSTRACT
The overhead of memory accesses limits the performance of packet processing applications. To overcome this bottleneck, today's network processors can draw on a wide range of mechanisms, such as multi-level memory hierarchies, wide-word accesses, special-purpose result-caches, asynchronous memory, and hardware multi-threading. However, supporting all of these mechanisms complicates programmability and hardware design, and wastes system resources. In this paper, we address the following fundamental question: what minimal set of hardware mechanisms must a network processor support to achieve the twin goals of simplified programmability and high packet throughput? We show that no single mechanism suffices; the minimal set must include data-caches and multi-threading. The two are complementary: data-caches exploit locality to reduce the number of context switches and the off-chip memory bandwidth requirement, whereas multi-threading exploits parallelism to hide long cache-miss latencies.
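The complementarity argument can be illustrated with a back-of-the-envelope model. The sketch below is not the paper's evaluation methodology. It is a simple latency-hiding approximation in which every parameter value (cycles of computation per packet, memory accesses per packet, hit and miss latencies) is a hypothetical number chosen only for illustration, as is the function name `packet_throughput`.

```python
def packet_throughput(threads, hit_rate, compute_cycles=200, mem_accesses=20,
                      hit_latency=2, miss_latency=150):
    """Packets per 1000 cycles for one core under a toy latency-hiding model.

    All default values are illustrative assumptions, not measurements.
    """
    hits = mem_accesses * hit_rate
    misses = mem_accesses * (1.0 - hit_rate)
    # Cycles the core actually spends working on one packet.
    busy = compute_cycles + hits * hit_latency
    # Cycles one thread spends waiting on off-chip memory for one packet.
    stall = misses * miss_latency
    # Other threads can run while one waits, until the core saturates.
    utilization = min(1.0, threads * busy / (busy + stall))
    return 1000.0 * utilization / busy


if __name__ == "__main__":
    for threads in (1, 8):
        for hit_rate in (0.0, 0.9):
            rate = packet_throughput(threads, hit_rate)
            print(f"threads={threads} hit_rate={hit_rate:.1f} "
                  f"throughput={rate:.3f} packets/kcycle")
```

Under these assumed numbers, adding either a cache or extra threads alone leaves the core partly idle, while the combination saturates it. The model ignores context-switch cost and memory bandwidth limits, so it shows only the direction of the effect.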
Index Terms
- Overcoming the memory wall in packet processing: hammers or ladders?