Overcoming the memory wall in packet processing: hammers or ladders?

Authors:
Jayaram Mudigonda, University of Texas at Austin
Harrick M. Vin, University of Texas at Austin
Raj Yavatkar, Intel Corporation

ANCS '05: Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems, October 2005, Pages 1–10
https://doi.org/10.1145/1095890.1095892

Published: 26 October 2005

ABSTRACT

The overhead of memory accesses limits the performance of packet processing applications. To overcome this bottleneck, today's network processors can utilize a wide range of mechanisms, such as multi-level memory hierarchy, wide-word accesses, special-purpose result-caches, asynchronous memory, and hardware multi-threading. However, supporting all of these mechanisms complicates programmability and hardware design, and wastes system resources. In this paper, we address the following fundamental question: what minimal set of hardware mechanisms must a network processor support to achieve the twin goals of simplified programmability and high packet throughput? We show that no single mechanism suffices; the minimal set must include data-caches and multi-threading. Data-caches and multi-threading are complementary: whereas data-caches exploit locality to reduce the number of context-switches and the off-chip memory bandwidth requirement, multi-threading exploits parallelism to hide long cache-miss latencies.
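
The complementarity claim in the abstract can be illustrated with a toy analytical model. The sketch below is not the paper's evaluation methodology. It is a textbook-style approximation in which a thread switches out on every cache miss, context switches are free, and off-chip bandwidth is unlimited. The names `Workload`, `offchip_accesses_per_packet`, and `packets_per_kilocycle`, and all numeric parameters, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical per-packet cost parameters (all values are assumptions)."""
    compute_cycles: float   # cycles of pure computation per packet
    mem_accesses: float     # data memory accesses per packet
    hit_latency: float      # cycles for an on-chip cache hit
    miss_latency: float     # cycles for an off-chip access (cache miss)


def offchip_accesses_per_packet(w: Workload, hit_rate: float) -> float:
    """Misses per packet; in this model each miss is also one context switch."""
    return w.mem_accesses * (1.0 - hit_rate)


def packets_per_kilocycle(w: Workload, hit_rate: float, threads: int) -> float:
    """Estimated throughput of one core with `threads` hardware threads.

    Assumes compute_cycles > 0 so that the busy time is never zero.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    misses = offchip_accesses_per_packet(w, hit_rate)
    # Cycles the core is actually occupied by one packet (compute + cache hits).
    busy = w.compute_cycles + (w.mem_accesses - misses) * w.hit_latency
    # Cycles one thread spends waiting on off-chip memory for that packet.
    stall = misses * w.miss_latency
    # Other threads can fill the stall time, up to full utilization.
    utilization = min(1.0, threads * busy / (busy + stall))
    return 1000.0 * utilization / busy


if __name__ == "__main__":
    w = Workload(compute_cycles=100, mem_accesses=20,
                 hit_latency=1, miss_latency=150)
    for hit_rate, threads in [(0.0, 1), (0.0, 8), (0.9, 1), (0.9, 8)]:
        rate = packets_per_kilocycle(w, hit_rate, threads)
        misses = offchip_accesses_per_packet(w, hit_rate)
        print(f"hit_rate={hit_rate:.1f} threads={threads}: "
              f"{rate:.2f} packets/kilocycle, {misses:.1f} off-chip accesses/packet")
```

Under these assumed parameters, either mechanism alone improves throughput over the baseline, but only the combination keeps the core fully utilized. The cache reduces the misses that each thread must hide, and the extra threads cover the misses that remain.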

Published in
ANCS '05: Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems
October 2005
230 pages
ISBN: 1595930825
DOI: 10.1145/1095890
General Chair: Alan Berenbaum (SMSC)
Program Chairs: Kai Li (Princeton University, Princeton, NJ), Jonathan Turner (Washington University in St. Louis)
Copyright © 2005 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data-caches
multithreading
network processors
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 88 of 314 submissions, 28%
Article Metrics
- Total Citations: 36
- Total Downloads: 849
- Downloads (Last 12 months): 14
- Downloads (Last 6 weeks): 2