DOI: 10.1145/1217935.1217961

Evaluating network processing efficiency with processor partitioning and asynchronous I/O

Published: 18 April 2006

Abstract

Applications requiring high-speed TCP/IP processing can easily saturate a modern server. We and others have previously suggested alleviating this problem in multiprocessor environments by dedicating a subset of the processors to network packet processing. The remaining processors perform only application computation, which eliminates contention between these two functions for processor resources. Applications interact with packet processing engines (PPEs) through an asynchronous I/O (AIO) programming interface that bypasses the operating system. A key attraction of this approach is that it exploits the architectural trend toward greater thread-level parallelism in future systems based on multi-core processors. In this paper, we conduct a detailed experimental performance analysis comparing this approach to a baseline Linux system configured according to best practice.

We have built a prototype system implementing this architecture, ETA+AIO (Embedded Transport Acceleration with Asynchronous I/O), and ported a high-performance web server to the AIO interface. Although the prototype uses modern single-core CPUs instead of future multi-core CPUs, an analysis of its performance can reveal important properties of this approach. Our experiments show that the ETA+AIO prototype has a modest advantage over the baseline Linux system in packet processing efficiency, consuming fewer CPU cycles to sustain the same throughput. This efficiency advantage enables the ETA+AIO prototype to achieve higher peak throughput than the baseline system, but only for workloads where the mix of packet processing and application processing approximately matches the allocation of CPUs in the ETA+AIO system, thereby enabling high utilization of all the CPUs.

Detailed analysis shows that the efficiency advantage of the ETA+AIO prototype, which uses one PPE CPU, comes from three sources: avoiding multiprocessing overheads in packet processing, the lower overhead of our AIO interface compared to standard sockets, and reduced cache misses due to processor partitioning.
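
The condition on workload mix can be illustrated with a back-of-the-envelope capacity model. The sketch below is not taken from the paper: the function names, the cost figures, and the assumption that throughput is limited purely by CPU cycles are all hypothetical, chosen only to show why a partitioned system with cheaper packet processing can still lose to a shared system when its CPU split does not match the workload.

```python
# Illustrative capacity model (hypothetical, not the paper's methodology).
# Costs are in arbitrary "cycle units" per request; all numbers are made up.

def partitioned_throughput(n_cpus, n_ppe, cycles_per_cpu, pkt_cost, app_cost):
    """Requests per unit time when n_ppe CPUs do only packet processing
    and the remaining CPUs do only application processing."""
    ppe_limit = n_ppe * cycles_per_cpu / pkt_cost
    app_limit = (n_cpus - n_ppe) * cycles_per_cpu / app_cost
    # The slower side is the bottleneck; the other side sits partly idle.
    return min(ppe_limit, app_limit)


def shared_throughput(n_cpus, cycles_per_cpu, pkt_cost, app_cost):
    """Requests per unit time when every CPU does both kinds of work."""
    return n_cpus * cycles_per_cpu / (pkt_cost + app_cost)


if __name__ == "__main__":
    CYCLES = 1000.0          # assumed per-CPU budget
    PKT_SHARED = 100.0       # assumed packet cost in the shared baseline
    PKT_PARTITIONED = 80.0   # assumed (modestly lower) cost on a dedicated PPE

    for app_cost in (20.0, 80.0, 320.0):
        part = partitioned_throughput(2, 1, CYCLES, PKT_PARTITIONED, app_cost)
        base = shared_throughput(2, CYCLES, PKT_SHARED, app_cost)
        winner = "partitioned" if part > base else "shared"
        print(f"app_cost={app_cost:6.1f}  partitioned={part:6.2f}  "
              f"shared={base:6.2f}  -> {winner}")
```

Under these assumed numbers, the partitioned configuration comes out ahead only when the application cost per request is close to the packet cost, so that both CPUs stay busy. When the application work is much lighter or much heavier, one of the two CPUs is underused and the shared configuration wins despite its higher per-packet cost.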

Published In

EuroSys '06: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
April 2006
420 pages
ISBN: 1595933220
DOI: 10.1145/1217935

ACM SIGOPS Operating Systems Review, Volume 40, Issue 4
Proceedings of the 2006 EuroSys conference
October 2006
383 pages
ISSN: 0163-5980
DOI: 10.1145/1218063

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. TCP/IP
  2. asynchronous I/O
  3. network processing

Qualifiers

  • Article

Conference

EUROSYS06: Eurosys 2006 Conference
April 18 - 21, 2006
Leuven, Belgium

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%
