Abstract
Modern high-end computing systems use specialized offload engines to accelerate various aspects of their processing. For example, high-speed networks such as InfiniBand, Quadrics and Myrinet use dedicated hardware to offload network processing and improve performance. However, such hardware units are expensive, and their manufacturing complexity grows exponentially with the number and complexity of the tasks they offload. At the same time, the spread of multi- and many-core processors into the mainstream desktop and laptop markets is steadily driving their cost down through economies of scale. To exploit these benefits of multi- and many-core architectures, we propose, design and evaluate ProOnE, a general-purpose Protocol Onload Engine. Instead of “offloading” tasks to specialized hardware, ProOnE dedicates a small subset of the available cores on a multi-core CPU to “onload” them. The general-purpose processing capabilities of multi-core architectures allow ProOnE to be flexible, extensible and scalable, while benefiting from the falling cost of general-purpose CPUs. In this paper, we onload onto ProOnE several tasks of communication sub-systems such as MPI that are too complex for current hardware offload engines to support, and demonstrate significant improvements in the overlap of computation and communication and in application performance.
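The onload idea above can be illustrated with a small sketch: a dedicated worker services posted communication tasks so that the application only posts a request, keeps computing, and later waits for completion. This is a minimal illustration under stated assumptions, not ProOnE's design or interface. The names `OnloadEngine`, `Handle`, `post`, `wait` and `simulated_transfer` are hypothetical. A Python thread stands in for the dedicated core (it is not pinned to a core), and because of the GIL only the sleep-based "transfer" genuinely overlaps with the computation.

```python
import queue
import threading
import time
from dataclasses import dataclass, field


@dataclass
class Handle:
    """Completion handle returned by post(); `result` is valid once `done` is set."""
    done: threading.Event = field(default_factory=threading.Event)
    result: object = None


class OnloadEngine:
    """Hypothetical onload engine: one background thread stands in for a
    core reserved for protocol processing. The application posts work and
    keeps computing while the engine drives that work to completion."""

    def __init__(self) -> None:
        self._requests: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        # Service requests in FIFO order until the shutdown sentinel (None) arrives.
        while True:
            item = self._requests.get()
            if item is None:
                break
            task, args, handle = item
            handle.result = task(*args)
            handle.done.set()  # publish completion to the waiting thread

    def post(self, task, *args) -> Handle:
        """Non-blocking: enqueue the task and return a handle immediately."""
        handle = Handle()
        self._requests.put((task, args, handle))
        return handle

    @staticmethod
    def wait(handle: Handle):
        """Block until the engine has finished the task, then return its result."""
        handle.done.wait()
        return handle.result

    def shutdown(self) -> None:
        self._requests.put(None)
        self._worker.join()


def simulated_transfer(payload: bytes, latency_s: float) -> int:
    # Hypothetical stand-in for communication progress. time.sleep releases
    # the GIL, so the application thread can compute while this task waits.
    time.sleep(latency_s)
    return len(payload)


def compute(n: int) -> int:
    # Application work that overlaps with the onloaded transfer.
    return sum(i * i for i in range(n))


# Usage: post the transfer, compute in the meantime, then wait for completion.
engine = OnloadEngine()
pending = engine.post(simulated_transfer, b"x" * 1024, 0.05)
local_result = compute(100_000)
bytes_sent = engine.wait(pending)
engine.shutdown()
```

A single FIFO queue serviced by one worker keeps ordering simple. A production engine would more plausibly be pinned to a specific core and poll network or shared-memory queues, which this sketch does not attempt.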
Cite this article
Lai, P., Balaji, P., Thakur, R. et al. ProOnE: a general-purpose protocol onload engine for multi- and many-core architectures. Comp. Sci. Res. Dev. 23, 133–142 (2009). https://doi.org/10.1007/s00450-009-0090-8