ProOnE: a general-purpose protocol onload engine for multi- and many-core architectures

Lai, P.; Balaji, P.; Thakur, R.; Panda, D. K.

doi:10.1007/s00450-009-0090-8

Special Issue Paper
Published: 06 May 2009

Volume 23, pages 133–142 (2009)
Computer Science - Research and Development

Abstract

Modern high-end computing systems use specialized offload engines to enhance various aspects of their processing. For example, high-speed networks such as InfiniBand, Quadrics and Myrinet use specialized hardware to offload network processing and improve performance. However, such hardware units are expensive, and their manufacturing complexity increases exponentially with the number and complexity of the tasks they offload. On the other hand, the proliferation of multi- and many-core processors into the general desktop and laptop markets is increasingly driving down their cost through economies of scale. To take advantage of the benefits of multi/many-core architectures, we propose, design and evaluate ProOnE, a general-purpose Protocol Onload Engine. ProOnE uses a small subset of the available cores on a multi-core CPU to “onload” various tasks in a dedicated manner instead of “offloading” them to specialized hardware. The general-purpose processing capabilities of multi-core architectures allow ProOnE to be designed in a flexible, extensible and scalable manner, while benefiting from the decreasing cost of general-purpose CPUs. In this paper, we onload onto ProOnE several tasks relevant to communication sub-systems such as MPI that are too complex for current hardware offload engines to support, and demonstrate significant benefits in terms of overlap of computation and communication and improved application performance.
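The core idea of onloading is that a worker dedicated to protocol tasks services requests posted by the application, so the application can keep computing while those tasks make progress. The sketch below is a minimal Python illustration of that control flow under stated assumptions; it is not ProOnE's implementation. The names `OnloadEngine`, `Request`, `post` and `compute` are hypothetical, and a byte copy stands in for real protocol processing. A real engine would pin its worker to a reserved core and would likely be written in C alongside the MPI library; in CPython the global interpreter lock prevents Python bytecode from running truly in parallel, so this shows only the structure of onloading, not its performance.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical onloaded task: copy `src` into `dst` (stand-in for protocol work)."""
    src: bytes
    dst: bytearray
    done: threading.Event = field(default_factory=threading.Event)


class OnloadEngine:
    """Toy onload engine: one dedicated thread drains a request queue.

    Assumption: in a real system this worker would be pinned to a reserved
    core; here it is an ordinary Python thread that only shows the control flow.
    """

    def __init__(self) -> None:
        self._q: "queue.Queue[Request | None]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            req = self._q.get()
            if req is None:          # shutdown sentinel
                break
            req.dst[:] = req.src     # the "protocol processing", done off the main thread
            req.done.set()           # signal completion to the application

    def post(self, req: Request) -> Request:
        """Non-blocking submit; returns immediately, like a nonblocking send."""
        self._q.put(req)
        return req

    def shutdown(self) -> None:
        self._q.put(None)
        self._worker.join()


def compute(n: int) -> int:
    """Application work that overlaps with the onloaded task."""
    return sum(i * i for i in range(n))


engine = OnloadEngine()
req = engine.post(Request(src=b"payload", dst=bytearray()))
result = compute(10_000)   # keep computing while the engine handles the request
req.done.wait()            # block only when the result is actually needed
engine.shutdown()
print(bytes(req.dst), result)
```

The main thread's pattern of post, compute, then wait mirrors nonblocking MPI usage (`MPI_Isend` followed by `MPI_Wait`); in this sketch the dedicated thread is what lets the posted work progress without the application calling back into the engine.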

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Ohio State University, Columbus, OH, USA
P. Lai & D. K. Panda
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
P. Balaji & R. Thakur

Corresponding author

Correspondence to P. Lai.

About this article

Cite this article

Lai, P., Balaji, P., Thakur, R. et al. ProOnE: a general-purpose protocol onload engine for multi- and many-core architectures. Comp. Sci. Res. Dev. 23, 133–142 (2009). https://doi.org/10.1007/s00450-009-0090-8

Issue Date: June 2009
DOI: https://doi.org/10.1007/s00450-009-0090-8
