
Network Offloaded Hierarchical Collectives Using ConnectX-2’s CORE-Direct Capabilities

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6305)

Abstract

As the scale of High Performance Computing (HPC) systems continues to increase, demanding that we extract even more parallelism from applications, the need to move communication management away from the Central Processing Unit (CPU) becomes even greater. Moving this management to the network frees up CPU cycles for computation, making it possible to overlap computation and communication. In this paper we continue to investigate how best to use the new CORE-Direct support in the ConnectX-2 Host Channel Adapter (HCA) to create high-performance, asynchronous collective operations that are managed by the HCA. Specifically, we take the network topology into account and build a two-level communication hierarchy, reducing the MPI_Barrier completion time by 45%, from 26.59 microseconds when the topology is ignored to 14.72 microseconds; the CPU-based collective barrier completes in 19.04 microseconds. The nonblocking barrier algorithm has similar performance, with about 50% of that time available for computation.
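The two-level barrier evaluated here follows the usual fan-in / inter-node exchange / fan-out pattern: the ranks on a node synchronize locally, one leader per node takes part in the inter-node step, and the leader then releases its local ranks. In the CORE-Direct implementation the inter-node phase is posted to the HCA's management queues and completes without CPU involvement; the sketch below is only a host-side illustration of the same structure in standard MPI. The use of MPI-3's MPI_Comm_split_type and the rank-0 leader rule are assumptions made for brevity, not details taken from the paper.

    /* Two-level (intra-node + inter-node) barrier sketch.  The actual
       CORE-Direct algorithm offloads the inter-node phase to the
       ConnectX-2 HCA; here every phase runs on the host for clarity. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Level 1: group the ranks that share a node. */
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                            MPI_INFO_NULL, &node_comm);

        int node_rank;
        MPI_Comm_rank(node_comm, &node_rank);

        /* Level 2: one leader per node (node_rank == 0, an illustrative
           choice) joins the inter-node communicator; all other ranks
           receive MPI_COMM_NULL. */
        MPI_Comm leader_comm;
        MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                       world_rank, &leader_comm);

        MPI_Barrier(node_comm);             /* fan-in: wait for local ranks  */
        if (leader_comm != MPI_COMM_NULL)
            MPI_Barrier(leader_comm);       /* inter-node: leaders only      */
        MPI_Barrier(node_comm);             /* fan-out: release local ranks  */

        if (world_rank == 0)
            printf("two-level barrier complete\n");

        if (leader_comm != MPI_COMM_NULL)
            MPI_Comm_free(&leader_comm);
        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }

In the nonblocking variant, the blocking calls would be replaced by MPI_Ibarrier and a later MPI_Wait (or, with CORE-Direct, by a task list handed to the HCA), which is what allows roughly half of the barrier completion time reported in the abstract to be spent on computation.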

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rabinovitz, I., Shamis, P., Graham, R.L., Bloch, N., Shainer, G. (2010). Network Offloaded Hierarchical Collectives Using ConnectX-2’s CORE-Direct Capabilities. In: Keller, R., Gabriel, E., Resch, M., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2010. Lecture Notes in Computer Science, vol 6305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15646-5_11

  • DOI: https://doi.org/10.1007/978-3-642-15646-5_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15645-8

  • Online ISBN: 978-3-642-15646-5
