
Efficient Shared Memory and RDMA Based Design for MPI_Allgather over InfiniBand

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2006)

Abstract

MPI_Allgather is an important collective operation used in applications such as matrix multiplication and basic linear algebra operations. With next-generation systems going multi-core, deployed clusters will support a high process count per node. Traditional implementations of Allgather use two separate channels: a network channel for communication across nodes and a shared memory channel for intra-node communication. An important drawback of this approach is the lack of sharing of communication buffers across these channels, which results in extra copying of data within a node and yields sub-optimal performance, especially for collectives involving a large number of processes with a high process density per node. In this paper, we propose a solution that eliminates the extra copy costs by sharing the communication buffers for both intra- and inter-node communication. Further, we optimize performance by overlapping network operations with intra-node shared memory copies. On a 32-node, 2-way cluster, we observe an improvement of up to a factor of two for MPI_Allgather compared to the original implementation. We also observe overlap benefits of up to 43% for the 32x2 process configuration.

This research is supported in part by Department of Energy’s Grant #DE-FC02-01ER25506; National Science Foundation’s grants #CCR-0204429, #CCR-0311542 and #CNS-0403342; grants from Intel and Mellanox; and equipment donations from Intel, Mellanox, AMD, Apple, Advanced Clustering and Sun Microsystems.
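The core idea described in the abstract can be sketched as follows: a single node-level shared-memory region is registered with the InfiniBand HCA, so intra-node processes copy their data directly into it while inter-node transfers proceed via RDMA from the same buffer, avoiding a staging copy between shared-memory and network buffers. The C sketch below is a minimal illustration under stated assumptions, not the authors' MVAPICH implementation: the segment name, buffer size, device choice, and the leader-based usage described in the comments are introduced only for the example, and queue-pair setup plus the actual allgather exchange and overlap logic are omitted.

/* Sketch: one shared-memory buffer per node, registered for RDMA, so the
 * same region serves intra-node copies and inter-node RDMA transfers.
 * SHM_NAME and BUF_SIZE are illustrative values, not from the paper.
 * Build (Linux, libibverbs installed): cc sketch.c -libverbs -lrt        */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <infiniband/verbs.h>

#define SHM_NAME "/allgather_shm"   /* hypothetical segment name */
#define BUF_SIZE (1 << 20)          /* 1 MB, illustrative        */

int main(void)
{
    /* 1. Create (or attach to) the node-level shared-memory segment. */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, BUF_SIZE) != 0) {
        perror("shm");
        return 1;
    }
    void *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* 2. Open an InfiniBand device and register the *same* buffer for RDMA. */
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no IB devices found\n");
        return 1;
    }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    struct ibv_mr *mr = pd ? ibv_reg_mr(pd, buf, BUF_SIZE,
                                        IBV_ACCESS_LOCAL_WRITE |
                                        IBV_ACCESS_REMOTE_WRITE) : NULL;
    if (!mr) {
        fprintf(stderr, "RDMA registration failed\n");
        return 1;
    }

    /* Intra-node peers now memcpy their contributions directly into `buf`,
     * while a designated leader process issues RDMA writes from/to the same
     * region for the inter-node phase. Because both channels share this
     * buffer, no extra copy between a shared-memory buffer and a separate
     * network buffer is required, and the leader's RDMA operations can be
     * overlapped with the remaining intra-node copies.                    */
    printf("shared buffer %p registered: lkey=0x%x rkey=0x%x\n",
           buf, mr->lkey, mr->rkey);

    /* Cleanup. */
    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    munmap(buf, BUF_SIZE);
    shm_unlink(SHM_NAME);
    return 0;
}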





Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mamidala, A.R., Vishnu, A., Panda, D.K. (2006). Efficient Shared Memory and RDMA Based Design for MPI_Allgather over InfiniBand. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2006. Lecture Notes in Computer Science, vol 4192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846802_17

  • DOI: https://doi.org/10.1007/11846802_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39110-4

  • Online ISBN: 978-3-540-39112-8
