
Efficient Shared Memory and RDMA Based Design for MPI_Allgather over InfiniBand

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2006)

Abstract

MPI_Allgather is an important collective operation used in applications such as matrix multiplication and basic linear algebra operations. With next-generation systems going multi-core, deployed clusters will support a high process count per node. Traditional implementations of Allgather use two separate channels: a network channel for communication across nodes and a shared memory channel for intra-node communication. An important drawback of this approach is the lack of sharing of communication buffers across these channels, which results in extra copying of data within a node and yields sub-optimal performance, especially for collectives involving a large number of processes with a high process density per node. In this paper, we propose a solution that eliminates the extra copy costs by sharing the communication buffers for both intra- and inter-node communication. Further, we optimize performance by overlapping network operations with intra-node shared memory copies. On a 32-node, 2-way cluster, we observe an improvement of up to a factor of two for MPI_Allgather compared to the original implementation. We also observe overlap benefits of up to 43% for the 32x2 process configuration.

This research is supported in part by Department of Energy’s Grant #DE-FC02-01ER25506; National Science Foundation’s grants #CCR-0204429, #CCR-0311542 and #CNS-0403342; grants from Intel and Mellanox; and equipment donations from Intel, Mellanox, AMD, Apple, Advanced Clustering and Sun Microsystems.
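The core idea described in the abstract can be sketched as follows: a single node-level shared-memory region is registered with the InfiniBand HCA, so intra-node processes copy their data directly into it while inter-node transfers proceed via RDMA from the same buffer, avoiding a staging copy between shared-memory and network buffers. The C sketch below is a minimal illustration under stated assumptions, not the authors' MVAPICH implementation: the segment name, buffer size, device choice, and the leader-based usage described in the comments are introduced only for the example, and queue-pair setup plus the actual allgather exchange and overlap logic are omitted.

/* Sketch: one shared-memory buffer per node, registered for RDMA, so the
 * same region serves intra-node copies and inter-node RDMA transfers.
 * SHM_NAME and BUF_SIZE are illustrative values, not from the paper.
 * Build (Linux, libibverbs installed): cc sketch.c -libverbs -lrt        */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <infiniband/verbs.h>

#define SHM_NAME "/allgather_shm"   /* hypothetical segment name */
#define BUF_SIZE (1 << 20)          /* 1 MB, illustrative        */

int main(void)
{
    /* 1. Create (or attach to) the node-level shared-memory segment. */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, BUF_SIZE) != 0) {
        perror("shm");
        return 1;
    }
    void *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* 2. Open an InfiniBand device and register the *same* buffer for RDMA. */
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no IB devices found\n");
        return 1;
    }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    struct ibv_mr *mr = pd ? ibv_reg_mr(pd, buf, BUF_SIZE,
                                        IBV_ACCESS_LOCAL_WRITE |
                                        IBV_ACCESS_REMOTE_WRITE) : NULL;
    if (!mr) {
        fprintf(stderr, "RDMA registration failed\n");
        return 1;
    }

    /* Intra-node peers now memcpy their contributions directly into `buf`,
     * while a designated leader process issues RDMA writes from/to the same
     * region for the inter-node phase. Because both channels share this
     * buffer, no extra copy between a shared-memory buffer and a separate
     * network buffer is required, and the leader's RDMA operations can be
     * overlapped with the remaining intra-node copies.                    */
    printf("shared buffer %p registered: lkey=0x%x rkey=0x%x\n",
           buf, mr->lkey, mr->rkey);

    /* Cleanup. */
    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    munmap(buf, BUF_SIZE);
    shm_unlink(SHM_NAME);
    return 0;
}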





Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mamidala, A.R., Vishnu, A., Panda, D.K. (2006). Efficient Shared Memory and RDMA Based Design for MPI_Allgather over InfiniBand. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2006. Lecture Notes in Computer Science, vol 4192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846802_17

  • DOI: https://doi.org/10.1007/11846802_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39110-4

  • Online ISBN: 978-3-540-39112-8
