A Speculative and Adaptive MPI Rendezvous Protocol Over RDMA-enabled Interconnects

International Journal of Parallel Programming

Abstract

Overlapping computation with communication is a key technique for hiding the effect of communication latency on the performance of parallel applications. The Message Passing Interface (MPI) is a widely used message-passing standard for high-performance computing. One of the most important factors in achieving a good level of overlap is the ability of the MPI implementation to make progress on outstanding communication operations. In this paper, we propose a novel speculative MPI Rendezvous protocol that uses RDMA Read and RDMA Write to improve communication progress and, consequently, the ability to overlap. Performance results based on a modified MPICH2 implementation over 10-Gigabit iWARP Ethernet show a significant (80–100%) improvement in receiver-side overlap and progress ability. We have also observed up to 30% improvement in application wait time for some NPB applications as well as the RADIX application. For applications that do not benefit from this protocol, an adaptation mechanism stops the speculation to reduce the protocol overhead.
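The abstract does not say how the adaptation mechanism decides when to stop speculating. As a rough illustration only, the following Python sketch shows one way such a switch could behave: it tracks whether recent speculative attempts were useful and disables speculation once the recent hit rate falls below a threshold. The class name `SpeculationController`, the helper `run`, the window size, and the threshold are all hypothetical and are not taken from the paper or from MPICH2.

```python
from collections import deque


class SpeculationController:
    """Toy adaptive switch (hypothetical, not the paper's mechanism):
    keep speculating only while recent attempts keep paying off."""

    def __init__(self, window=8, min_hit_rate=0.5):
        # Sliding window of recent outcomes; True means the speculation helped.
        self.outcomes = deque(maxlen=window)
        self.min_hit_rate = min_hit_rate
        self.enabled = True

    def record(self, useful):
        """Record one speculative attempt and re-evaluate the switch."""
        if not self.enabled:
            return
        self.outcomes.append(bool(useful))
        # Decide only once the window is full, to avoid reacting to noise.
        if len(self.outcomes) == self.outcomes.maxlen:
            hit_rate = sum(self.outcomes) / len(self.outcomes)
            if hit_rate < self.min_hit_rate:
                self.enabled = False


def run(trace, **kwargs):
    """Replay a trace of outcomes; return (still_enabled, attempts_made)."""
    ctrl = SpeculationController(**kwargs)
    attempts = 0
    for useful in trace:
        if ctrl.enabled:
            attempts += 1
            ctrl.record(useful)
    return ctrl.enabled, attempts
```

In this sketch, a trace in which speculation never helps, such as `run([False] * 20, window=4)`, stops speculating after the first full window, while a trace in which it always helps keeps speculating throughout. A real implementation would presumably base the decision on protocol-level events rather than on a boolean trace.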

Corresponding author

Correspondence to Ahmad Afsahi.

About this article

Cite this article

Rashti, M.J., Afsahi, A. A Speculative and Adaptive MPI Rendezvous Protocol Over RDMA-enabled Interconnects. Int J Parallel Prog 37, 223–246 (2009). https://doi.org/10.1007/s10766-009-0094-9

  • DOI: https://doi.org/10.1007/s10766-009-0094-9
