Abstract
To avoid the memory registration cost for small messages, MPI implementations over RDMA-enabled networks use transfer protocols that copy data to intermediate buffers at both the sender and the receiver. In this paper, we propose to eliminate the send-side copy when an application buffer is reused frequently, showing that it is more efficient to register the application buffer and use it directly for data transfer. The idea is examined for small message transfer protocols in MVAPICH2, including RDMA Write- and Send/Receive-based communications, one-sided communications, and collectives. The proposed protocol adaptively falls back to the current protocol when the application does not reuse its buffers frequently. The performance results over InfiniBand indicate up to 14% improvement in single-message latency, close to 20% improvement for one-sided operations, and up to 25% improvement for collectives. In addition, the communication time of MPI applications with high buffer reuse is also improved with this technique.
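The adaptive decision the abstract describes can be sketched as a small registration cache: each send from an application buffer bumps a per-buffer reuse counter, and once the counter crosses a threshold the buffer is registered once and used zero-copy thereafter; otherwise the protocol falls back to copying into a pre-registered bounce buffer. This is a minimal illustrative sketch, not MVAPICH2's actual code; the cache layout, the `REUSE_THRESHOLD` value, and all names here are assumptions.

```c
#include <stddef.h>

#define CACHE_SLOTS     64
#define REUSE_THRESHOLD 3   /* assumed cutoff before paying the registration cost */

typedef enum { PATH_COPY, PATH_ZERO_COPY } send_path_t;

typedef struct {
    const void *addr;       /* application buffer address */
    int         reuses;     /* sends observed from this buffer */
    int         registered; /* buffer has been registered for zero-copy */
} cache_entry_t;

static cache_entry_t cache[CACHE_SLOTS];

/* Trivial direct-mapped lookup keyed by buffer address; a conflict
 * simply evicts the old entry and restarts its reuse count. */
static cache_entry_t *lookup(const void *addr)
{
    cache_entry_t *e = &cache[((size_t)addr >> 4) % CACHE_SLOTS];
    if (e->addr != addr) {
        e->addr = addr;
        e->reuses = 0;
        e->registered = 0;
    }
    return e;
}

/* Decide the send path for one small-message send from `addr`. */
send_path_t choose_send_path(const void *addr)
{
    cache_entry_t *e = lookup(addr);
    e->reuses++;
    if (e->registered)
        return PATH_ZERO_COPY;
    if (e->reuses >= REUSE_THRESHOLD) {
        /* Register the application buffer once (registration itself is
         * stubbed out here), then send from it directly from now on. */
        e->registered = 1;
        return PATH_ZERO_COPY;
    }
    return PATH_COPY;       /* fall back: copy into a bounce buffer */
}
```

In a real implementation the registration step would call something like `ibv_reg_mr`, and the cache would also have to be invalidated when the application frees or remaps the buffer, which is where the correctness issues studied in the registration-caching literature arise.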
Rashti, M.J., Afsahi, A. Exploiting application buffer reuse to improve MPI small message transfer protocols over RDMA-enabled networks. Cluster Comput 14, 345–356 (2011). https://doi.org/10.1007/s10586-011-0165-8