Abstract
The evolution of parallel computing architectures presents new challenges for developing efficient parallel codes. The emergence of heterogeneous systems has given rise to multiple programming models, each requiring careful adaptation to maximize performance. In this context, we propose reevaluating memory-layout designs for computational tasks within larger nodes by comparing various architectures. To understand the performance gap between shared-memory and shared-address-space settings, we systematically measure the bandwidth between cores and sockets using several methodologies. Our findings reveal significant performance differences, suggesting that MPI running inside UNIX processes may not fully exploit the available intranode bandwidth. Building on our work in the MPC thread-based MPI runtime, which leverages shared memory to achieve higher performance thanks to its optimized layout, we advocate for enabling the use of shared memory within the MPI standard.
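The abstract refers to measuring intranode bandwidth between cores and sockets with several methodologies. The sketch below illustrates one common such methodology, an MPI ping-pong between two ranks placed on the same node, whose result can then be compared against a direct shared-memory copy. It is not the paper's benchmark; the message size, iteration count, and launch line are illustrative assumptions.

```c
/*
 * Minimal sketch (not the paper's benchmark): intranode bandwidth measured
 * with an MPI ping-pong between two ranks placed on the same node.
 * Message size and iteration count below are illustrative assumptions.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, world;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world);
    if (world < 2) {
        if (rank == 0) fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    const int size = 1 << 22;   /* 4 MiB per message */
    const int iters = 100;
    char *buf = malloc(size);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();

    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double elapsed = MPI_Wtime() - start;
    if (rank == 0) {
        /* Each iteration moves 2 * size bytes between the two ranks. */
        double gib_s = (2.0 * (double)size * iters) / elapsed
                       / (1024.0 * 1024.0 * 1024.0);
        printf("intranode ping-pong bandwidth: %.2f GiB/s\n", gib_s);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Launched with, for example, mpirun -n 2 --bind-to core ./pingpong, the reported figure can be set against a plain memcpy between two threads of the same process to expose the gap between message passing over UNIX processes and direct shared-memory access.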
Notes
- 3. spack install openmpi fabrics=cma
Cite this paper
Adam, J., Besnard, JB., Roussel, A., Jaeger, J., Carribault, P., Pérache, M. (2025). To Share or Not to Share: A Case for MPI in Shared-Memory. In: Blaas-Schenner, C., Niethammer, C., Haas, T. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2024. Lecture Notes in Computer Science, vol 15267. Springer, Cham. https://doi.org/10.1007/978-3-031-73370-3_6