Abstract
The evolution of parallel computing architectures presents new challenges for developing efficient parallel codes. The emergence of heterogeneous systems has given rise to multiple programming models, each requiring careful adaptation to maximize performance. In this context, we propose reevaluating memory-layout designs for computational tasks within larger nodes by comparing various architectures. To understand the performance gap between shared-memory and shared-address-space settings, we systematically measure the bandwidth between cores and sockets using several methodologies. Our findings reveal significant performance differences, suggesting that MPI running inside UNIX processes may not fully exploit the available intranode bandwidth. Building on our work in the MPC thread-based MPI runtime, which leverages shared memory to achieve higher performance thanks to its optimized layout, we advocate for enabling the use of shared memory within the MPI standard.
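The abstract refers to measuring intranode bandwidth between cores and sockets with several methodologies. The sketch below illustrates one common such methodology, an MPI ping-pong between two ranks placed on the same node, whose result can then be compared against a direct shared-memory copy. It is not the paper's benchmark; the message size, iteration count, and launch line are illustrative assumptions.

```c
/*
 * Minimal sketch (not the paper's benchmark): intranode bandwidth measured
 * with an MPI ping-pong between two ranks placed on the same node.
 * Message size and iteration count below are illustrative assumptions.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, world;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world);
    if (world < 2) {
        if (rank == 0) fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    const int size = 1 << 22;   /* 4 MiB per message */
    const int iters = 100;
    char *buf = malloc(size);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();

    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double elapsed = MPI_Wtime() - start;
    if (rank == 0) {
        /* Each iteration moves 2 * size bytes between the two ranks. */
        double gib_s = (2.0 * (double)size * iters) / elapsed
                       / (1024.0 * 1024.0 * 1024.0);
        printf("intranode ping-pong bandwidth: %.2f GiB/s\n", gib_s);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Launched with, for example, mpirun -n 2 --bind-to core ./pingpong, the reported figure can be set against a plain memcpy between two threads of the same process to expose the gap between message passing over UNIX processes and direct shared-memory access.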
Notes
- 3. spack install openmpi fabrics=cma
Cite this paper
Adam, J., Besnard, JB., Roussel, A., Jaeger, J., Carribault, P., Pérache, M. (2025). To Share or Not to Share: A Case for MPI in Shared-Memory. In: Blaas-Schenner, C., Niethammer, C., Haas, T. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2024. Lecture Notes in Computer Science, vol 15267. Springer, Cham. https://doi.org/10.1007/978-3-031-73370-3_6