DOI: 10.1145/3127024.3127030

A hierarchical model to manage hardware topology in MPI applications

Published: 25 September 2017

Abstract

The MPI standard is a major contribution in the landscape of parallel programming. Since its inception in the mid-1990s, it has ensured portability and performance for parallel applications on a wide spectrum of machines and architectures. With the advent of multicore machines, understanding and taking into account the underlying physical topology and memory hierarchy have become of paramount importance. In its current state, however, and despite recent evolutions, the MPI standard is still unable to offer mechanisms to achieve this. In this paper, we detail several additions to the standard that give the user tools to address hardware topology and data locality issues while improving application performance.
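
As context for this proposal (a minimal sketch assuming only standard MPI-3 calls, not an excerpt from the paper), the closest mechanism the current standard provides is MPI_Comm_split_type with MPI_COMM_TYPE_SHARED: it splits MPI_COMM_WORLD into one communicator per shared-memory node, but exposes only that single level of the hardware hierarchy, leaving deeper levels such as sockets, NUMA nodes, or caches out of reach.

/* Sketch: per-node communicators with the existing MPI-3 interface.
 * MPI_COMM_TYPE_SHARED is the only split type defined by MPI-3.0,
 * so only the shared-memory-node level of the hierarchy is visible. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group processes that share a memory space (i.e., the same node). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    printf("world rank %d is rank %d of %d on its node\n",
           world_rank, node_rank, node_size);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}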


Cited By

  • (2023) Efficient Approaches to Mitigate Communication Bottlenecks in MPI Communicator Splitting by Type. 2023 5th International Conference on Frontiers Technology of Information and Computer (ICFTIC), 217-222. DOI: 10.1109/ICFTIC59930.2023.10455835. Online publication date: 17-Nov-2023.
  • (2023) An Analysis of Long-Tailed Network Latency Distribution and Background Traffic on Dragonfly+. Benchmarking, Measuring, and Optimizing, 123-142. DOI: 10.1007/978-3-031-31180-2_8. Online publication date: 13-May-2023.
  • (2019) Leveraging Network-level Parallelism with Multiple Process-Endpoints for MPI Broadcast. 2019 IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM), 34-41. DOI: 10.1109/IPDRM49579.2019.00009. Online publication date: Nov-2019.


Published In

EuroMPI '17: Proceedings of the 24th European MPI Users' Group Meeting
September 2017
169 pages
ISBN: 9781450348492
DOI: 10.1145/3127024
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

  • Mellanox: Mellanox Technologies
  • Intel: Intel

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. hardware topology
  2. hierarchy
  3. message passing

Qualifiers

  • Research-article

Conference

EuroMPI/USA '17: 24th European MPI Users' Group Meeting
September 25-28, 2017
Chicago, Illinois, USA
Sponsors:
  • Mellanox
  • Intel

Acceptance Rates

EuroMPI '17 paper acceptance rate: 17 of 37 submissions (46%)
Overall acceptance rate: 66 of 139 submissions (47%)
