DOI: 10.1145/3236367.3236377

Using Node Information to Implement MPI Cartesian Topologies

Published: 23 September 2018

ABSTRACT

The MPI API provides support for Cartesian process topologies, including the option to reorder the processes to achieve better communication performance. But MPI implementations rarely provide anything useful for the reorder option, typically ignoring it. One argument made is that modern interconnects are fast enough that applications are less sensitive to the exact mapping of processes onto the system. However, intranode communication performance is much greater than internode communication performance. In this paper, we show a simple approach that uses only information about which MPI processes are on the same node to provide a fast and effective implementation of the MPI Cartesian topology. While not optimal, this approach provides a significant improvement over all tested MPI implementations and may serve as the default implementation of MPI_Cart_create in any MPI library.
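To make the setting concrete, the following minimal C/MPI sketch (not the paper's algorithm) shows how the two ingredients fit together: MPI_Comm_split_type with MPI_COMM_TYPE_SHARED reveals which processes share a node, and the reorder argument of MPI_Cart_create asks the implementation to exploit such information. The 2-D grid, the nodecomm name, and the printed mapping are illustrative assumptions, not part of the paper.

/* Sketch only: gather node-locality information and request a
 * reordered 2-D Cartesian topology.  Whether and how ranks are
 * actually reordered depends entirely on the MPI implementation. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int wrank, wsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    /* Processes that can share memory (in practice, processes on the
       same node) end up in the same nodecomm. */
    MPI_Comm nodecomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &nodecomm);
    int noderank, nodesize;
    MPI_Comm_rank(nodecomm, &noderank);
    MPI_Comm_size(nodecomm, &nodesize);

    /* Let MPI choose a 2-D decomposition, then create the Cartesian
       communicator with reorder = 1 so the implementation may remap
       ranks using whatever locality information it has. */
    int dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Dims_create(wsize, 2, dims);
    MPI_Comm cartcomm;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cartcomm);

    int crank, coords[2];
    MPI_Comm_rank(cartcomm, &crank);
    MPI_Cart_coords(cartcomm, crank, 2, coords);
    printf("world %d (node rank %d of %d) -> cart (%d,%d)\n",
           wrank, noderank, nodesize, coords[0], coords[1]);

    MPI_Comm_free(&cartcomm);
    MPI_Comm_free(&nodecomm);
    MPI_Finalize();
    return 0;
}

An implementation of MPI_Cart_create would of course gather the node information internally rather than in user code; the point of the sketch is only that the locality data the paper relies on is already available through standard MPI calls.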


Published in

EuroMPI '18: Proceedings of the 25th European MPI Users' Group Meeting
September 2018, 187 pages
ISBN: 9781450364928
DOI: 10.1145/3236367
Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 September 2018


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

Overall Acceptance Rate: 66 of 139 submissions, 47%
