OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Hardware MPI message matching: Insights into MPI matching behavior to inform design

Journal Article · Concurrency and Computation: Practice and Experience
DOI: https://doi.org/10.1002/cpe.5150 · OSTI ID: 1501630

This paper explores key differences among the MPI match lists of several important United States Department of Energy (DOE) applications and proxy applications. Understanding these differences is critical to determining the most promising hardware matching design for a given high-speed network. We present the results of MPI match list studies for the two major open-source MPI implementations, MPICH and Open MPI, and we extend an MPI simulator, LogGOPSim, to report match list statistics. These results are discussed in the context of several potential design approaches to MPI matching–capable hardware, and the data illustrate the performance and memory-capacity requirements that different hardware designs must meet. Finally, this paper's contributions are the collection and analysis of data that help inform hardware designers of common MPI requirements, and a demonstration of the difficulty of determining those requirements by examining only a single MPI implementation.
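The match-list behavior the paper measures can be sketched in miniature: an MPI implementation keeps a queue of posted receives, and each incoming message is matched by walking that queue until the first entry whose source and tag agree (or are wildcarded) is found. The sketch below is illustrative only, assuming a singly linked list; `match_entry`, `ANY_SOURCE`, and `ANY_TAG` are hypothetical stand-ins for an implementation's internal structures and for MPI's `MPI_ANY_SOURCE`/`MPI_ANY_TAG`, and the reported search depth is the kind of match-list statistic the modified LogGOPSim collects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wildcard values, mirroring MPI_ANY_SOURCE / MPI_ANY_TAG. */
#define ANY_SOURCE -1
#define ANY_TAG    -2

/* One posted receive in the match list. */
struct match_entry {
    int source;              /* sender rank, or ANY_SOURCE */
    int tag;                 /* message tag, or ANY_TAG */
    struct match_entry *next;
};

/* Linear search of the posted-receive queue: an incoming message
 * (source, tag) matches the FIRST entry whose source and tag are equal
 * or wildcarded. The traversal depth is returned via *depth, the
 * quantity a simulator would record as a match-list statistic. */
static struct match_entry *match(struct match_entry *head,
                                 int source, int tag, int *depth)
{
    *depth = 0;
    for (struct match_entry *e = head; e != NULL; e = e->next) {
        (*depth)++;
        if ((e->source == ANY_SOURCE || e->source == source) &&
            (e->tag == ANY_TAG || e->tag == tag))
            return e;
    }
    return NULL;
}
```

Hardware matching units must bound both the time of this traversal and the memory holding the entries, which is why the queue-depth distributions across applications and MPI implementations matter to designers.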

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000; NA0003525; AC02-05CH11231
OSTI ID:
1501630
Report Number(s):
SAND-2019-0943J; 671923
Journal Information:
Concurrency and Computation: Practice and Experience, Vol. 32, Issue 3; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 6 works
Citation information provided by
Web of Science

References (30)

An architecture to perform NIC based MPI matching conference September 2007
A Dedicated Message Matching Mechanism for Collective Communications conference January 2018
Improving MPI Multi-threaded RMA Communication Performance conference January 2018
The Case for Semi-Permanent Cache Occupancy: Understanding the Impact of Data Locality on Network Processing conference January 2018
Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications conference November 2014
An evaluation of MPI message rate on hybrid-core processors journal November 2014
Re-evaluating Network Onload vs. Offload for the Many-Core Era conference September 2015
Myrinet: a gigabit-per-second local area network journal January 1995
Performance of particle in cell methods on highly concurrent computational architectures journal July 2007
Eliminating contention bottlenecks in multithreaded MPI journal November 2017
Enabling communication concurrency through flexible MPI endpoints journal September 2014
Characterizing MPI matching via trace-based simulation conference January 2017
Understanding Performance Interference in Next-Generation HPC Systems conference November 2016
  • Mondragon, Oscar H.; Bridges, Patrick G.; Levy, Scott
  • SC16: International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2016.32
LogGOPSim: simulating large-scale applications in the LogGOPS model conference January 2010
  • Hoefler, Torsten; Schneider, Timo; Lumsdaine, Andrew
  • Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10 https://doi.org/10.1145/1851476.1851564
Fast Parallel Algorithms for Short-Range Molecular Dynamics journal March 1995
Characterizing MPI matching via trace-based simulation journal September 2018
A high-performance, portable implementation of the MPI message passing interface standard journal September 1996
Why is MPI so slow?: analyzing the fundamental limits in implementing MPI-3.1 conference January 2017
  • Raffenetti, Ken; Blocksome, Michael; Si, Min
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis - SC '17 https://doi.org/10.1145/3126908.3126963
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation conference November 2010
  • Hoefler, Torsten; Schneider, Timo; Lumsdaine, Andrew
  • SC '10: ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2010.12
The BXI Interconnect Architecture conference August 2015
Instrumentation and Analysis of MPI Queue Times on the SeaStar High-Performance Network conference August 2008
  • Brightwell, R.; Pedretti, K.; Ferreira, K.
  • Proceedings of the 17th International Conference on Computer Communications and Networks 2008 https://doi.org/10.1109/ICCCN.2008.ECP.116
Preparing for exascale: modeling MPI for many-core systems using fine-grain queues conference January 2015
The impact of MPI queue usage on message latency conference January 2004
SeaStar Interconnect: Balanced Bandwidth for Scalable Performance journal May 2006
How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging Latent Synchronization in MPI Collective Algorithms conference January 2016
The Quadrics network: high-performance clustering technology journal January 2002
A fast and resource-conscious MPI message queue mechanism for large-scale jobs journal January 2014
Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters conference October 2013
Toward an evolutionary task parallel integrated MPI + X programming model conference January 2015
  • Barrett, Richard F.; Stark, Dylan T.; Vaughan, Courtenay T.
  • Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15 https://doi.org/10.1145/2712386.2712388
sPIN: High-performance streaming Processing In the Network conference November 2017
  • Hoefler, Torsten; Di Girolamo, Salvatore; Taranov, Konstantin
  • SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1145/3126908.3126970

Cited By (2)

Foreword to the Special Issue of the Workshop on Exascale MPI (ExaMPI 2017) journal July 2019
  • Skjellum, Anthony; Bangalore, Purushotham V.; Grant, Ryan E.
  • Concurrency and Computation: Practice and Experience, Vol. 32, Issue 3 https://doi.org/10.1002/cpe.5459
Performance drop at executing communication-intensive parallel algorithms journal January 2020

Similar Records

Using Simulation to Examine the Effect of MPI Message Matching Costs on Application Performance
Journal Article · February 2019 · Parallel Computing

A high-performance, portable implementation of the MPI message passing interface standard
Journal Article · September 1996 · Parallel Computing

A grid-enabled MPI: message passing in heterogeneous distributed computing systems
Conference · November 2000