ABSTRACT
Modern architectures and communication systems software include complex hardware, communication abstractions, and optimizations that make their performance difficult to measure, model, and understand. This paper examines the ability of modified versions of the existing Netgauge communication performance measurement tool and the LogGOPS performance model to accurately characterize the communication behavior of modern hardware, MPI abstractions, and MPI implementations. In particular, it analyzes their ability to model GPU-aware communication in different MPI implementations and to quantify the performance characteristics of different approaches to non-contiguous data communication on modern GPU systems. Applying these techniques to a variety of implementations, optimization approaches, and systems demonstrates that modern communication system designs can produce widely varying and difficult-to-predict performance, even within the same hardware/communication software combination.
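For readers unfamiliar with the model family discussed above, LogGP predicts point-to-point message cost from a network latency L, a per-message CPU overhead o, and a per-byte gap G (LogGOPS further adds a per-byte overhead O and a synchronization threshold S). The following is a minimal sketch of the standard LogGP cost function with illustrative, hypothetical parameter values, not measurements from the paper:

```python
def loggp_time(k, L, o, G):
    """Predicted one-way time for a k-byte message under LogGP:
    sender overhead o, per-byte gap G for each byte after the first,
    network latency L, then receiver overhead o."""
    return o + (k - 1) * G + L + o

# Illustrative (hypothetical) parameters in microseconds:
# L = 1.0 us latency, o = 0.5 us overhead, G = 0.001 us/byte (~1 GB/s).
print(loggp_time(1024, L=1.0, o=0.5, G=0.001))  # ~3.02 us
```

Because the model is a closed-form expression in a handful of parameters, fitted values can be compared directly across MPI implementations and datatype-handling strategies, which is what makes it attractive for the kind of characterization this paper performs.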
REFERENCES
- Albert Alexandrov, Mihai F. Ionescu, Klaus E. Schauser, and Chris Scheiman. 1995. LogGP: Incorporating Long Messages into the LogP Model—One Step Closer towards a Realistic Model for Parallel Computation. In Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA '95). Association for Computing Machinery, New York, NY, USA, 95–105.
- Nicholas Bacon. 2023. GPU Datatype Enhanced Netgauge. https://github.com/CUP-ECS/datatypes-logGP
- Amanda Bienz, Luke N. Olson, William D. Gropp, and Shelby Lockhart. 2021. Modeling Data Movement Performance on Heterogeneous Architectures. In 2021 IEEE High Performance Extreme Computing Conference (HPEC). 1–7.
- Dan Bonachea and Paul H. Hargrove. 2019. GASNet-EX: A High-Performance, Portable Communication Library for Exascale. In Languages and Compilers for Parallel Computing: 31st International Workshop (LCPC 2018), Salt Lake City, UT, USA, October 9–11, 2018, Revised Selected Papers. Springer, 138–158.
- Michael Boyer, Jiayuan Meng, and Kalyan Kumaran. 2013. Improving GPU Performance Prediction with Data Transfer Modeling. In 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum. IEEE, 1097–1106.
- David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, and Thorsten von Eicken. 1993. LogP: Towards a Realistic Model of Parallel Computation. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 1–12.
- Keira Haskins, Patrick Bridges, Kurt Ferreira, and Scott Levy. 2021. A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications. Technical Report. Sandia National Laboratories, Albuquerque, NM.
- Torsten Hoefler, Torsten Mehlan, Andrew Lumsdaine, and Wolfgang Rehm. 2007. Netgauge: A Network Performance Measurement Framework. In Proceedings of High Performance Computing and Communications (HPCC '07), Houston, USA, Vol. 4782. Springer, 659–671.
- Torsten Hoefler, Timo Schneider, and Andrew Lumsdaine. 2010. LogGOPSim: Simulating Large-Scale Applications in the LogGOPS Model. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. 597–604.
- Fumihiko Ino, Noriyuki Fujimoto, and Kenichi Hagihara. 2001. LogGPS: A Parallel Computational Model for Synchronization Analysis. In Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming. 133–142.
- Argonne National Laboratory. 2020. Yaksa: High-Performance Noncontiguous Data Management. https://www.yaksa.org/
- Lawrence Berkeley National Laboratory. 2023. GASNet-EX API Description. https://gasnet.lbl.gov/docs/GASNet-EX.txt
- Message Passing Interface Forum. 2021. MPI: A Message-Passing Interface Standard, Version 4.0. https://www.mpi-forum.org/docs/
- Csaba Andras Moritz. 1998. Cost Modeling and Analysis: Towards Optimal Resource Utilization in Parallel Computer Systems. Ph.D. thesis, Royal Institute of Technology.
- NVIDIA. 2022. Faster Memory Transfers between CPU and GPU with GDRCopy. https://developer.nvidia.com/gdrcopy
- OpenUCX. 2023. Data Type Routines. https://openucx.readthedocs.io/en/master/api.html#data-type-routines
- Dhabaleswar K. Panda, Karen Tomko, Karl Schulz, and Amitava Majumdar. 2013. The MVAPICH Project: Evolution and Sustainability of an Open Source Production Quality MPI Library for HPC. In Workshop on Sustainable Software for Science: Practice and Experiences (WSSPE), held in conjunction with the Int'l Conference on Supercomputing.
- Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, and Wen-Mei Hwu. 2021. TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-Aware Datatypes. In Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing. 95–106.
- Rong Shi, Xiaoyi Lu, Sreeram Potluri, Khaled Hamidouche, Jie Zhang, and Dhabaleswar K. Panda. 2014. HAND: A Hybrid Approach to Accelerate Non-Contiguous Data Movement Using MPI Datatypes on GPU Clusters. In 2014 43rd International Conference on Parallel Processing. IEEE, 221–230.
- Xian-He Sun. 2003. Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost. In 2003 Proceedings IEEE International Conference on Cluster Computing. IEEE, 412–419.
- Kaushik Kandadi Suresh, Kawthar Shafie Khorassani, Chen Chun Chen, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, and Dhabaleswar K. Panda. 2022. Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries. In 2022 IEEE Symposium on High-Performance Interconnects (HOTI). IEEE, 13–20.
- Ben van Werkhoven, Jason Maassen, Frank J. Seinstra, and Henri E. Bal. 2014. Performance Models for CPU-GPU Data Transfers. In 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. IEEE, 11–20.
- Hao Wang, Sreeram Potluri, Miao Luo, Ashish Kumar Singh, Xiangyong Ouyang, Sayantan Sur, and Dhabaleswar K. Panda. 2011. Optimized Non-Contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2. In 2011 IEEE International Conference on Cluster Computing. IEEE, 308–316.
Index Terms
- Evaluating the Viability of LogGP for Modeling MPI Performance with Non-contiguous Datatypes on Modern Architectures