DOI: 10.1145/3615318.3615326

Evaluating the Viability of LogGP for Modeling MPI Performance with Non-contiguous Datatypes on Modern Architectures

Published: 21 September 2023

ABSTRACT

Modern architectures and communication system software include complex hardware, communication abstractions, and optimizations that make their performance difficult to measure, model, and understand. This paper examines the ability of modified versions of the existing Netgauge communication performance measurement tool and the LogGOPS performance model to accurately characterize the communication behavior of modern hardware, MPI abstractions, and MPI implementations. In particular, it analyzes their ability to model GPU-aware communication in different MPI implementations and to quantify the performance characteristics of different approaches to non-contiguous data communication on modern GPU systems. The paper then applies these techniques to quantify the performance of different implementations of, and optimization approaches to, non-contiguous data communication on a variety of systems, demonstrating that modern communication system designs can yield widely varying and difficult-to-predict performance, even within the same hardware and communication software combination.
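
As background not drawn from the paper itself, the LogGP family of models estimates the end-to-end time of a k-byte message as roughly L + 2o + (k - 1)G, where L is network latency, o is per-message CPU overhead, and G is the per-byte gap for long messages; LogGOPS additionally accounts for per-byte overhead (O) and rendezvous-style synchronization of large messages (S). The sketch below, with purely illustrative sizes, shows the kind of strided, non-contiguous derived-datatype transfer whose cost such measurements characterize; with a GPU-aware MPI library, the buffer could instead reside in device memory.

    /* Minimal sketch (illustrative, not the paper's benchmark): time one
       strided, non-contiguous transfer described with MPI_Type_vector.
       Block count, block length, and stride are arbitrary example values.
       With a GPU-aware MPI, 'buf' could instead point to GPU device memory. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int count = 1024, blocklen = 4, stride = 16;  /* example layout */
        double *buf = malloc((size_t)count * stride * sizeof(double));

        /* 'count' blocks of 'blocklen' doubles, separated by 'stride' doubles:
           a non-contiguous (strided) layout in memory. */
        MPI_Datatype vec;
        MPI_Type_vector(count, blocklen, stride, MPI_DOUBLE, &vec);
        MPI_Type_commit(&vec);

        double t0 = MPI_Wtime();
        if (rank == 0)
            MPI_Send(buf, 1, vec, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, 1, vec, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        double t1 = MPI_Wtime();

        if (rank < 2)
            printf("rank %d: datatype transfer took %g s\n", rank, t1 - t0);

        MPI_Type_free(&vec);
        free(buf);
        MPI_Finalize();
        return 0;
    }

Measured times like this, collected across message sizes and datatype layouts, are the kind of data a tool such as Netgauge can use to estimate LogGP/LogGOPS parameters.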


Published in

EuroMPI '23: Proceedings of the 30th European MPI Users' Group Meeting
September 2023, 123 pages
ISBN: 9798400709135
DOI: 10.1145/3615318

            Copyright © 2023 ACM

            Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • research-article
            • Research
            • Refereed limited

            Acceptance Rates

Overall Acceptance Rate: 66 of 139 submissions, 47%
