ABSTRACT
Kokkos provides in-memory advanced data structures, concurrency, and algorithms to support performance portable C++ parallel programming across CPUs and GPUs. The Message Passing Interface (MPI) provides the most widely used message passing model for inter-node communication. Many programmers use both Kokkos and MPI together. In this paper, Kokkos is integrated within an MPI implementation for ease of use in applications that use both Kokkos and MPI, without sacrificing performance. For instance, this model allows passing first-class Kokkos objects directly to extended C++-based MPI APIs.
We prototype this integrated model using ExaMPI, a C++17-based subset implementation of MPI-4. We then demonstrate the use of our C++-friendly APIs and Kokkos extensions through benchmarks and a mini-application. We explain why direct use of Kokkos within certain parts of the MPI implementation is crucial to both performance and expressivity. Although the evaluation in this paper focuses on CPU-based examples, we also argue that making Kokkos memory spaces visible to the MPI implementation generalizes the idea of “CPU memory” and “GPU memory” in ways that enable further optimizations on heterogeneous exascale architectures. Finally, we describe future goals and show how these mesh with both a possible future C++ API for MPI-5 and the potential to accelerate MPI on such architectures.
Index Terms
- View-aware Message Passing Through the Integration of Kokkos and ExaMPI