The Kokkos OpenMPTarget Backend: Implementation and Lessons Learned

  • Conference paper
OpenMP: Advanced Task-Based, Device and Compiler Programming (IWOMP 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14114)

Abstract

As the supercomputing landscape diversifies, solutions such as Kokkos for writing vendor-agnostic applications and libraries have risen in popularity. Kokkos provides a programming model designed for performance portability, which allows developers to write a single-source implementation that can run efficiently on various architectures. At its heart, Kokkos maps parallel algorithms to architecture- and vendor-specific backends written in lower-level programming models such as CUDA and HIP. Another approach to writing vendor-agnostic parallel code is OpenMP's directive-based approach, which lets developers annotate code to express parallelism. It is implemented at the compiler level and is supported by all major high-performance computing vendors, as well as by the primary open-source toolchains, GNU and LLVM. Since its inception, Kokkos has used OpenMP to parallelize on CPU architectures. In this paper, we explore leveraging OpenMP for a GPU backend and discuss the challenges we encountered when mapping the Kokkos APIs and semantics to OpenMP target constructs. As an exemplar workload, we chose a simple conjugate gradient solver for sparse matrices. We find that performance on NVIDIA and AMD GPUs varies widely depending on the details of the implementation strategy and the chosen compiler. Furthermore, the performance of the OpenMP implementations decreases as the complexity of the investigated algorithms increases.
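To make the directive-based approach concrete, the following is a minimal sketch of two building blocks of a conjugate gradient solver, a sparse matrix-vector product and a dot product, each offloaded with a single OpenMP target construct. It is not taken from the paper and does not use any Kokkos API: the function names `spmv_csr` and `dot`, the CSR storage layout, and the `map` clauses are assumptions chosen for illustration. If the compiler is not configured for offloading, the loops are expected to run on the host instead.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the paper): y = A * x for a matrix in CSR
// format, with one loop iteration per matrix row.
void spmv_csr(const std::vector<int>& row_ptr, const std::vector<int>& col_idx,
              const std::vector<double>& vals, const std::vector<double>& x,
              std::vector<double>& y) {
  assert(!row_ptr.empty());  // CSR needs at least the leading row offset
  const int n_rows = static_cast<int>(row_ptr.size()) - 1;
  const int n_cols = static_cast<int>(x.size());
  const int nnz = static_cast<int>(vals.size());
  y.resize(n_rows);

  // Raw pointers are used because map clauses take array sections.
  const int* rp = row_ptr.data();
  const int* ci = col_idx.data();
  const double* v = vals.data();
  const double* xp = x.data();
  double* yp = y.data();

  // Rows are distributed across teams and threads on the device.
  #pragma omp target teams distribute parallel for \
      map(to: rp[0:n_rows + 1], ci[0:nnz], v[0:nnz], xp[0:n_cols]) \
      map(from: yp[0:n_rows])
  for (int i = 0; i < n_rows; ++i) {
    double sum = 0.0;
    for (int k = rp[i]; k < rp[i + 1]; ++k) {
      sum += v[k] * xp[ci[k]];
    }
    yp[i] = sum;
  }
}

// Hypothetical helper (not from the paper): dot product as a device reduction.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  const int n = static_cast<int>(a.size());
  const double* ap = a.data();
  const double* bp = b.data();
  double result = 0.0;

  // The reduction variable is mapped explicitly so its final value is
  // copied back to the host.
  #pragma omp target teams distribute parallel for reduction(+: result) \
      map(to: ap[0:n], bp[0:n]) map(tofrom: result)
  for (int i = 0; i < n; ++i) {
    result += ap[i] * bp[i];
  }
  return result;
}
```

A caller would fill `row_ptr`, `col_idx`, and `vals` with the CSR representation of the matrix and then call `spmv_csr` and `dot` once per solver iteration. Each call in this sketch transfers its operands to and from the device, which a tuned implementation would avoid by keeping data resident on the device across iterations.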

Acknowledgments

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. This written work is authored by an employee of NTESS. The employee, not NTESS, owns the right, title and interest in and to the written work and is responsible for its contents. Any subjective views or opinions that might be expressed in the written work do not necessarily represent the views of the U.S. Government. The publisher acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this written work or allow others to do so, for U.S. Government purposes. The DOE will provide public access to results of federally sponsored research in accordance with the DOE Public Access Plan. This work was supported by Exascale Computing Project 17-SC-20-SC, a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative. This research used resources of the National Energy Research Scientific Computing Center (NERSC), which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Author information

Corresponding author

Correspondence to Rahulkumar Gayatri.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gayatri, R., Olivier, S.L., Trott, C.R., Doerfert, J., Ciesko, J., Lebrun-Grandie, D. (2023). The Kokkos OpenMPTarget Backend: Implementation and Lessons Learned. In: McIntosh-Smith, S., Klemm, M., de Supinski, B.R., Deakin, T., Klinkenberg, J. (eds) OpenMP: Advanced Task-Based, Device and Compiler Programming. IWOMP 2023. Lecture Notes in Computer Science, vol 14114. Springer, Cham. https://doi.org/10.1007/978-3-031-40744-4_7

  • DOI: https://doi.org/10.1007/978-3-031-40744-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40743-7

  • Online ISBN: 978-3-031-40744-4

  • eBook Packages: Computer Science, Computer Science (R0)
