
Towards Automatic OpenMP-Aware Utilization of Fast GPU Memory

  • Conference paper
  • In: OpenMP in a Modern World: From Multi-device Support to Meta Programming (IWOMP 2022)

Abstract

OpenMP has supported target offloading since version 4.0, and LLVM/Clang supports its compilation and optimization. Several optimizing transformations in LLVM aim to improve the performance of the offloaded region, especially when targeting GPUs. Although using memory efficiently is essential for high performance on a GPU, little work has been done to automatically optimize memory transactions inside the target region at compile time.

In this work, we develop an inter-procedural LLVM transformation that improves the performance of OpenMP target regions by optimizing memory transactions. The transformation pass prefetches some of the read-only input data into fast shared memory via compile-time code injection. When data is reused, shared-memory accesses far outpace global-memory accesses; consequently, our method can significantly improve performance if the right data is placed in shared memory.


Notes

  1. It is better to use the default option for cases where some (but not all) of the team's chunk iterations read the same locations. Avoiding the prefetching of redundant data in these cases complicates the copy_to_shared_mem function in several ways (e.g., it adds conditional branches), which degrades performance.
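To see why the guard hurts, compare an unconditional copy loop with one that skips already-covered slots. This is a hypothetical sketch of the trade-off, not the pass's actual copy_to_shared_mem implementation; the function names and the `already` predicate array are illustrative assumptions.

```c
#include <assert.h>

/* Unconditional variant: every slot is copied, possibly duplicating
 * work when iterations overlap, but the loop body is branch-free. */
void copy_unconditional(const int *src, int *shared, int n) {
    for (int j = 0; j < n; ++j)
        shared[j] = src[j];
}

/* Guarded variant: skips slots marked as already present. The extra
 * per-element branch is exactly the kind of conditional that causes
 * divergence and slows the copy down on a GPU. */
void copy_guarded(const int *src, int *shared, const int *already, int n) {
    for (int j = 0; j < n; ++j)
        if (!already[j])            /* conditional branch per element */
            shared[j] = src[j];
}
```

Both variants produce a usable shared copy; the footnote's point is that the branch-free form is usually the better default even when it copies some data redundantly.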


Acknowledgements

The first and second authors thank NSERC of Canada (Grant RGPIN-2018-06534) for its support. Part of this research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (the Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation's exascale computing imperative. Part of this research was also supported by Lawrence Livermore National Security, LLC ("LLNS") via MPO No. B642066.

Author information

Correspondence to Delaram Talaashrafi.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Talaashrafi, D., Maza, M.M., Doerfert, J. (2022). Towards Automatic OpenMP-Aware Utilization of Fast GPU Memory. In: Klemm, M., de Supinski, B.R., Klinkenberg, J., Neth, B. (eds) OpenMP in a Modern World: From Multi-device Support to Meta Programming. IWOMP 2022. Lecture Notes in Computer Science, vol 13527. Springer, Cham. https://doi.org/10.1007/978-3-031-15922-0_5


  • DOI: https://doi.org/10.1007/978-3-031-15922-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15921-3

  • Online ISBN: 978-3-031-15922-0

  • eBook Packages: Computer Science (R0)
