Abstract
Abstract
OpenMP has supported target offloading since version 4.0, and LLVM/Clang supports its compilation and optimization. Several optimizing transformations in LLVM aim to improve the performance of offloaded regions, especially when targeting GPUs. Although efficient memory usage is essential for high performance on a GPU, little work has been done to automatically optimize memory transactions inside the target region at compile time.
In this work, we develop an inter-procedural LLVM transformation that improves the performance of OpenMP target regions by optimizing memory transactions. The pass prefetches some of the read-only input data into fast shared memory via compile-time code injection. When data is reused, accesses to shared memory far outpace global memory accesses; consequently, our method can significantly improve performance when the right data is placed in shared memory.
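As a rough illustration of the transformation described above (a host-side sketch, not the actual output of the pass; the function names and the stack array standing in for GPU shared memory are illustrative assumptions), the rewrite amounts to staging a reused read-only table into fast memory before the compute loop:

```c
#include <string.h>

#define TABLE_SIZE 16

/* Untransformed pattern: every lookup reads the (slow, "global") table. */
float lookup_global(const float *table, const int *idx, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += table[idx[i] % TABLE_SIZE];
    return sum;
}

/* Transformed pattern: a copy of the read-only table is first staged
 * into a fast buffer (shared memory on a GPU; a stack array in this
 * host-side model), so every reused access afterwards hits fast memory. */
float lookup_prefetched(const float *table, const int *idx, int n) {
    float fast[TABLE_SIZE];               /* stands in for shared memory  */
    memcpy(fast, table, sizeof fast);     /* the injected prefetch copy   */
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += fast[idx[i] % TABLE_SIZE]; /* access redirected to fast[]  */
    return sum;
}
```

Both functions compute the same result; the payoff of the second form comes only on hardware where the staged buffer is genuinely faster and the data is reused, which is exactly the case the pass targets.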
Notes
- 1.
It is better to use the default option for cases where some (but not all) of the team’s chunk iterations read the same locations. Avoiding the prefetching of redundant data in these cases complicates the copy_to_shared_mem function in several ways (e.g., it adds conditional branches), which degrades performance.
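To make the trade-off in this note concrete, here is a small host-side C sketch (hypothetical; the function names, the `is_read` guard map, and the tid/nthreads modeling of a thread team are not from the paper) contrasting the branch-free default copy with a guarded copy that skips locations the team never reads:

```c
#define TABLE_SIZE 16

/* Default option: a straight-line, branch-free cooperative copy. Each
 * "thread" (modeled here by a tid/nthreads stride) copies its slice
 * verbatim, even if some locations are read by several iterations. */
void copy_unconditional(float *fast, const float *table,
                        int tid, int nthreads) {
    for (int i = tid; i < TABLE_SIZE; i += nthreads)
        fast[i] = table[i];
}

/* Avoiding redundant prefetches requires per-element guards (here a
 * hypothetical is_read[] map of the locations the team's chunk
 * touches). The extra conditional branches are exactly what the note
 * warns can degrade the performance of the copy routine on a GPU. */
void copy_guarded(float *fast, const float *table,
                  const int *is_read, int tid, int nthreads) {
    for (int i = tid; i < TABLE_SIZE; i += nthreads)
        if (is_read[i])
            fast[i] = table[i];
}
```

On a GPU the guarded variant can additionally cause divergence within a warp when neighboring threads take different branch outcomes, which is one way the conditional copy ends up slower than simply copying everything.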
Acknowledgements
The first and second authors thank NSERC of Canada (Grant RGPIN-2018-06534) for its support. Part of this research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (the Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation’s exascale computing imperative. Part of this research was supported by the Lawrence Livermore National Security, LLC (“LLNS”) via MPO No. B642066.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Talaashrafi, D., Maza, M.M., Doerfert, J. (2022). Towards Automatic OpenMP-Aware Utilization of Fast GPU Memory. In: Klemm, M., de Supinski, B.R., Klinkenberg, J., Neth, B. (eds) OpenMP in a Modern World: From Multi-device Support to Meta Programming. IWOMP 2022. Lecture Notes in Computer Science, vol 13527. Springer, Cham. https://doi.org/10.1007/978-3-031-15922-0_5
Print ISBN: 978-3-031-15921-3
Online ISBN: 978-3-031-15922-0