
Concurrent Execution of Deferred OpenMP Target Tasks with Hidden Helper Threads

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 13149))

Abstract

In this paper, we introduce a novel approach to support concurrent offloading for OpenMP tasks based on hidden helper threads. We contrast our design with alternative implementations and explain why the approach we have chosen provides the most consistent performance across a wide range of use cases. In addition to a theoretical discussion of the trade-offs, we detail our implementation in the LLVM compiler infrastructure. Finally, we provide evaluation results for four extreme offloading situations on the Summit supercomputer, showing that we achieve a speedup of up to 6.7× over synchronous offloading and performance comparable to the commercial IBM XL C/C++ compiler.


Notes

  1. The fallback case, execution on the issuing device, is sufficiently similar.

  2. This is CUDA terminology, but almost all heterogeneous programming models have a similar concept, such as the command queue in OpenCL.



Acknowledgments

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation’s exascale computing imperative.

Author information


Corresponding author

Correspondence to Shilei Tian.



Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Tian, S., Doerfert, J., Chapman, B. (2022). Concurrent Execution of Deferred OpenMP Target Tasks with Hidden Helper Threads. In: Chapman, B., Moreira, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2020. Lecture Notes in Computer Science(), vol 13149. Springer, Cham. https://doi.org/10.1007/978-3-030-95953-1_4



  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95952-4

  • Online ISBN: 978-3-030-95953-1

  • eBook Packages: Computer Science (R0)
