OpenMP Target Task: Tasking and Target Offloading on Heterogeneous Systems

Valero Lara, Pedro; Kim, Jungwon; Hernandez Mendoza, Oscar; Vetter, Jeffrey

doi:10.1007/978-3-031-06156-1_35

Conference · June 1, 2022

DOI: https://doi.org/10.1007/978-3-031-06156-1_35 · OSTI ID: 1885285

ORNL

This work evaluated OpenMP tasking combined with target GPU offloading as a potential solution for programming productivity and performance on heterogeneous systems. It also proposes a new OpenMP specification, OpenMP target task, which simplifies the implementation of heterogeneous codes by integrating OpenMP tasking and target GPU offloading in a single OpenMP pragma. As a test case, the authors used one of the most widely used Basic Linear Algebra Subprograms (BLAS) Level-3 routines: the triangular solver (TRSM). To benefit from the heterogeneity of current high-performance computing systems, the authors propose a different parallelization of the algorithm based on a nonuniform decomposition of the problem, using target GPU offloading inside OpenMP tasks to address the heterogeneity of the hardware. This approach can outperform state-of-the-art algorithms, which use a uniform decomposition of the data, on both CPU-only and hybrid CPU-GPU systems, reaching speedups of up to one order of magnitude. It is faster than the IBM ESSL math library on CPU and competitive with a highly optimized heterogeneous CUDA version. Performance was analyzed on one node of Oak Ridge National Laboratory’s supercomputer, Summit.

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC05-00OR22725

OSTI ID: 1885285

Resource Relation: Journal Volume: 13098; Conference: 27th International European Conference on Parallel and Distributed Computing (Euro-Par), Online Event, Portugal, August 30 - September 3, 2021

Country of Publication: United States

Language: English
