Evaluating CUDA Portability with HIPCL and DPCT

Jin, Zheming; Vetter, Jeffrey

doi:10.1109/IPDPSW52791.2021.00065

Conference · June 1, 2021

DOI: https://doi.org/10.1109/IPDPSW52791.2021.00065 · OSTI ID: 1838992

Jin, Zheming [1]

[1] ORNL

HIPCL extends the CUDA portability route from AMD platforms to OpenCL platforms. Meanwhile, the Intel DPC++ Compatibility Tool (DPCT) migrates a CUDA program to a Data Parallel C++ (DPC++) program. Toward the goal of improving portability, we evaluate the performance of CUDA applications from Rodinia, SHOC, and proxy applications ported with HIPCL and DPCT on Intel GPUs. We profile the ported programs to understand their performance gaps, and we optimize the code converted by DPCT to improve its performance. The open-source repository of the CUDA, HIP, and DPCT programs will be useful for developing a translator.

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC05-00OR22725

OSTI ID: 1838992

Resource Relation: Conference: 26th International Workshop on High-Level Parallel Programming Models and Supportive Environments, Portland, Oregon, United States of America, 5/17/2021 to 5/21/2021

Country of Publication: United States

Language: English

