Evaluating the Performance of Integer Sum Reduction in SYCL on GPUs
- ORNL
SYCL is a promising programming model for heterogeneous computing because it allows a single-source program to target devices from multiple vendors. Integer sum reduction is an important primitive operation on these accelerators. This paper presents several SYCL implementations of integer sum reduction, using atomic functions, shared local memory, vectorized memory accesses, and parameterized workload sizes, to compare the performance and maturity of SYCL against open-source vendor-specific implementations of the same reduction. For a sufficiently large number of integers, tuning the parameters of our SYCL implementations achieves a 1.4x speedup over the open-source implementations on an Intel UHD630 integrated GPU. On an Nvidia P100 GPU, the SYCL reduction is 3% faster than the templated reduction in Thrust and 0.3% faster than the device reduction in CUB. On an Nvidia V100 GPU, it is 1.9% faster than the templated reduction in Thrust and 0.4% faster than the device reduction in CUB.
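The reduction strategy named in the abstract (per-work-item partial sums, a combine step in shared local memory, and an atomic add per work-group) can be illustrated with a minimal host-side sketch. This is plain C++ that emulates the pattern sequentially on the CPU; it is not SYCL code and not the paper's implementation. The function name `emulated_group_reduce`, the parameters `group_size` and `items_per_work_item`, and the 64-bit accumulator are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side emulation of a work-group sum reduction.
// group_size stands in for the work-group size, items_per_work_item for the
// tunable workload per work-item. Neither name comes from the paper.
std::int64_t emulated_group_reduce(const std::vector<std::int32_t>& data,
                                   std::size_t group_size,
                                   std::size_t items_per_work_item) {
    // The tree reduction below assumes a power-of-two work-group size.
    assert(group_size > 0 && (group_size & (group_size - 1)) == 0);
    assert(items_per_work_item > 0);

    std::atomic<std::int64_t> total{0};           // stands in for the global result
    std::vector<std::int64_t> local(group_size);  // stands in for shared local memory
    const std::size_t tile = group_size * items_per_work_item;

    // Each iteration of this loop plays the role of one work-group.
    for (std::size_t base = 0; base < data.size(); base += tile) {
        // Phase 1: every work-item accumulates its strided elements privately.
        for (std::size_t lid = 0; lid < group_size; ++lid) {
            std::int64_t partial = 0;
            for (std::size_t k = 0; k < items_per_work_item; ++k) {
                const std::size_t idx = base + k * group_size + lid;
                if (idx < data.size()) partial += data[idx];
            }
            local[lid] = partial;
        }
        // Phase 2: tree reduction over the local buffer (a barrier would
        // separate the steps on a real device).
        for (std::size_t stride = group_size / 2; stride > 0; stride /= 2) {
            for (std::size_t lid = 0; lid < stride; ++lid) {
                local[lid] += local[lid + stride];
            }
        }
        // Phase 3: one atomic add per work-group into the global result.
        total.fetch_add(local[0], std::memory_order_relaxed);
    }
    return total.load();
}
```

Calling `emulated_group_reduce(values, 64, 4)` on a vector of integers returns their sum. Varying `group_size` and `items_per_work_item` changes only how the work is partitioned, which is the kind of parameter the abstract describes tuning; the emulation says nothing about actual GPU performance.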
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1840191
- Resource Relation:
- Conference: ICPP: International Conference on Parallel Processing Workshop - Chicago, Illinois, United States of America - 8/9/2021-8/12/2021
- Country of Publication:
- United States
- Language:
- English