ABSTRACT
Heterogeneous computing has emerged as an important approach for supporting more than one kind of processor or accelerator in a single program. Heterogeneous programming generally involves a trade-off between source-code portability and device performance. New programming abstractions that reduce development effort while minimizing performance penalties are therefore extremely valuable.
The Khronos SYCL standard defines an abstract single-program-multiple-data (SPMD) programming model for heterogeneous computing. This paper presents a language extension on top of the SYCL standard that gives programmers more flexibility. We introduce a set of single-instruction-multiple-data (SIMD) abstractions based on multi-dimensional arrays (Tensors) that work in conjunction with the existing SPMD programming paradigm.
Our work is based on C++ language extensions and a new set of LLVM intermediate representation (IR) constructs for representing SIMD programs. It also includes a set of custom optimization passes that perform instruction lowering, automatic address allocation, and synchronization insertion. We show how our work can be used together with conventional SYCL SPMD programming on benchmarks such as general matrix multiplication (GEMM) and lower-upper (LU) inverse, and we evaluate its hardware utilization.
Index Terms
- Extending SYCL's Programming Paradigm with Tensor-based SIMD Abstractions