short-paper

Extending SYCL's Programming Paradigm with Tensor-based SIMD Abstractions

Authors:
Wilson Feng, Huawei Canada Research Centre, Markham, ON, Canada
Shucai Yao, Huawei Canada Research Centre, Markham, ON, Canada
Kai Ting Wang, Huawei Canada Research Centre, Markham, Canada
Md Aamir Raihan, Huawei Canada Research Centre, Markham, Canada
Laichun Feng, Huawei Canada Research Centre, Markham, Canada
Chunrong Xu, Huawei Canada Research Centre, Markham, Canada

ICPE '22: Proceedings of the 2022 ACM/SPEC on International Conference on Performance Engineering, April 2022, Pages 59–66, https://doi.org/10.1145/3489525.3511681

Published: 09 April 2022

ABSTRACT

Heterogeneous computing has emerged as an important method for supporting more than one kind of processor or accelerator in a program. In heterogeneous programming there is generally a trade-off between source code portability and device performance. New programming abstractions that help programmers reduce their development effort while minimizing performance penalties are therefore extremely valuable.

The Khronos SYCL standard defines an abstract single-program-multiple-data (SPMD) programming model for heterogeneous computing. This paper presents a language extension on top of the SYCL standard to give programmers more flexibility. We introduce a set of single-instruction-multiple-data (SIMD) abstractions based on multi-dimensional arrays (tensors) that can be used in conjunction with the existing SPMD programming paradigm.
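To make the contrast between the two paradigms concrete, the following is a minimal sketch in plain C++ (no SYCL runtime required). The `Tensor` type, `spmd_launch`, `add_spmd`, `add_simd`, and `self_check` are hypothetical names invented for illustration; they are not the paper's API and not part of the SYCL standard. The serial loop in `spmd_launch` only stands in for a parallel launch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size 1-D tensor, invented for this sketch only.
template <typename T, std::size_t N>
struct Tensor {
    std::array<T, N> data{};

    // Whole-tensor addition: conceptually one SIMD operation over all N
    // elements, here emulated with a plain loop.
    friend Tensor operator+(const Tensor& a, const Tensor& b) {
        Tensor out;
        for (std::size_t i = 0; i < N; ++i) out.data[i] = a.data[i] + b.data[i];
        return out;
    }
};

// SPMD style: the same scalar kernel body runs once per work-item id.
// The serial loop is a stand-in for a parallel launch.
template <typename Kernel>
void spmd_launch(std::size_t global_size, Kernel kernel) {
    for (std::size_t id = 0; id < global_size; ++id) kernel(id);
}

// Element-wise add written per work-item (SPMD view).
template <std::size_t N>
Tensor<float, N> add_spmd(const Tensor<float, N>& a, const Tensor<float, N>& b) {
    Tensor<float, N> c;
    spmd_launch(N, [&](std::size_t id) { c.data[id] = a.data[id] + b.data[id]; });
    return c;
}

// The same add written as a single operation on whole tensors (SIMD view).
template <std::size_t N>
Tensor<float, N> add_simd(const Tensor<float, N>& a, const Tensor<float, N>& b) {
    return a + b;
}

// Self-check: both styles must produce identical results.
inline void self_check() {
    Tensor<float, 4> a, b;
    a.data = {1.0f, 2.0f, 3.0f, 4.0f};
    b.data = {10.0f, 20.0f, 30.0f, 40.0f};
    assert(add_spmd(a, b).data == add_simd(a, b).data);
}
```

Calling `self_check()` from a `main` function exercises both paths. In the SPMD version the programmer reasons about one element per work-item; in the SIMD version the programmer reasons about one operation per tensor and leaves the element-level mapping to the implementation.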

Our work is based on the C++ language and a set of new LLVM intermediate representation (IR) constructs for representing the SIMD programs. It also includes a set of custom optimization passes that perform instruction lowering, automatic address allocation, and synchronization insertion. We show how our work can be used in conjunction with conventional SYCL SPMD programming for benchmarks such as general matrix multiplication (GEMM) and lower-upper (LU) inverse, and we evaluate its hardware utilization.
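As a rough illustration of how a GEMM can mix the two styles, the sketch below partitions the output into tiles (the outer loops play the role of SPMD work-items) and expresses the work inside each tile as whole-tile operations. Everything here is an assumption made for illustration: `Tile`, `kTile`, `load_tile`, `store_tile`, `mma`, `gemm_tiled`, and `gemm_reference` are hypothetical names, the matrices are square and row-major, and nothing is taken from the paper's actual extension or IR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kTile = 2;  // hypothetical tile edge length

// Hypothetical kTile x kTile tile, zero-initialized.
struct Tile {
    float v[kTile][kTile] = {};
};

// Copy a tile out of a row-major matrix with leading dimension ld.
Tile load_tile(const std::vector<float>& m, std::size_t ld,
               std::size_t r0, std::size_t c0) {
    Tile t;
    for (std::size_t i = 0; i < kTile; ++i)
        for (std::size_t j = 0; j < kTile; ++j)
            t.v[i][j] = m[(r0 + i) * ld + (c0 + j)];
    return t;
}

// Write a tile back into a row-major matrix.
void store_tile(std::vector<float>& m, std::size_t ld,
                std::size_t r0, std::size_t c0, const Tile& t) {
    for (std::size_t i = 0; i < kTile; ++i)
        for (std::size_t j = 0; j < kTile; ++j)
            m[(r0 + i) * ld + (c0 + j)] = t.v[i][j];
}

// Tile-level multiply-accumulate: acc += a * b.
void mma(Tile& acc, const Tile& a, const Tile& b) {
    for (std::size_t i = 0; i < kTile; ++i)
        for (std::size_t j = 0; j < kTile; ++j)
            for (std::size_t k = 0; k < kTile; ++k)
                acc.v[i][j] += a.v[i][k] * b.v[k][j];
}

// C = A * B for n x n matrices; n must be a multiple of kTile.
std::vector<float> gemm_tiled(const std::vector<float>& A,
                              const std::vector<float>& B, std::size_t n) {
    assert(n % kTile == 0);
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<float> C(n * n, 0.0f);
    // Each (bi, bj) pair would map to one SPMD work-item or work-group.
    for (std::size_t bi = 0; bi < n; bi += kTile)
        for (std::size_t bj = 0; bj < n; bj += kTile) {
            Tile acc;
            for (std::size_t bk = 0; bk < n; bk += kTile)
                mma(acc, load_tile(A, n, bi, bk), load_tile(B, n, bk, bj));
            store_tile(C, n, bi, bj, acc);
        }
    return C;
}

// Naive triple-loop reference used only to check the tiled version.
std::vector<float> gemm_reference(const std::vector<float>& A,
                                  const std::vector<float>& B, std::size_t n) {
    std::vector<float> C(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
    return C;
}
```

On real hardware the three tile-level helpers are where a compiler would plausibly have to decide instruction selection, on-chip buffer placement, and synchronization; this sketch performs none of that and only shows the shape of the source program.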

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Published in
ICPE '22: Proceedings of the 2022 ACM/SPEC on International Conference on Performance Engineering
April 2022
242 pages
ISBN: 9781450391436
DOI: 10.1145/3489525
General Chairs:
Dan Feng, Huazhong University of Science and Technology, China
Steffen Becker, University of Stuttgart, Germany
Program Chairs:
Nikolas Herbst, University of Würzburg, Germany
Philipp Leitner, Chalmers and University of Gothenburg
Publications Chair:
Alessandro Papadopoulos, Mälardalen University, Sweden
Copyright © 2022 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
llvm
parallel computing
sycl
tensor
Qualifiers
- short-paper
Acceptance Rates
ICPE '22 paper acceptance rate: 14 of 58 submissions, 24%
Overall acceptance rate: 252 of 851 submissions, 30%