DOI: 10.1145/3437801.3441591
Poster

Simplifying low-level GPU programming with GAS

Published: 17 February 2021

Abstract

Many low-level optimizations for NVIDIA GPUs can only be implemented in native hardware assembly (SASS). However, programming in SASS is unproductive and not portable across architectures.
To simplify low-level GPU programming, we present GAS (Gpu ASsembly), a PTX-like language that provides a stable instruction set across hardware architectures while giving programmers low-level control over code execution. We demonstrate that GAS can be used with ease for low-level benchmarking and performance tuning in the context of Tensor Core HGEMM.
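
To make the target concrete: Tensor Core HGEMM is half-precision matrix multiply executed on the GPU's Tensor Cores. The poster does not reproduce GAS syntax here, so the sketch below stays one level above SASS, driving a single 16x16x16 Tensor Core tile through CUDA's WMMA API; the kernel name and launch configuration are illustrative assumptions, not the authors' code. At the SASS level, the mma_sync call lowers to HMMA instructions, whose scheduling and register allocation are exactly the kind of detail a SASS-level language such as GAS exposes for tuning.

    // Hypothetical sketch, not the authors' GAS code: one warp computes a
    // single 16x16 half-precision tile C = A * B on Tensor Cores via WMMA.
    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    __global__ void hgemm_tile(const half *A, const half *B, half *C) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, half> c_frag;

        wmma::fill_fragment(c_frag, __float2half(0.0f));  // zero the accumulator
        wmma::load_matrix_sync(a_frag, A, 16);            // leading dimension 16
        wmma::load_matrix_sync(b_frag, B, 16);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);   // lowers to SASS HMMA ops
        wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
    }

    // Launch with exactly one warp, e.g.: hgemm_tile<<<1, 32>>>(dA, dB, dC);
    // Requires compute capability 7.0+ (compile with, e.g., -arch=sm_70).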


Cited By

  • STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning. IEEE Access 12 (2024), 70581-70599. https://doi.org/10.1109/ACCESS.2024.3402326


Published In

PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2021
507 pages
ISBN:9781450382946
DOI:10.1145/3437801

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPU
  2. SASS
  3. compiler

Qualifiers

  • Poster

Conference

PPoPP '21

Acceptance Rates

PPoPP '21 Paper Acceptance Rate: 31 of 150 submissions, 21%
Overall Acceptance Rate: 230 of 1,014 submissions, 23%

