Abstract:
Tensor-specialized hardware for supporting low-precision arithmetic has become an inevitable trend due to the ever-increasing demand for computational capability and energy efficiency in intelligent applications. The main challenge in accelerating a tensor program on tensor-specialized hardware is to achieve the best possible performance in reduced precision by fully utilizing the hardware's computational resources while keeping the precision loss under control. In this paper, we address this challenge by proposing QUANTENSOR, a new approach for accelerating general-purpose tensor programs by replacing their tensor computations with low-precision quantized tensor computations on NVIDIA Tensor Cores. The key novelty is a new residual-based precision refinement technique for controlling the quantization errors, allowing tradeoffs between performance and precision to be made. Evaluation with GEMM, deep neural networks, and linear algebra applications shows that QUANTENSOR can achieve remarkable performance improvements while significantly reducing the precision loss incurred, at acceptable overhead.
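The abstract does not give QUANTENSOR's exact scheme, but the general idea of residual-based precision refinement can be illustrated with a minimal NumPy sketch: quantize the operands, compute the low-precision product, then add first-order correction terms built from the quantization residuals. The quantization scheme (symmetric, per-tensor) and the function names quantize and quantized_gemm below are assumptions for illustration, not the paper's implementation; on real hardware the integer products would run on Tensor Cores.

    import numpy as np

    def quantize(x, bits=8):
        # Symmetric linear quantization to signed integers (assumed scheme);
        # returns the integer tensor and a per-tensor scale factor.
        qmax = 2 ** (bits - 1) - 1
        scale = np.max(np.abs(x)) / qmax
        q = np.round(x / scale).astype(np.int32)
        return q, scale

    def quantized_gemm(A, B, bits=8, refine=True):
        # Approximate A @ B using low-precision integer products; with
        # refine=True, one residual-based correction step is applied.
        qA, sA = quantize(A, bits)
        qB, sB = quantize(B, bits)
        # Base low-precision product.
        C = (qA @ qB) * (sA * sB)
        if refine:
            # Quantization residuals: the part of A and B lost to rounding.
            rA = A - qA * sA
            rB = B - qB * sB
            # First-order correction terms, themselves quantized; the
            # second-order term rA @ rB is dropped.
            qrA, srA = quantize(rA, bits)
            qrB, srB = quantize(rB, bits)
            C += (qrA @ qB) * (srA * sB) + (qA @ qrB) * (sA * srB)
        return C

    A = np.random.randn(64, 64).astype(np.float32)
    B = np.random.randn(64, 64).astype(np.float32)
    err_base = np.linalg.norm(quantized_gemm(A, B, refine=False) - A @ B)
    err_ref = np.linalg.norm(quantized_gemm(A, B, refine=True) - A @ B)
    print(f"error without refinement: {err_base:.4f}, with: {err_ref:.4f}")

Running this sketch shows the refinement step shrinking the quantization error by roughly the cost of two extra low-precision products, which is the kind of performance-precision tradeoff the abstract describes.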
Date of Conference: 27 February 2021 - 03 March 2021
Date Added to IEEE Xplore: 11 March 2021