Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs | IEEE Conference Publication | IEEE Xplore

Abstract:

Tensor-specialized hardware for supporting low-precision arithmetic has become an inevitable trend due to the ever-increasing demand for computational capability and energy efficiency in intelligent applications. The main challenge in accelerating a tensor program on such hardware is to achieve the best possible performance in reduced precision, by fully utilizing its computational resources, while keeping the precision loss under control. In this paper, we address this challenge by proposing QUANTENSOR, a new approach for accelerating general-purpose tensor programs by replacing their tensor computations with low-precision quantized tensor computations on NVIDIA Tensor Cores. The key novelty is a new residual-based precision refinement technique for controlling quantization errors, which allows tradeoffs between performance and precision to be made. Evaluation with GEMM, deep neural networks, and linear algebra applications shows that QUANTENSOR can achieve remarkable performance improvements while significantly reducing the incurred precision loss at acceptable overhead.
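The abstract does not spell out how QUANTENSOR's residual-based refinement works, so the following is only a generic sketch of the residual idea under stated assumptions, not the paper's algorithm. It assumes symmetric per-tensor int8 quantization, splits each GEMM operand into a quantized part plus quantized residual slices, and keeps every cross-term product. A pure-Python integer matmul stands in for an int8 Tensor Core operation. The helper names (`quantize`, `residual_split`, `int_matmul`, `refined_gemm`) are hypothetical.

```python
import random

QMAX = 127  # assumed symmetric int8 range [-127, 127]


def quantize(x):
    """Hypothetical helper: symmetric per-tensor int8 quantization, x ~= scale * q."""
    amax = max(abs(v) for row in x for v in row)
    scale = amax / QMAX if amax > 0 else 1.0  # avoid dividing by zero for an all-zero residual
    q = [[max(-QMAX, min(QMAX, round(v / scale))) for v in row] for row in x]
    return scale, q


def residual_split(x, terms):
    """Hypothetical helper: x ~= sum_k scale_k * q_k.

    Each extra slice quantizes the residual that the previous slices missed.
    """
    slices = []
    rem = [row[:] for row in x]
    for _ in range(terms):
        scale, q = quantize(rem)
        slices.append((scale, q))
        rem = [[r - scale * qv for r, qv in zip(rrow, qrow)]
               for rrow, qrow in zip(rem, q)]
    return slices


def int_matmul(a, b):
    """Integer GEMM, standing in (by assumption) for an int8 Tensor Core MMA."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def refined_gemm(a, b, terms):
    """Approximate a @ b using only integer GEMMs on the quantized slices."""
    sa = residual_split(a, terms)
    sb = residual_split(b, terms)
    m, n = len(a), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for scale_a, qa in sa:          # every (slice of a, slice of b) pair is kept
        for scale_b, qb in sb:
            prod = int_matmul(qa, qb)
            for i in range(m):
                for j in range(n):
                    out[i][j] += scale_a * scale_b * prod[i][j]
    return out


def max_abs_err(x, y):
    return max(abs(u - v) for rx, ry in zip(x, y) for u, v in zip(rx, ry))


rng = random.Random(0)
A = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(8)]
B = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(16)]
exact = [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

err1 = max_abs_err(exact, refined_gemm(A, B, terms=1))  # plain int8 GEMM
err2 = max_abs_err(exact, refined_gemm(A, B, terms=2))  # one residual slice per operand
print(f"1 slice: {err1:.2e}, 2 slices: {err2:.2e}")
```

With two slices per operand this sketch runs four integer GEMMs instead of one, which illustrates the kind of performance/precision tradeoff the abstract mentions. One plausible way to reduce that overhead would be to drop the smallest cross term (residual times residual), though whether QUANTENSOR does anything like this is not stated here.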
Date of Conference: 27 February 2021 - 03 March 2021
Date Added to IEEE Xplore: 11 March 2021
Conference Location: Seoul, Korea (South)
