Abstract:
Tensor-specialized hardware for supporting low-precision arithmetic has become an inevitable trend due to the ever-increasing demand for computational capability and energy efficiency in intelligent applications. The main challenge in accelerating a tensor program on tensor-specialized hardware is to achieve the best possible performance in reduced precision by fully utilizing the hardware's computational resources while keeping the precision loss under control. In this paper, we address this challenge by proposing QUANTENSOR, a new approach for accelerating general-purpose tensor programs by replacing their tensor computations with low-precision quantized tensor computations on NVIDIA Tensor Cores. The key novelty is a new residual-based precision refinement technique for controlling the quantization errors, allowing tradeoffs between performance and precision to be made. Evaluation with GEMM, deep neural networks, and linear algebra applications shows that QUANTENSOR can achieve remarkable performance improvements while significantly reducing the precision loss incurred, at acceptable overhead.
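The abstract does not give QUANTENSOR's exact scheme, but the general idea of residual-based precision refinement can be illustrated with a minimal NumPy sketch: quantize the operands, compute the low-precision product, then add first-order correction terms built from the quantization residuals. The quantization scheme (symmetric, per-tensor) and the function names quantize and quantized_gemm below are assumptions for illustration, not the paper's implementation; on real hardware the integer products would run on Tensor Cores.

    import numpy as np

    def quantize(x, bits=8):
        # Symmetric linear quantization to signed integers (assumed scheme);
        # returns the integer tensor and a per-tensor scale factor.
        qmax = 2 ** (bits - 1) - 1
        scale = np.max(np.abs(x)) / qmax
        q = np.round(x / scale).astype(np.int32)
        return q, scale

    def quantized_gemm(A, B, bits=8, refine=True):
        # Approximate A @ B using low-precision integer products; with
        # refine=True, one residual-based correction step is applied.
        qA, sA = quantize(A, bits)
        qB, sB = quantize(B, bits)
        # Base low-precision product.
        C = (qA @ qB) * (sA * sB)
        if refine:
            # Quantization residuals: the part of A and B lost to rounding.
            rA = A - qA * sA
            rB = B - qB * sB
            # First-order correction terms, themselves quantized; the
            # second-order term rA @ rB is dropped.
            qrA, srA = quantize(rA, bits)
            qrB, srB = quantize(rB, bits)
            C += (qrA @ qB) * (srA * sB) + (qA @ qrB) * (sA * srB)
        return C

    A = np.random.randn(64, 64).astype(np.float32)
    B = np.random.randn(64, 64).astype(np.float32)
    err_base = np.linalg.norm(quantized_gemm(A, B, refine=False) - A @ B)
    err_ref = np.linalg.norm(quantized_gemm(A, B, refine=True) - A @ B)
    print(f"error without refinement: {err_base:.4f}, with: {err_ref:.4f}")

Running this sketch shows the refinement step shrinking the quantization error by roughly the cost of two extra low-precision products, which is the kind of performance-precision tradeoff the abstract describes.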
Date of Conference: 27 February 2021 - 03 March 2021
Date Added to IEEE Xplore: 11 March 2021