7.2 A 12nm Programmable Convolution-Efficient Neural-Processing-Unit Chip Achieving 825TOPS


Abstract:

Convolutional neural networks (CNNs) represent a key application in data centers, which calls for accelerators that are: 1) efficient for CNN computations; 2) high-throughput, so as to be cost-efficient; and 3) sufficiently programmable to accommodate algorithm upgrades. Lacking such a chip on the market, we designed our own. Matrix multiplication (MM) and convolution (CONV) are the top-2 deep-learning (DL) operations requiring intensive computation. Most existing accelerators, such as GPUs [6], [7], the TPU [9], and a few new AI chips [3], [4], are architected for GEMM. To compute CONV on a GEMM engine, one needs the im2col() transformation to flatten images into general matrices. This introduces substantial data inflation, which not only leads to unnecessary extra computation and storage, but also decreases arithmetic intensity and makes performance I/O- and memory-bound. Although some accelerators, such as [5], exploit a CONV architecture directly, integrating large yet balanced computing power into a single chip is quite challenging. Moreover, with the fast evolution of DL algorithms, it is critical to design a programmable neural processing unit (NPU) rather than a dedicated ASIC for data-center scenarios. To satisfy the above requirements, our NPU is architected to be CONV-efficient under the control of operation-fused coarse-grained instructions. It integrates as much computing power as possible via squeezed computation with a large SRAM-only design. It also delivers programming flexibility via an instruction set architecture (ISA) that covers anticipated forward-looking functionality.
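As a rough illustration of the data inflation mentioned above, the following is a minimal Python sketch of an im2col() flattening for a single-channel image (the function, shapes, and loop structure are illustrative assumptions, not the chip's or the paper's implementation). Each overlapping window is copied into its own matrix column, so for a stride-1 convolution the flattened matrix approaches k*k times the size of the original image.

import numpy as np

def im2col(img, k, stride=1):
    # Illustrative sketch: flatten each k x k sliding window of a 2-D image
    # into one column of the output matrix. Overlapping windows duplicate
    # pixels, which is the source of the data inflation.
    h, w = img.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    cols = np.empty((k * k, out_h * out_w), dtype=img.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            cols[:, i * out_w + j] = img[r:r + k, c:c + k].ravel()
    return cols

img = np.arange(36, dtype=np.float32).reshape(6, 6)
cols = im2col(img, k=3)
print(img.size, "->", cols.size)  # 36 -> 144: a 4x blow-up here; tends toward k*k = 9x for large images

The extra copies also lower arithmetic intensity: the same multiply-accumulate work must now be fed by a matrix several times larger than the original activations, which is why a GEMM-only engine tends to become I/O- and memory-bound on CONV layers.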
Date of Conference: 16-20 February 2020
Date Added to IEEE Xplore: 13 April 2020
Conference Location: San Francisco, CA, USA
