A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration


Abstract:

Many computing-in-memory (CIM) processors have been proposed for edge deep-learning (DL) acceleration. They usually rely on analog CIM techniques to achieve high-efficiency NN inference with low-precision INT multiply-accumulate (MAC) support [1]. Unlike edge DL, cloud DL imposes higher accuracy requirements on NN inference and training, and therefore demands additional support for high-precision floating-point (FP) MAC. As shown in Fig. 15.5.1, applying CIM techniques to cloud DL faces three main limitations: 1) FP MAC tightly couples exponent alignment with INT mantissa MAC; implementing complex exponent alignment in memory would break CIM's direct-accumulation structure and reduce efficiency. 2) FP MAC energy is dominated by the INT mantissa MAC, so further accelerating CIM-based INT MAC is critical for processor efficiency. 3) Previous cloud DL processors usually contain separate FP and INT engines but activate only one at a time [2], incurring high area overhead and low resource utilization.
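The decomposition in limitation 1 is easiest to see in code. The sketch below is our own illustration, not the paper's design: it evaluates a BF16 dot product by first multiplying the 8-bit mantissas with an ordinary INT multiply, then aligning the products to the largest exponent and accumulating them as plain integers. All names are ours, and subnormals are not handled exactly.

```python
# Illustrative sketch (our own, not the paper's design): a BF16 dot
# product split into exponent alignment plus an INT mantissa MAC, as
# described in limitation 1. BF16 = 1 sign, 8 exponent, 7 mantissa bits.

def decode_bf16(bits: int):
    """Split a 16-bit BF16 pattern into (sign, biased exp, mantissa)."""
    sign = (bits >> 15) & 0x1
    exp = (bits >> 7) & 0xFF
    man = bits & 0x7F
    if exp != 0:          # restore the implicit leading 1 of normals
        man |= 0x80
    return sign, exp, man

def bf16_dot(a_bits, b_bits):
    """Dot product over BF16 bit patterns via align-then-INT-MAC."""
    terms = []
    for a, b in zip(a_bits, b_bits):
        sa, ea, ma = decode_bf16(a)
        sb, eb, mb = decode_bf16(b)
        # 8b x 8b INT multiply: this is where most of the energy goes.
        terms.append((sa ^ sb, ea + eb, ma * mb))

    # Exponent alignment: shift every mantissa product down to the
    # largest product exponent, then accumulate as plain integers.
    max_exp = max(e for _, e, _ in terms)
    acc = 0
    for sign, e, m in terms:
        aligned = m >> (max_exp - e)
        acc += -aligned if sign else aligned

    # Undo the two exponent biases (127 each) and the 2^-14 scaling
    # of the 16-bit mantissa product.
    return acc * 2.0 ** (max_exp - 2 * 127 - 14)

if __name__ == "__main__":
    import struct

    def to_bf16(x: float) -> int:
        """Truncate a float32 bit pattern to its top 16 bits (BF16)."""
        return struct.unpack(">I", struct.pack(">f", x))[0] >> 16

    a = [to_bf16(v) for v in (1.5, -0.75, 2.0)]
    b = [to_bf16(v) for v in (2.0, 4.0, 0.5)]
    print(bf16_dot(a, b))  # 1.5*2 - 0.75*4 + 2*0.5 = 1.0
```

Limitation 2 motivates the title's bitwise in-memory Booth multiplication. The paper's in-memory realization is not reproduced here; the sketch below only illustrates the classic radix-4 (modified) Booth recoding it builds on, which halves the number of partial products in the mantissa multiply.

```python
# Hedged sketch of standard radix-4 (modified) Booth recoding; the
# function names and bit widths are assumptions for illustration.

def booth_radix4_digits(multiplier: int, bits: int = 8):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth
    digits in {-2, -1, 0, 1, 2}, least-significant digit first."""
    m = multiplier & ((1 << bits) - 1)   # two's-complement view
    digits, prev = [], 0                 # prev = implicit bit b[-1] = 0
    for i in range(0, bits, 2):
        b0 = (m >> i) & 1
        b1 = (m >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # overlapping 3-bit window
        prev = b1
    return digits

def booth_multiply(a: int, b: int, bits: int = 8) -> int:
    """Signed multiply as a sum of shifted partial products a*d*4^i."""
    return sum((a * d) << (2 * i)
               for i, d in enumerate(booth_radix4_digits(b, bits)))

assert booth_multiply(-77, 53) == -77 * 53
```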
Date of Conference: 20-26 February 2022
Date Added to IEEE Xplore: 17 March 2022
Conference Location: San Francisco, CA, USA

