Abstract:
Heterogeneously integrated SRAM & ReRAM Computation-in-Memories (CiMs) are proposed for Transformer models. Emerging Transformer models, including LLMs such as ChatGPT, are composed of 1) linear & FC layers whose MAC operations only read weights and 2) self-attention whose MAC operations both read and write weights. To meet these diverse requirements and achieve compact Transformer models that can be embedded at the edge, this paper proposes Transformer Hetero-CiM, which is composed of 1) an SRAM CiM for 4-bit Read/Write-MAC self-attention and 2) an MLC ReRAM CiM for 6-bit Read-MAC linear & FC layers. By optimally mixing and matching the low write energy and endurance-free operation of SRAM CiM with the high capacity and low cost of MLC ReRAM, an optimal 3D-integrated Transformer system for edge AI is achieved. The proposed Transformer Hetero-CiM reduces circuit area by 89.1% and 45.3% compared with Transformer systems that exclusively use SRAM CiMs or ReRAM CiMs, respectively. Furthermore, the proposed Transformer Hetero-CiM improves inference accuracy by 1.1% compared with both SRAM-only and ReRAM-only CiMs.
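
The weight-access partition described in the abstract can be made concrete with a small numeric sketch. The Python snippet below is an illustrative assumption, not the authors' implementation: the quantize function, the bit-width mapping, and all array sizes are hypothetical. It contrasts the two access patterns: linear/FC weights are quantized once to 6 bits (as if programmed into MLC ReRAM and only read during MAC), while the self-attention operands Q and K are regenerated per input and quantized to 4 bits (as if rewritten into SRAM CiM on every inference).

import numpy as np

def quantize(x, bits):
    # Symmetric uniform quantizer; hypothetical stand-in for the
    # paper's 4-bit / 6-bit CiM precisions.
    levels = 2 ** (bits - 1) - 1
    scale = max(np.max(np.abs(x)) / levels, 1e-12)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
d_model, seq_len = 64, 16

# Linear/FC weights: programmed once, then only read during MAC
# -> suited to high-capacity 6-bit MLC ReRAM CiM (Read-MAC).
w_q = quantize(rng.standard_normal((d_model, d_model)), bits=6)
w_k = quantize(rng.standard_normal((d_model, d_model)), bits=6)

# Self-attention operands: regenerated for every input, so the CiM
# array holding them is rewritten constantly -> suited to
# low-write-energy, endurance-free 4-bit SRAM CiM (Read/Write-MAC).
x = rng.standard_normal((seq_len, d_model))
q = quantize(x @ w_q, bits=4)
k = quantize(x @ w_k, bits=4)
scores = q @ k.T / np.sqrt(d_model)  # MAC executed in the SRAM CiM
print(scores.shape)  # (16, 16)

The split follows directly from write traffic: ReRAM endurance is consumed only at deployment time, while the frequently rewritten attention operands stay in SRAM.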
Published in: 2024 IEEE International Memory Workshop (IMW)
Date of Conference: 12-15 May 2024
Date Added to IEEE Xplore: 24 May 2024