
Embedded Transformer Hetero-CiM: SRAM CiM for 4b Read/Write-MAC Self-attention and MLC ReRAM CiM for 6b Read-MAC Linear&FC Layers


Abstract:

Heterogeneously integrated SRAM & ReRAM Computation-in-Memories (CiMs) are proposed for Transformer models. Emerging transformer models, including LLMs such as ChatGPT, are composed of 1) linear & FC layers, which only read MAC weights, and 2) self-attention, which both reads and writes MAC weights. To meet these diverse requirements and achieve compact transformer models that can be embedded at the edge, this paper proposes Transformer Hetero-CiM. The proposed Transformer Hetero-CiM is composed of 1) SRAM CiM for 4-bit Read/Write-MAC self-attention and 2) MLC ReRAM CiM for 6-bit Read-MAC linear & FC layers. By optimally mixing and matching low-write-energy, endurance-free SRAM CiM with high-capacity, low-cost MLC ReRAM, an optimal 3D-integrated Transformer system for edge AI is achieved. The proposed Transformer Hetero-CiM reduces circuit area by 89.1% and 45.3% compared with transformer models that intensively use SRAM CiMs or ReRAM CiMs, respectively. Furthermore, the proposed Transformer Hetero-CiM improves inference accuracy by 1.1% compared with both the SRAM-CiM-intensive and ReRAM-CiM-intensive designs.
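As an illustrative sketch only (not taken from the paper), the proposed partition can be pictured as quantizing the statically stored linear/FC weights to 6 bits (held in the read-only MLC ReRAM CiM), while the self-attention operands that are rewritten for every input are quantized to 4 bits (held in the SRAM CiM). The helper function, matrix shapes, and the weight W_fc below are hypothetical; this is a minimal numerical sketch of the bit-width split under those assumptions, not the authors' circuit or accuracy methodology.

```python
import numpy as np

def quantize(x, bits):
    # Symmetric uniform quantization to the given bit width (illustrative only).
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(x / scale).clip(-qmax, qmax) * scale

# Static linear/FC weights: quantized once to 6 bits
# (conceptually stored in the Read-MAC MLC ReRAM CiM).
W_fc = quantize(np.random.randn(64, 64), bits=6)

# Self-attention operands are recomputed for every input, so they are
# repeatedly written and read: quantized to 4 bits (SRAM CiM, Read/Write-MAC).
x = np.random.randn(16, 64)           # hypothetical token embeddings
Q = quantize(x @ W_fc, bits=4)        # 4-bit query operands
K = quantize(x @ W_fc, bits=4)        # 4-bit key operands, written per input
attn_scores = Q @ K.T / np.sqrt(64)   # MAC over the 4-bit operands
```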
Date of Conference: 12-15 May 2024
Date Added to IEEE Xplore: 24 May 2024
Conference Location: Seoul, Korea, Republic of
