DOI: 10.1145/3453688.3461491
Research Article

IM3A: Boosting Deep Neural Network Efficiency via In-Memory Addressing-Assisted Acceleration

Published: 22 June 2021

Abstract

Most existing RRAM-based designs require expensive analog-to-digital converters (ADCs), digital-to-analog converters (DACs), and excessively occupied crossbars to achieve efficient acceleration. To reduce the overhead of DACs, the common solution is to split the input into a bit sequence, but a MAC operation that could complete in one cycle is then stretched over multiple cycles, decreasing energy efficiency. To reduce the overhead of ADCs, a weight is generally partitioned across multiple cells, which either requires an excessive number of crossbars or, when crossbars are insufficient, incurs frequent writes. To solve these problems, we propose IM3A, an In-Memory Addressing-Assisted Acceleration scheme. IM3A decomposes MAC operations into multiplication and accumulation, which are implemented separately through the content-addressable and multiply-accumulate capabilities of the crossbar. Energy efficiency is improved because the CAM crossbar supports parallel search over very large numbers of data bits, and the RRAM crossbar selectively enables the rows to be read based on the hit results of the CAM search. Therefore, only the operands possibly involved in a MAC are deployed on the crossbar. Experimental results show that IM3A, applied to various networks, achieves a system energy-efficiency improvement of 1.7x ∼ 15.9x over two state-of-the-art crossbar accelerators: ISAAC and PIM-Prune.
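
The decomposition described above can be summarized with a small behavioral sketch. The Python below is our illustration of the scheme's dataflow, not the authors' implementation: the 4-bit operand widths and the dict standing in for the TCAM/RRAM pair are assumptions made only for illustration.

```python
# Behavioral sketch of the IM3A idea (illustrative, not the paper's design):
# multiplication becomes a content-addressable lookup of a precomputed
# product, and accumulation stays digital, so no DAC/ADC is modeled.
ACT_BITS, WGT_BITS = 4, 4  # assumed operand quantization

# The TCAM matches the operand pair (the search key); the RRAM crossbar row
# selected by the hit stores the precomputed product. A dict stands in for
# the CAM + crossbar pair here.
product_store = {(a, w): a * w
                 for a in range(2 ** ACT_BITS)
                 for w in range(2 ** WGT_BITS)}

def im3a_dot(activations, weights):
    """One MAC stream: CAM search per operand pair, then accumulate."""
    acc = 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:          # zero operands never need a search
            continue
        acc += product_store[(a, w)]  # CAM hit -> row enable -> read product
    return acc

print(im3a_dot([3, 0, 7], [2, 5, 1]))  # 3*2 + 7*1 = 13
```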

Supplemental Material

MP4 File
This is the presentation video of our paper, "IM3A: Boosting Deep Neural Network Efficiency via In-Memory Addressing-Assisted Acceleration". Our paper uses a ternary content-addressable memory (TCAM) and a ReRAM crossbar to efficiently implement the multiplications in deep neural networks: the TCAM searches for the index of a multiplication, and the ReRAM crossbar retrieves the product from its memory. By adopting our scheme, expensive digital-to-analog and analog-to-digital converters can be removed from the architecture. In addition, the number of crossbars used to store all the weight matrices in existing schemes can be greatly reduced by storing only operand pairs and their results on the TCAM and ReRAM crossbar, respectively.
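
For readers unfamiliar with the ternary-match primitive the video refers to, here is a minimal functional model (our sketch, not the paper's circuit or the MoS2 device of [15]): each stored key bit is 0, 1, or X (don't care), and hardware evaluates all rows in parallel, whereas the loop below is only a sequential stand-in.

```python
# Functional model of a TCAM search (illustrative only). A row "hits" when
# every non-X stored bit equals the corresponding query bit; in hardware the
# hit line would drive the enable of the matching ReRAM crossbar row.
def tcam_search(rows, query):
    hits = []
    for addr, key in enumerate(rows):
        if all(k in ('X', q) for k, q in zip(key, query)):
            hits.append(addr)
    return hits

rows = ['01X1', '0101', '1XXX']
print(tcam_search(rows, '0111'))  # -> [0]
print(tcam_search(rows, '1010'))  # -> [2]
```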

References

[1]
Rajeev Balasubramonian et al. 2017. CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories. TACO, Vol. 14, 2 (2017), 14.
[2]
Yi Cai et al. 2019. Low bit-width convolutional neural network on RRAM. TCAD, Vol. 39, 7 (2019), 1414--1427.
[3]
N. Challapalle et al. 2020. GaaS-X: Graph Analytics Accelerator Supporting Sparse Data Representation using Crossbar Architectures. In 2020 ISCA.
[4]
Teyuh Chou et al. 2019. CASCADE: Connecting RRAMs to extend analog dataflow in an end-to-end in-memory processing paradigm. In MICRO. 114--125.
[5]
Chaoqun Chu et al. 2020. PIM-Prune: Fine-Grain DCNN Pruning for Crossbar-Based Process-In-Memory Architecture. In 2020 DAC. IEEE, 1--6.
[6]
Xiangyu Dong et al. 2012. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory. TCAD, Vol. 31, 7 (2012), 994--1007.
[7]
Benoit Jacob et al. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In CVPR. 2704--2713.
[8]
Lukas Kull et al. 2013. A 3.1 mW 8b 1.2 GS/s single-channel asynchronous SAR ADC with alternate comparators for enhanced speed in 32 nm digital SOI CMOS. JSSC, Vol. 48, 12 (2013), 3049--3058.
[9]
Jinmook Lee et al. 2019. UNPU: An energy-efficient deep neural network accelerator with fully variable weight bit precision. JSSC, Vol. 54, 1 (2019), 173--185.
[10]
Yuhang Li, Xin Dong, and Wei Wang. 2019. Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks. arXiv preprint arXiv:1909.13144 (2019).
[11]
Ali Shafiee et al. 2016. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars. In ISCA.
[12]
Linghao Song et al. 2017. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning. In 2017 HPCA.
[13]
Zhuoran Song et al. 2020. DRQ: Dynamic region-based quantization for deep neural network acceleration. In ISCA. IEEE, 1010--1021.
[14]
Synopsys. 2019. [Online]. Available: https://www.synopsys.com/community/university-program/teaching-resources.html. Accessed: 26-Jun-2019.
[15]
Rui Yang et al. 2019a. Ternary content-addressable memory with MoS2 transistors for massively parallel data search. Nature Electronics, Vol. 2, 3 (2019), 108--114.
[16]
Tzu-Hsien Yang et al. 2019b. Sparse ReRAM engine: Joint exploration of activation and weight sparsity in compressed neural networks. In ISCA. 236--249.
[17]
Peng Yao et al. 2020. Fully hardware-implemented memristor convolutional neural network. Nature, Vol. 577, 7792 (2020), 641--646.
[18]
Jishen Zhao et al. 2016. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory. Computer Architecture News (2016).


      Published In

      GLSVLSI '21: Proceedings of the 2021 Great Lakes Symposium on VLSI
      June 2021
      504 pages
      ISBN: 9781450383936
      DOI: 10.1145/3453688


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. neural network
      2. processing-in-memory
      3. resistive random-access memory
      4. ternary content-addressable memory

      Qualifiers

      • Research-article

      Data Availability

      Presentation video: https://dl.acm.org/doi/10.1145/3453688.3461491#GLSVLSI21-fp033.mp4


      Conference

      GLSVLSI '21: Great Lakes Symposium on VLSI 2021
      June 22 - 25, 2021
      Virtual Event, USA

      Acceptance Rates

      Overall Acceptance Rate 312 of 1,156 submissions, 27%

