ABSTRACT
3D NAND Flash memory has been proposed as an attractive candidate for deep neural network (DNN) inference engines owing to its ultra-high density and commercially mature fabrication technology. However, the peripheral circuits must be modified to enable compute-in-memory (CIM), and the chip architecture needs to be redesigned for an optimized dataflow. In this work, we present a 3D NAND-CIM accelerator design based on macro parameters from an industry-grade prototype chip. The DNN inference performance is evaluated using the DNN+NeuroSim framework. To exploit the ultra-high density of 3D NAND Flash, both input and weight duplication strategies are introduced to improve throughput. Benchmarking on a variety of VGG and ResNet networks was performed across candidate CIM technologies, including SRAM, RRAM, and 3D NAND. Compared to similar designs based on SRAM or RRAM, the results show that the 3D NAND-based CIM design achieves not only 17-24% of the chip area but also 1.9-2.7x higher energy efficiency for 8-bit precision inference.
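The weight duplication strategy mentioned above can be illustrated with a toy capacity model (a minimal sketch, not the paper's actual mapping algorithm; all function names and parameter values here are hypothetical): when a layer's weights occupy only a fraction of the available memory arrays, replicating the weights across otherwise-idle arrays lets each replica serve a different input tile in parallel, scaling throughput ideally with the number of copies.

```python
def duplicated_throughput(base_ops_per_cycle, layer_weight_bits,
                          array_capacity_bits, num_arrays):
    """Toy model of weight duplication on a CIM array pool.

    Returns (copies, effective ops/cycle) for one layer, assuming each
    full weight replica processes a different input in parallel.
    """
    # Arrays needed to hold one full copy of the layer's weights (ceil).
    arrays_per_copy = -(-layer_weight_bits // array_capacity_bits)
    # How many complete replicas fit in the available arrays.
    copies = max(1, num_arrays // arrays_per_copy)
    # Ideal scaling: throughput grows linearly with the replica count.
    return copies, base_ops_per_cycle * copies


# Example: a small layer (1 Mb of weights) on sixteen 4 Mb arrays
# leaves most arrays idle without duplication; with it, all 16 are used.
copies, ops = duplicated_throughput(base_ops_per_cycle=256,
                                    layer_weight_bits=2**20,
                                    array_capacity_bits=2**22,
                                    num_arrays=16)
print(copies, ops)  # -> 16 4096
```

In this simple model, duplication trades the otherwise-wasted capacity of ultra-dense 3D NAND for parallelism, which is why the strategy is most beneficial when layer footprints are small relative to array capacity.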