ABSTRACT
3D NAND Flash memory has been proposed as an attractive candidate for deep neural network (DNN) inference engines owing to its ultra-high density and commercially mature fabrication technology. However, the peripheral circuits must be modified to enable compute-in-memory (CIM), and the chip architecture needs to be redesigned for an optimized dataflow. In this work, we present a 3D NAND-CIM accelerator design based on macro parameters from an industry-grade prototype chip. The DNN inference performance is evaluated using the DNN+NeuroSim framework. To exploit the ultra-high density of 3D NAND Flash, both input and weight duplication strategies are introduced to improve throughput. Benchmarking on a variety of VGG and ResNet networks was performed across candidate CIM technologies, including SRAM, RRAM, and 3D NAND. Compared to similar designs based on SRAM or RRAM, the results show that the 3D NAND-based CIM design achieves not only 17-24% of the chip area but also 1.9-2.7x higher energy efficiency for 8-bit precision inference.
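The weight duplication strategy mentioned above can be illustrated with a toy capacity model (a minimal sketch, not the paper's actual mapping algorithm; all function names and parameter values here are hypothetical): when a layer's weights occupy only a fraction of the available memory arrays, replicating the weights across otherwise-idle arrays lets each replica serve a different input tile in parallel, scaling throughput ideally with the number of copies.

```python
def duplicated_throughput(base_ops_per_cycle, layer_weight_bits,
                          array_capacity_bits, num_arrays):
    """Toy model of weight duplication on a CIM array pool.

    Returns (copies, effective ops/cycle) for one layer, assuming each
    full weight replica processes a different input in parallel.
    """
    # Arrays needed to hold one full copy of the layer's weights (ceil).
    arrays_per_copy = -(-layer_weight_bits // array_capacity_bits)
    # How many complete replicas fit in the available arrays.
    copies = max(1, num_arrays // arrays_per_copy)
    # Ideal scaling: throughput grows linearly with the replica count.
    return copies, base_ops_per_cycle * copies


# Example: a small layer (1 Mb of weights) on sixteen 4 Mb arrays
# leaves most arrays idle without duplication; with it, all 16 are used.
copies, ops = duplicated_throughput(base_ops_per_cycle=256,
                                    layer_weight_bits=2**20,
                                    array_capacity_bits=2**22,
                                    num_arrays=16)
print(copies, ops)  # -> 16 4096
```

In this simple model, duplication trades the otherwise-wasted capacity of ultra-dense 3D NAND for parallelism, which is why the strategy is most beneficial when layer footprints are small relative to array capacity.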