DOI: 10.1145/3422575.3422779
research-article

Architectural Design of 3D NAND Flash based Compute-in-Memory for Inference Engine

Published: 21 March 2021

ABSTRACT

3D NAND Flash memory has been proposed as an attractive candidate for deep neural network (DNN) inference engines owing to its ultra-high density and commercially matured fabrication technology. However, the peripheral circuits need to be modified to enable compute-in-memory (CIM), and the chip architecture needs to be redesigned for an optimized dataflow. In this work, we present a design of a 3D NAND-CIM accelerator based on the macro parameters from an industry-grade prototype chip. The DNN inference performance is evaluated using the DNN+NeuroSim framework. To exploit the ultra-high density of 3D NAND Flash, both input and weight duplication strategies are introduced to improve the throughput. Benchmarking on a variety of VGG and ResNet networks was performed across candidate CIM technologies including SRAM, RRAM, and 3D NAND. Compared to similar designs with SRAM or RRAM, the results show that the 3D NAND based CIM design occupies only 17-24% of the chip area while achieving 1.9-2.7 times higher energy efficiency for 8-bit precision inference.
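The weight-duplication idea in the abstract can be illustrated with a back-of-the-envelope throughput model. The sketch below is not from the paper: the array and layer dimensions are hypothetical, and `duplication_speedup` is an invented helper. It only shows why replicating a layer's weight matrix across the spare capacity of a dense 3D NAND CIM array lets multiple input vectors be processed per read cycle.

```python
# Hedged sketch (not the paper's model): estimates the cycle-count benefit
# of duplicating a layer's weights across an underutilized CIM array.
# All sizes below are hypothetical illustration values.

def duplication_speedup(array_rows, layer_rows, n_inputs):
    """Each copy of the weight matrix that fits in the array acts as an
    independent compute unit, so distinct input vectors can be applied
    in parallel. Returns (copies, cycles_without, cycles_with)."""
    copies = max(1, array_rows // layer_rows)   # duplicates that fit
    cycles_without = n_inputs                   # one input per read cycle
    cycles_with = -(-n_inputs // copies)        # ceil(n_inputs / copies)
    return copies, cycles_without, cycles_with

copies, base, dup = duplication_speedup(array_rows=65536,
                                        layer_rows=4608,   # e.g. a 3x3x512 kernel unrolled
                                        n_inputs=1024)
print(copies, base, dup)  # → 14 1024 74
```

Under these assumed numbers, 14 weight copies fit, cutting the cycle count for 1024 inputs from 1024 to 74; the real gain would be bounded by peripheral-circuit bandwidth, which this toy model ignores.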


Published in

MEMSYS '20: Proceedings of the International Symposium on Memory Systems
September 2020
362 pages
ISBN: 9781450388993
DOI: 10.1145/3422575

Copyright © 2020 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited
