SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

Published: 04 April 2017

Abstract

With the recent advances in wearable devices and the Internet of Things (IoT), it has become attractive to implement Deep Convolutional Neural Networks (DCNNs) in embedded and portable systems. Currently, executing software-based DCNNs requires high-performance servers, restricting their widespread deployment on embedded and mobile IoT devices. To overcome this obstacle, considerable research effort has been devoted to developing highly parallel and specialized DCNN accelerators using GPGPUs, FPGAs, or ASICs.
Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has high potential for implementing DCNNs with high scalability and an ultra-low hardware footprint. Since multiplications and additions can be computed using AND gates and multiplexers in SC, significant reductions in power (energy) and hardware footprint can be achieved compared to conventional binary arithmetic implementations. These tremendous savings in power (energy) and hardware resources open up an immense design space for enhancing the scalability and robustness of hardware DCNNs.
This paper presents SC-DCNN, the first comprehensive design and optimization framework for SC-based DCNNs, built using a bottom-up approach. We first present designs of the function blocks that perform the basic operations in a DCNN: inner product, pooling, and activation function. We then propose four designs of feature extraction blocks, which extract features from input feature maps, by connecting different basic function blocks with joint optimization. Moreover, efficient weight storage methods are proposed to reduce area and power (energy) consumption. Putting it all together, with carefully selected feature extraction blocks, SC-DCNN is holistically optimized to minimize area and power (energy) consumption while maintaining high network accuracy. Experimental results demonstrate that LeNet-5 implemented in SC-DCNN consumes only 17 mm² of area and 1.53 W of power, and achieves a throughput of 781,250 images/s, an area efficiency of 45,946 images/s/mm², and an energy efficiency of 510,734 images/J.
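As an illustration of the gate-level arithmetic described above, the following is a minimal Python sketch of the *unipolar* SC variant, where a value in [0, 1] is encoded as the probability of a one in the stream, multiplication reduces to a bitwise AND, and scaled addition to a multiplexer. This is a toy simulation for intuition only, not the paper's hardware design (the paper uses the bipolar [-1, 1] encoding, where multiplication is an XNOR):

```python
import random

def encode(x, n):
    """Unipolar SC encoding: value x in [0, 1] -> n-bit stream with P(bit = 1) = x."""
    return [1 if random.random() < x else 0 for _ in range(n)]

def decode(stream):
    """Recover the value as the fraction of ones in the stream."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    """Bitwise AND of two independent streams: P(a_i AND b_i) = P(a_i) * P(b_i)."""
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a, b, sel):
    """Multiplexer with an independent P = 0.5 select stream computes (a + b) / 2."""
    return [x if s else y for x, y, s in zip(a, b, sel)]

random.seed(0)
n = 100_000
a, b = encode(0.5, n), encode(0.8, n)
sel = encode(0.5, n)

print(round(decode(sc_multiply(a, b)), 2))         # close to 0.5 * 0.8 = 0.40
print(round(decode(sc_scaled_add(a, b, sel)), 2))  # close to (0.5 + 0.8) / 2 = 0.65
```

The mux-based adder is *scaled*: it computes the average rather than the sum, which is why SC designs must track scaling factors through the network.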




Published In

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017, 856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. neural network

Qualifiers

  • Research-article

Funding Sources

  • The seedling fund of the DARPA SAGA program

Conference

ASPLOS '17

Acceptance Rates

ASPLOS '17 Paper Acceptance Rate 53 of 320 submissions, 17%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%


Cited By

  • (2025) Toward Universal Multiplexer Multiply-Accumulate Architecture in Stochastic Computing. IEEE Access, 13:33874-33882, 2025. DOI: 10.1109/ACCESS.2025.3539986
  • (2024) FUNSC: A GUI Software for Stochastic Computing. 2024 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), pages 1-6, 23 May 2024. DOI: 10.1109/HORA61326.2024.10550497
  • (2023) Accurate yet Efficient Stochastic Computing Neural Acceleration with High Precision Residual Fusion. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1-6, April 2023. DOI: 10.23919/DATE56975.2023.10136942
  • (2023) Cambricon-U: A Systolic Random Increment Memory Architecture for Unary Computing. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pages 424-437, 28 October 2023. DOI: 10.1145/3613424.3614286
  • (2023) A Noise-Driven Heterogeneous Stochastic Computing Multiplier for Heuristic Precision Improvement in Energy-Efficient DNNs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(2):630-643, February 2023. DOI: 10.1109/TCAD.2022.3178053
  • (2023) Hardware-Software Co-Optimization of Long-Latency Stochastic Computing. IEEE Embedded Systems Letters, 15(4):190-193, December 2023. DOI: 10.1109/LES.2023.3298734
  • (2023) SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 546-556, May 2023. DOI: 10.1109/IPDPS54959.2023.00061
  • (2023) Stochastic Computing-based On-chip Training Circuitry for Reservoir Computing Systems. 2023 38th Conference on Design of Circuits and Integrated Systems (DCIS), pages 1-6, 15 November 2023. DOI: 10.1109/DCIS58620.2023.10336006
  • (2023) Efficient Non-Linear Adder for Stochastic Computing with Approximate Spatial-Temporal Sorting Network. 2023 60th ACM/IEEE Design Automation Conference (DAC), pages 1-6, 9 July 2023. DOI: 10.1109/DAC56929.2023.10247826
  • (2023) Low-Latency, Energy-Efficient In-DRAM CNN Acceleration with Bit-Parallel Unary Computing. Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, pages 393-409, 1 October 2023. DOI: 10.1007/978-3-031-19568-6_14
