SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

Published: 04 April 2017

Abstract

With the recent advances in wearable devices and the Internet of Things (IoT), it has become attractive to implement Deep Convolutional Neural Networks (DCNNs) in embedded and portable systems. Currently, executing software-based DCNNs requires high-performance servers, restricting their widespread deployment on embedded and mobile IoT devices. To overcome this obstacle, considerable research effort has been devoted to developing highly parallel and specialized DCNN accelerators using GPGPUs, FPGAs, or ASICs.
Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has high potential for implementing DCNNs with high scalability and an ultra-low hardware footprint. Since multiplications and additions can be computed using AND gates and multiplexers in SC, significant reductions in power (energy) and hardware footprint can be achieved compared to conventional binary arithmetic implementations. These tremendous savings in power (energy) and hardware resources open up an immense design space for enhancing the scalability and robustness of hardware DCNNs.
This paper presents SC-DCNN, the first comprehensive design and optimization framework for SC-based DCNNs, built using a bottom-up approach. We first present designs of the function blocks that perform the basic operations in a DCNN: inner product, pooling, and activation function. We then propose four designs of feature extraction blocks, which extract features from input feature maps, by connecting different basic function blocks with joint optimization. Moreover, efficient weight storage methods are proposed to reduce area and power (energy) consumption. Putting it all together, with carefully selected feature extraction blocks, SC-DCNN is holistically optimized to minimize area and power (energy) consumption while maintaining high network accuracy. Experimental results demonstrate that LeNet-5 implemented in SC-DCNN consumes only 17 mm² of area and 1.53 W of power, and achieves a throughput of 781,250 images/s, an area efficiency of 45,946 images/s/mm², and an energy efficiency of 510,734 images/J.
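As an illustration of the gate-level arithmetic described above, the following is a minimal Python sketch of the *unipolar* SC variant, where a value in [0, 1] is encoded as the probability of a one in the stream, multiplication reduces to a bitwise AND, and scaled addition to a multiplexer. This is a toy simulation for intuition only, not the paper's hardware design (the paper uses the bipolar [-1, 1] encoding, where multiplication is an XNOR):

```python
import random

def encode(x, n):
    """Unipolar SC encoding: value x in [0, 1] -> n-bit stream with P(bit = 1) = x."""
    return [1 if random.random() < x else 0 for _ in range(n)]

def decode(stream):
    """Recover the value as the fraction of ones in the stream."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    """Bitwise AND of two independent streams: P(a_i AND b_i) = P(a_i) * P(b_i)."""
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a, b, sel):
    """Multiplexer with an independent P = 0.5 select stream computes (a + b) / 2."""
    return [x if s else y for x, y, s in zip(a, b, sel)]

random.seed(0)
n = 100_000
a, b = encode(0.5, n), encode(0.8, n)
sel = encode(0.5, n)

print(round(decode(sc_multiply(a, b)), 2))         # close to 0.5 * 0.8 = 0.40
print(round(decode(sc_scaled_add(a, b, sel)), 2))  # close to (0.5 + 0.8) / 2 = 0.65
```

The mux-based adder is *scaled*: it computes the average rather than the sum, which is why SC designs must track scaling factors through the network.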




Published In

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017, 856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. neural network

Qualifiers

  • Research-article

Funding Sources

  • The seedling fund of the DARPA SAGA program

Conference

ASPLOS '17

Acceptance Rates

ASPLOS '17 Paper Acceptance Rate 53 of 320 submissions, 17%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%


Cited By

  • (2025) Toward Universal Multiplexer Multiply-Accumulate Architecture in Stochastic Computing. IEEE Access, 13:33874-33882, 2025. DOI: 10.1109/ACCESS.2025.3539986
  • (2024) FUNSC: A GUI Software for Stochastic Computing. 2024 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), pages 1-6, 23 May 2024. DOI: 10.1109/HORA61326.2024.10550497
  • (2023) Accurate yet Efficient Stochastic Computing Neural Acceleration with High Precision Residual Fusion. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1-6, April 2023. DOI: 10.23919/DATE56975.2023.10136942
  • (2023) Cambricon-U: A Systolic Random Increment Memory Architecture for Unary Computing. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pages 424-437, 28 October 2023. DOI: 10.1145/3613424.3614286
  • (2023) A Noise-Driven Heterogeneous Stochastic Computing Multiplier for Heuristic Precision Improvement in Energy-Efficient DNNs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(2):630-643, February 2023. DOI: 10.1109/TCAD.2022.3178053
  • (2023) Hardware-Software Co-Optimization of Long-Latency Stochastic Computing. IEEE Embedded Systems Letters, 15(4):190-193, December 2023. DOI: 10.1109/LES.2023.3298734
  • (2023) SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 546-556, May 2023. DOI: 10.1109/IPDPS54959.2023.00061
  • (2023) Stochastic Computing-based On-chip Training Circuitry for Reservoir Computing Systems. 2023 38th Conference on Design of Circuits and Integrated Systems (DCIS), pages 1-6, 15 November 2023. DOI: 10.1109/DCIS58620.2023.10336006
  • (2023) Efficient Non-Linear Adder for Stochastic Computing with Approximate Spatial-Temporal Sorting Network. 2023 60th ACM/IEEE Design Automation Conference (DAC), pages 1-6, 9 July 2023. DOI: 10.1109/DAC56929.2023.10247826
  • (2023) Low-Latency, Energy-Efficient In-DRAM CNN Acceleration with Bit-Parallel Unary Computing. Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, pages 393-409, 1 October 2023. DOI: 10.1007/978-3-031-19568-6_14
