ABSTRACT
The appealing properties of low area, low power, flexible precision, and high bit-error tolerance have made stochastic computing (SC) a promising alternative to conventional binary arithmetic for many computation-intensive tasks, e.g., convolutional neural networks (CNNs). However, to suppress the intrinsic fluctuation noise of SC, long bit streams are normally required in SC-based CNN accelerators to achieve satisfactory accuracy, which leads to excessive latency. Although bit-parallel SC multiplier structures have been proposed to reduce latency, the resulting overhead still considerably degrades the overall efficiency of SC. In this paper, we optimize both the micro-architecture of the SC multiply-and-accumulate (MAC) unit and the overall acceleration scheme of the CNN accelerator to favor SC. An optimized and scalable SC-MAC unit, which fully exploits the properties of low-discrepancy bit streams, is proposed with adjustable parameters to reduce latency at a minor area cost. For the overall accelerator, the parallel dimensions of the SC-based MAC array are extended to reuse hardware resources and improve throughput, since a judiciously chosen loop-unrolling strategy better benefits SC operations. The proposed CNN accelerator with the extended SC-MAC array is synthesized in TSMC 28nm CMOS and evaluated on several representative CNNs, achieving a 2× performance speedup, 2.8× energy savings, and a 15% area reduction compared to a state-of-the-art SC-based CNN accelerator.
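To make the core SC operation concrete, the sketch below (not the paper's implementation; all names are illustrative) shows unipolar SC multiplication: a value p in [0, 1] is encoded as a bit stream whose ones-density is p, and two values are multiplied with a single AND gate. It also illustrates why low-discrepancy bit streams help: replacing the pseudo-random number source in the stream generator with a low-discrepancy sequence (here a van der Corput/Halton sequence, one possible choice) reduces fluctuation noise at the same stream length.

```python
import random

def van_der_corput(i, base=2):
    # Low-discrepancy sequence value in [0, 1) for index i (radical inverse in the given base).
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def to_stream(p, n, source):
    # Stochastic number generator (SNG): bit i is 1 iff source(i) < p,
    # so the ones-density of the stream encodes p.
    return [1 if source(i) < p else 0 for i in range(n)]

def sc_multiply(pa, pb, n, src_a, src_b):
    # Unipolar SC multiply: bitwise AND of two bit streams;
    # the ones-density of the result estimates pa * pb.
    a = to_stream(pa, n, src_a)
    b = to_stream(pb, n, src_b)
    return sum(x & y for x, y in zip(a, b)) / n

rng = random.Random(0)
# Conventional SNG: independent pseudo-random sources for each operand.
rand_est = sc_multiply(0.4, 0.6, 256, lambda i: rng.random(), lambda i: rng.random())
# Low-discrepancy SNG: different bases keep the two streams uncorrelated.
ld_est = sc_multiply(0.4, 0.6, 256,
                     lambda i: van_der_corput(i, 2),
                     lambda i: van_der_corput(i, 3))
print(rand_est, ld_est)  # both near the exact product 0.24
```

With only 256 bits, the low-discrepancy streams typically land much closer to the exact product than the pseudo-random ones, which is the property the proposed SC-MAC unit exploits to shorten stream length without losing accuracy.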