ABSTRACT
Convolutional neural networks (CNNs) have demonstrated state-of-the-art accuracy in image classification and object detection, owing to the growth of available data and hardware computing capacity. However, this achievement depends heavily on the floating-point computing capability of the device's DSP blocks, which increases its power dissipation and cost. To address this problem, we made the first attempt to implement a CNN computing accelerator based on shift operations on an FPGA. In this accelerator, an efficient Incremental Network Quantization (INQ) method compresses the CNN model from full precision to 4-bit integers whose values are either zero or powers of two. The multiply-and-accumulate (MAC) operations of the convolutional and fully-connected layers are then converted to shift-and-accumulate (SAC) operations, which can be implemented directly in the logic elements of the FPGA; consequently, the parallelism of the CNN inference process can be further expanded. For the SqueezeNet model, single-image processing latency was 0.673 ms on an Intel Arria 10 FPGA (Inspur F10A board), a slightly better result than on an NVIDIA Tesla P4, and the compute capacity of the FPGA increased by at least 1.77 times.
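The MAC-to-SAC conversion described above works because every INQ-quantized weight is either zero or a signed power of two, so each multiplication collapses to a bit shift and a sign. A minimal Python sketch of this idea (the function names and the encoding details are illustrative assumptions, not the paper's exact 4-bit scheme; real INQ weights are typically fractional powers of two, which would be absorbed into a fixed-point scale factor):

```python
def encode_pow2(w):
    """Encode a weight that is zero or a signed power of two
    as (sign, exponent), with sign in {-1, 0, +1}."""
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exp = abs(w).bit_length() - 1  # exact log2 for powers of two
    return sign, exp

def mac_dot(acts, weights):
    # conventional multiply-and-accumulate
    return sum(a * w for a, w in zip(acts, weights))

def sac_dot(acts, weights):
    # shift-and-accumulate: each multiply becomes a left shift,
    # which an FPGA can realize in plain logic elements (no DSP multiplier)
    acc = 0
    for a, w in zip(acts, weights):
        sign, exp = encode_pow2(w)
        acc += sign * (a << exp)  # a * (+/- 2**exp) without multiplying
    return acc

acts    = [3, -5, 7, 2]   # fixed-point integer activations
weights = [2, -4, 0, 8]   # INQ-style weights: 0 or +/-2**k
assert sac_dot(acts, weights) == mac_dot(acts, weights)  # both give 42
```

On hardware, the (sign, exponent) pair is what the 4-bit weight stores, so the shift amount is read directly from memory and no decode of a full-precision weight is ever needed.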
Index Terms: A Deep Learning Inference Accelerator Based on Model Compression on FPGA