ABSTRACT
Binarized Neural Networks (BNNs) have shown the capability to perform various classification tasks while offering computational simplicity and memory savings. The problem with BNNs, however, is low accuracy on large convolutional neural networks (CNNs). The Local Binary Convolutional Neural Network (LBCNN) compensates for the accuracy loss of BNNs by using standard convolutional layers together with binary convolutional layers, and can achieve accuracy as high as the standard AlexNet CNN. For the first time, we propose an FPGA hardware design architecture for LBCNN and address its unique challenges. We present a performance and resource-usage predictor along with a design space exploration framework. On LBCNN AlexNet, our architecture achieves 76.6% higher performance in terms of GOPS, and 2.6X and 2.7X higher performance density in terms of GOPS/Slice and GOPS/DSP, respectively, compared to a previous FPGA implementation of the standard AlexNet CNN.
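The LBCNN idea the abstract refers to can be illustrated with a minimal sketch: a set of fixed, sparse binary (+1/-1/0) anchor filters replaces the learnable dense convolution, and a learnable 1x1 combination mixes the resulting feature maps. This is an assumption-laden illustration in NumPy, not the paper's hardware implementation; filter count, sparsity, and the use of ReLU here are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_binary_conv(x, num_anchors=8, sparsity=0.5, k=3):
    """Sketch of one LBCNN layer (hypothetical parameters): fixed sparse
    binary anchor filters, a nonlinearity, then a 1x1 linear combination
    (the only learnable part; random weights stand in for trained ones)."""
    h, w = x.shape
    # Fixed, non-learnable binary anchor weights drawn from {-1, 0, +1}
    anchors = rng.choice([-1.0, 0.0, 1.0], size=(num_anchors, k, k),
                         p=[sparsity / 2, 1 - sparsity, sparsity / 2])
    pad = k // 2
    xp = np.pad(x, pad)  # zero-pad so output size matches input size
    maps = np.zeros((num_anchors, h, w))
    for a in range(num_anchors):
        for i in range(h):
            for j in range(w):
                # Binary convolution reduces to signed additions only
                maps[a, i, j] = np.sum(xp[i:i + k, j:j + k] * anchors[a])
    maps = np.maximum(maps, 0.0)                # nonlinearity (ReLU)
    weights = rng.standard_normal(num_anchors)  # learnable 1x1 conv weights
    return np.tensordot(weights, maps, axes=1)  # (h, w) output map

y = local_binary_conv(rng.standard_normal((8, 8)))
print(y.shape)  # (8, 8)
```

Because the anchor filters contain only -1, 0, and +1, the inner products need no multipliers, which is what makes the binary convolutional layers attractive for FPGA implementation; only the 1x1 combination requires full-precision arithmetic.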