DOI: 10.1145/3289602.3293938
Poster

A Deep Learning Inference Accelerator Based on Model Compression on FPGA

Published: 20 February 2019

ABSTRACT

Convolutional neural networks (CNNs) have achieved state-of-the-art accuracy in image classification and object detection, driven by the growth of available data and of hardware compute capacity. However, this accuracy depends heavily on the floating-point (DSP) computing capability of the device, which increases its power dissipation and cost. To address this problem, we make the first attempt to implement a CNN computing accelerator based on shift operations on FPGA. In this accelerator, the efficient Incremental Network Quantization (INQ) method compresses the CNN model from full precision to 4-bit integers whose values are either zero or powers of two. The multiply-and-accumulate (MAC) operations of the convolutional and fully connected layers are then converted to shift-and-accumulate (SAC) operations, which map naturally onto the logic elements of the FPGA, so the parallelism of the CNN inference process can be expanded further. For the SqueezeNet model, single-image processing latency is 0.673 ms on an Intel Arria 10 FPGA (Inspur F10A board), a slightly better result than on an NVIDIA Tesla P4, and the compute capacity of the FPGA increases by at least 1.77x.
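The core idea in the abstract, that a weight constrained to zero or a power of two turns each multiply into a bit shift, can be illustrated with a minimal sketch. This is hypothetical code, not the authors' implementation: the exponent range and the `quantize_pow2` / `sac_dot` helpers are assumptions for illustration, and real INQ quantizes and retrains weights incrementally rather than in one pass.

```python
# Hypothetical sketch (not the paper's code): weights quantized to
# zero or +/-2^k, so multiply-accumulate (MAC) becomes
# shift-and-accumulate (SAC) on fixed-point activations.
import math

def quantize_pow2(w, min_exp=-4, max_exp=2):
    """Map a full-precision weight to 0 or sign*2^exp.

    Returns (quantized value, exponent, sign); the exponent range
    here is illustrative, chosen so the codebook fits in 4 bits.
    """
    if w == 0.0:
        return 0, None, 1
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))       # nearest power of two
    exp = max(min_exp, min(max_exp, exp))
    return sign * 2.0 ** exp, exp, sign

def sac_dot(acts, weights):
    """Dot product of integer activations with pow2 weights,
    using only shifts and adds (no hardware multipliers)."""
    acc = 0
    for a, w in zip(acts, weights):
        q, exp, sign = quantize_pow2(w)
        if q == 0:
            continue                     # zero weight: skip entirely
        # a * 2^exp realized as a left or right shift
        term = a << exp if exp >= 0 else a >> -exp
        acc += sign * term
    return acc

# Example: weights 2.0 and 0.5 are exact powers of two, so the
# shift-based result matches the floating-point dot product.
print(sac_dot([8, 4], [2.0, 0.5]))       # 8*2 + 4*0.5 = 18
```

On an FPGA, each SAC unit costs only LUTs and carry chains rather than a DSP block, which is why the paper can instantiate more parallel compute than a DSP-bound design.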


Published in

FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2019, 360 pages
ISBN: 9781450361378
DOI: 10.1145/3289602

Copyright © 2019 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 125 of 627 submissions, 20%