Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks

Published: 19 July 2017

Abstract

Deep convolutional neural networks (CNNs) have achieved great success in various computer vision applications. State-of-the-art CNN models for large-scale applications are computation- and memory-intensive and are therefore processed mainly on high-performance processors such as server CPUs and GPUs. However, there is a growing demand for high-accuracy or real-time object detection in large-scale clusters and embedded systems, which calls for energy-efficient accelerators because of green-computing requirements or limited battery capacity. Owing to their energy efficiency and reconfigurability, Field-Programmable Gate Arrays (FPGAs) have been widely explored as CNN accelerators. In this article, we present an in-depth analysis of the computational complexity and memory footprint of each CNN layer type. We then propose a scalable parallel framework that exploits four levels of parallelism in hardware acceleration. We further put forward a systematic design space exploration methodology to search for the optimal solution that maximizes accelerator throughput under FPGA constraints such as on-chip memory, computational resources, external memory bandwidth, and clock frequency. Finally, we demonstrate the methodology by optimizing three representative CNNs (LeNet, AlexNet, and VGG-S) on a Xilinx VC709 board. The average performance of the three accelerators is 424.7, 445.6, and 473.4 GOP/s at a 100 MHz working frequency, significantly outperforming the CPU and previous work.
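The design space exploration described above can be illustrated with a minimal sketch: enumerate candidate parallelism factors, discard configurations that exceed the FPGA's resource budgets, and keep the highest-throughput survivor. The resource model, the budgets, and the `estimate` helper below are hypothetical simplifications (two parallelism levels instead of the article's four, placeholder DSP/BRAM figures), not the article's actual cost model.

```python
# Hypothetical DSE sketch: brute-force search over per-layer parallelism
# factors under made-up resource budgets. All constants are illustrative
# placeholders, not figures from the article.

DSP_BUDGET = 3600        # available DSP slices (placeholder)
BRAM_KB_BUDGET = 6000    # available on-chip memory in KB (placeholder)
CLOCK_MHZ = 100          # working frequency used in the article's experiments

def estimate(p_out, p_in):
    """Estimate resources and throughput for one (output-channel,
    input-channel) parallelism pair of a convolutional layer."""
    dsp = p_out * p_in                  # one MAC unit per parallel lane
    bram_kb = 2 * (p_out + p_in) * 16   # double-buffered tile storage
    # 2 operations (multiply + add) per MAC per cycle, in GOP/s:
    gops = 2 * p_out * p_in * CLOCK_MHZ * 1e6 / 1e9
    return dsp, bram_kb, gops

# Exhaustively search the (p_out, p_in) space, keeping only feasible
# configurations, and pick the one with the highest estimated throughput.
best = max(
    ((p_out, p_in) for p_out in range(1, 65) for p_in in range(1, 65)
     if estimate(p_out, p_in)[0] <= DSP_BUDGET
     and estimate(p_out, p_in)[1] <= BRAM_KB_BUDGET),
    key=lambda cfg: estimate(*cfg)[2],
)
print(best, estimate(*best)[2])   # → (60, 60) 720.0
```

Under these toy budgets the search saturates the DSP budget at 60 × 60 lanes for an estimated 720 GOP/s; the article's real methodology additionally models external memory bandwidth and the remaining two parallelism levels, which is why its measured figures are lower.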

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 3
September 2017
187 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3102109
Editor: Steve Wilton

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 19 July 2017
        • Accepted: 1 March 2017
        • Revised: 1 November 2016
        • Received: 1 June 2016
Published in TRETS Volume 10, Issue 3


        Qualifiers

        • research-article
        • Research
        • Refereed
