An Efficient Parallel Architecture for Convolutional Neural Networks Accelerator on FPGAs

Published: 19 August 2022

Abstract

Convolutional Neural Networks (CNNs) are widely used in computer vision. Because CNNs are computationally intensive, their computational efficiency has become a major concern. The Field Programmable Gate Array (FPGA) is an ideal embedded device for accelerating CNNs because of its parallelism and programmability. The key challenge, however, is deploying CNNs efficiently on embedded FPGA platforms. Exploiting the inherent parallelism of CNNs, this paper proposes an efficient parallel accelerator architecture with two processing element (PE) arrays that accelerates CNNs through layer-wise calculation. Using three tile strategies, the accelerator can be reconfigured to accelerate different CNNs, including VGG and tiny-YOLO v2. Experimental results show that the accelerator, which consumes 448 DSPs, reaches a peak performance of 164.25 Giga Operations Per Second (GOPS) on VGG-16 and 138.77 GOPS on tiny-YOLO v2. Compared with previous research, this accelerator achieves enhanced power efficiency and performance density.
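The abstract does not define performance density. A common convention in FPGA accelerator comparisons is throughput per DSP slice (GOPS/DSP), and the short sketch below applies that assumed definition to the figures quoted above. The function and variable names are illustrative and do not come from the paper.

```python
# Figures taken from the abstract.
DSP_COUNT = 448
PEAK_GOPS = {"VGG-16": 164.25, "tiny-YOLO v2": 138.77}


def performance_density(gops: float, dsps: int) -> float:
    """Return throughput per DSP slice (GOPS/DSP).

    Assumption: performance density means GOPS divided by the number of
    DSPs used. The abstract does not state the definition the authors use.
    """
    if dsps <= 0:
        raise ValueError("dsps must be positive")
    return gops / dsps


# Density for each network reported in the abstract.
density = {
    net: performance_density(gops, DSP_COUNT)
    for net, gops in PEAK_GOPS.items()
}

for net, value in density.items():
    print(f"{net}: {value:.3f} GOPS/DSP")
```

Under this assumed definition, the quoted peak figures correspond to roughly 0.37 GOPS/DSP on VGG-16 and 0.31 GOPS/DSP on tiny-YOLO v2. Power efficiency (GOPS/W) cannot be derived in the same way because the abstract gives no power figure.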

Cited By

  • (2023) Dedicated FPGA Implementation of the Gaussian TinyYOLOv3 Accelerator. IEEE Transactions on Circuits and Systems II: Express Briefs, 70:10 (3882-3886). DOI: 10.1109/TCSII.2023.3289514. Online publication date: Oct-2023

      Published In

      HP3C '22: Proceedings of the 6th International Conference on High Performance Compilation, Computing and Communications
      June 2022
      221 pages
      ISBN:9781450396295
      DOI:10.1145/3546000
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Convolutional Neural Network (CNN)
      2. FPGA
      3. computational efficiency
      4. parallel architecture
      5. reconfigurable

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • Key-Area Research and Development Program of Guangdong

      Conference

      HP3C'22
