An Efficient Parallel Architecture for Convolutional Neural Networks Accelerator on FPGAs

Authors:

Xiong Xiaoming

HP3C '22: Proceedings of the 6th International Conference on High Performance Compilation, Computing and Communications

Pages 66 - 71

https://doi.org/10.1145/3546000.3546010

Published: 19 August 2022

Abstract

Convolutional Neural Networks (CNNs) are widely used in computer vision. Because CNNs are computationally complex, their computational efficiency has become a major concern. The Field Programmable Gate Array (FPGA) is an ideal embedded device for accelerating CNNs because of its parallelism and programmability. The key challenge, however, is deploying CNNs efficiently on embedded FPGA platforms. Exploiting the inherent parallelism of CNNs, this paper proposes an efficient parallel accelerator architecture with two processing element (PE) arrays that accelerates CNNs through layer-wise calculation. Using three tile strategies, the accelerator can be reconfigured to accelerate different CNNs, including VGG and tiny-YOLO v2. Experimental results show that, using 448 DSPs, the accelerator reaches a peak performance of 164.25 Giga Operations Per Second (GOPS) on VGG-16 and 138.77 GOPS on tiny-YOLO v2. Compared with previous research, this accelerator achieves higher power efficiency and performance density.
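
As a rough illustration of the performance-density claim, the peak figures quoted in the abstract can be normalized by the DSP count. The sketch below assumes that performance density means peak GOPS per DSP slice, a common convention that the abstract itself does not define. The function name `gops_per_dsp` and the constants are hypothetical and are not taken from the paper.

```python
# Figures quoted in the abstract.
DSP_COUNT = 448
PEAK_GOPS = {"VGG-16": 164.25, "tiny-YOLO v2": 138.77}


def gops_per_dsp(peak_gops: float, dsp_count: int) -> float:
    """Assumed definition of performance density: peak GOPS per DSP slice."""
    if dsp_count <= 0:
        raise ValueError("dsp_count must be positive")
    return peak_gops / dsp_count


if __name__ == "__main__":
    # Print the normalized figure for each network named in the abstract.
    for network, gops in PEAK_GOPS.items():
        print(f"{network}: {gops_per_dsp(gops, DSP_COUNT):.3f} GOPS/DSP")
```

Under this assumed definition, the quoted numbers work out to roughly 0.37 GOPS per DSP on VGG-16 and 0.31 GOPS per DSP on tiny-YOLO v2. The paper's own metric may be defined differently.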

Cited By

Ki S, Park J, Kim H (2023). Dedicated FPGA Implementation of the Gaussian TinyYOLOv3 Accelerator. IEEE Transactions on Circuits and Systems II: Express Briefs, 70:10 (3882-3886). Online publication date: Oct-2023.
https://doi.org/10.1109/TCSII.2023.3289514

Index Terms

An Efficient Parallel Architecture for Convolutional Neural Networks Accelerator on FPGAs
1. Computing methodologies
  1. Parallel computing methodologies
2. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs

Published In

HP3C '22: Proceedings of the 6th International Conference on High Performance Compilation, Computing and Communications

June 2022

221 pages

ISBN:9781450396295

DOI:10.1145/3546000

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Key-Area Research and Development Program of Guangdong

Conference

HP3C'22

HP3C'22: 2022 6th International Conference on High Performance Compilation, Computing and Communications

June 23 - 25, 2022

Jilin, China

Bibliometrics

Total Citations: 1
Total Downloads: 127
Downloads (Last 12 months): 39
Downloads (Last 6 weeks): 3

Reflects downloads up to 17 Jan 2025
