Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA

Published: 15 February 2018

Abstract

Three-dimensional convolutional neural networks (3D CNNs) are used effectively in many computer vision applications. Most previous work in this area has concentrated only on designing and optimizing accelerators for 2D CNNs, with few attempts made to accelerate 3D CNNs on FPGA. We find that accelerating 3D CNNs on FPGA is challenging due to their high computational complexity and storage demands. More importantly, although the computation patterns of 2D and 3D CNNs are analogous, the conventional approaches adopted for accelerating 2D CNNs may be unfit for 3D CNN acceleration. In this paper, to accelerate 2D and 3D CNNs within a single framework, we propose a uniform template-based architecture whose templates, built on the Winograd algorithm, enable fast development of 2D and 3D CNN accelerators. We also develop a uniform analytical model to facilitate efficient design space exploration of 2D and 3D CNN accelerators based on our architecture. Finally, we demonstrate the effectiveness of the template-based architecture by implementing accelerators for real-life 2D and 3D CNNs (VGG16 and C3D) on multiple FPGA platforms. On the S2C VUS440, we achieve up to 1.13 TOPS for VGG16 and 1.11 TOPS for C3D at low resource utilization. End-to-end comparisons show that our C3D implementation achieves gains of up to 13x in performance and 60x in energy efficiency over a CPU solution, and a 6.4x energy-efficiency gain over a GPU solution.
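
The templates described in the abstract are built on the Winograd minimal filtering algorithm. As a rough, hypothetical illustration of that underlying idea (not the authors' implementation), the following Python/NumPy sketch computes a 1D F(2,3) Winograd convolution, i.e. two outputs of a 3-tap filter from a 4-element input tile using 4 multiplications instead of 6, and checks it against direct convolution. The transform matrices follow Lavin and Gray's formulation; the function name and the NumPy dependency are assumptions made for this example only.

import numpy as np

# F(2,3) Winograd transform matrices (Lavin & Gray formulation).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)   # input transform
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])                      # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)     # output transform

def winograd_f23(d, g):
    """Two valid outputs of conv(d, g) for a 4-element input tile d
    and a 3-tap filter g, via the Winograd F(2,3) transform."""
    U = G @ g      # filter transform (precomputable offline)
    V = BT @ d     # input transform
    M = U * V      # element-wise products: the 4 "real" multiplies
    return AT @ M  # inverse (output) transform

# Hypothetical usage: verify against direct convolution.
rng = np.random.default_rng(0)
d = rng.standard_normal(4)
g = rng.standard_normal(3)
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)

The 2D F(2x2, 3x3) and 3D variants used for CNN layers apply the same input, filter, and output transforms along each spatial (and, for 3D, temporal) dimension; that structural similarity is what a uniform template can exploit across 2D and 3D convolutions.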

Published In

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2018
310 pages
ISBN:9781450356145
DOI:10.1145/3174243

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. 3D CNN
  2. Uniform templates
  3. Winograd algorithm

Qualifiers

  • Research-article

Funding Sources

  • National Program on Key Basic Research Project

Conference

FPGA '18

Acceptance Rates

FPGA '18 Paper Acceptance Rate 10 of 116 submissions, 9%;
Overall Acceptance Rate 125 of 627 submissions, 20%
