DOI: 10.1145/3289602.3293977

Poster

Compressed CNN Training with FPGA-based Accelerator

Published: 20 February 2019

Abstract

Training a convolutional neural network (CNN) usually requires a large amount of computation, time, and power. Researchers and cloud service providers in this field need fast and efficient training systems. GPUs are currently the best candidates for CNN training, but FPGAs have already shown good performance and energy efficiency as CNN inference accelerators. In this work, we design a compressed training process together with an FPGA-based accelerator for energy-efficient CNN training. We adopt two widely used model compression methods, quantization and pruning, to accelerate the CNN training process. The differences between inference and training pose challenges in applying these two methods to training. First, training requires higher data precision. We use a gradient accumulation buffer to achieve low operation complexity while preserving the precision of gradient descent. Second, a sparse network leads to different types of functions in the forward and back-propagation phases. We design a novel architecture that exploits both inference and back-propagation sparsity. Experimental results show that the proposed training process achieves accuracy similar to that of traditional training with floating-point data. The proposed accelerator achieves 641 GOP/s equivalent performance and 2.86x better energy efficiency compared with a GPU.
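The abstract's two key ideas can be sketched briefly. First, the gradient accumulation buffer: the sketch below is a minimal NumPy illustration of the general technique, not the authors' implementation, and it assumes a hypothetical fixed-point weight format with step size STEP. The quantized weights used in the forward and backward passes only change in whole quantization steps, while a higher-precision buffer carries the residual of each update so that small gradients are not lost to rounding.

```python
import numpy as np

STEP = 2.0 ** -7  # assumed quantization step of the fixed-point weight format

def sgd_step_with_accum(w_q, grad, accum, lr=0.01):
    """One SGD update using a high-precision gradient accumulation buffer.

    w_q   : quantized weights (multiples of STEP), as used by the accelerator
    grad  : back-propagated gradient for these weights
    accum : high-precision residual buffer, same shape as w_q
    """
    accum = accum - lr * grad              # accumulate the update in high precision
    delta = np.round(accum / STEP) * STEP  # the portion large enough to move w_q
    w_q = w_q + delta                      # commit whole quantization steps only
    accum = accum - delta                  # carry the sub-step remainder forward
    return w_q, accum

# Toy usage on a single 4-weight "layer"
w_q = np.round(np.random.randn(4) / STEP) * STEP
accum = np.zeros_like(w_q)
for _ in range(100):
    grad = 1e-3 * np.random.randn(4)       # stand-in for real back-propagated gradients
    w_q, accum = sgd_step_with_accum(w_q, grad, accum)
```

Second, a sketch of why a pruned (sparse) network produces different computations in the two phases: the forward pass multiplies by the sparse weight matrix directly, while back-propagation multiplies by its transpose, so an accelerator that exploits both inference and back-propagation sparsity must handle both traversal orders of the same compressed matrix. The example below uses SciPy's CSR format and a crude magnitude-pruning threshold purely for illustration; the shapes and threshold are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

W_dense = np.random.randn(256, 512)
W_dense[np.abs(W_dense) < 1.0] = 0.0   # crude magnitude pruning (illustrative)
W = csr_matrix(W_dense)                # compressed sparse row storage

x  = np.random.randn(512)              # layer input
y  = W @ x                             # forward: sparse matrix-vector product
dy = np.random.randn(256)              # error signal from the next layer
dx = W.T @ dy                          # backward: transposed sparse product
dW = np.outer(dy, x) * (W_dense != 0)  # weight gradient, masked to keep W sparse
```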



Information

Published In

FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2019
360 pages
ISBN: 9781450361378
DOI: 10.1145/3289602
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2019


Author Tags

  1. convolutional neural network
  2. fpga
  3. training

Qualifiers

  • Poster


Conference

FPGA '19

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%



Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 05 Mar 2025


Cited By
  • (2024) WinTA: An Efficient Reconfigurable CNN Training Accelerator With Decomposition Winograd. IEEE Transactions on Circuits and Systems I: Regular Papers, 71(2), 634-645. DOI: 10.1109/TCSI.2023.3338471
  • (2023) ETA: An Efficient Training Accelerator for DNNs Based on Hardware-Algorithm Co-Optimization. IEEE Transactions on Neural Networks and Learning Systems, 34(10), 7660-7674. DOI: 10.1109/TNNLS.2022.3145850
  • (2023) An On-Chip Fully Connected Neural Network Training Hardware Accelerator Based on Brain Float Point and Sparsity Awareness. IEEE Open Journal of Circuits and Systems, 4, 85-98. DOI: 10.1109/OJCAS.2023.3245061
  • (2023) An FPGA-based Mix-grained Sparse Training Accelerator. 2023 International Conference on Field Programmable Technology (ICFPT), 276-277. DOI: 10.1109/ICFPT59805.2023.00043
  • (2023) BOOST: Block Minifloat-Based On-Device CNN Training Accelerator with Transfer Learning. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 1-9. DOI: 10.1109/ICCAD57390.2023.10323638
  • (2022) THETA: A High-Efficiency Training Accelerator for DNNs With Triple-Side Sparsity Exploration. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 30(8), 1034-1046. DOI: 10.1109/TVLSI.2022.3175582
  • (2022) An Efficient CNN Training Accelerator Leveraging Transposable Block Sparsity. 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS), 230-233. DOI: 10.1109/AICAS54282.2022.9869938
  • (2021) Layer-Specific Optimization for Mixed Data Flow With Mixed Precision in FPGA Design for CNN-Based Object Detectors. IEEE Transactions on Circuits and Systems for Video Technology, 31(6), 2450-2464. DOI: 10.1109/TCSVT.2020.3020569
  • (2021) An FPGA-Based Reconfigurable Accelerator for Low-Bit DNN Training. 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 254-259. DOI: 10.1109/ISVLSI51109.2021.00054
  • (2020) GraphACT. Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 255-265. DOI: 10.1145/3373087.3375312
