DOI: 10.1145/3289602.3293977

Poster

Compressed CNN Training with FPGA-based Accelerator

Published: 20 February 2019

Abstract

Training a convolutional neural network (CNN) usually requires a large amount of computation, time, and power. Researchers and cloud service providers in this field need fast and efficient training systems. GPUs are currently the best candidates for CNN training, but FPGAs have already shown good performance and energy efficiency as CNN inference accelerators. In this work, we design a compressed training process together with an FPGA-based accelerator for energy-efficient CNN training. We adopt two widely used model compression methods, quantization and pruning, to accelerate the CNN training process. The differences between inference and training pose challenges in applying these two methods to training. First, training requires higher data precision. We use a gradient accumulation buffer to achieve low operation complexity while preserving the precision of gradient descent. Second, a sparse network leads to different types of functions in the forward and back-propagation phases. We design a novel architecture that exploits both inference and back-propagation sparsity. Experimental results show that the proposed training process achieves accuracy similar to that of traditional training with floating-point data. The proposed accelerator achieves 641 GOP/s equivalent performance and 2.86x better energy efficiency compared with a GPU.
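The abstract's two key ideas can be sketched briefly. First, the gradient accumulation buffer: the sketch below is a minimal NumPy illustration of the general technique, not the authors' implementation, and it assumes a hypothetical fixed-point weight format with step size STEP. The quantized weights used in the forward and backward passes only change in whole quantization steps, while a higher-precision buffer carries the residual of each update so that small gradients are not lost to rounding.

```python
import numpy as np

STEP = 2.0 ** -7  # assumed quantization step of the fixed-point weight format

def sgd_step_with_accum(w_q, grad, accum, lr=0.01):
    """One SGD update using a high-precision gradient accumulation buffer.

    w_q   : quantized weights (multiples of STEP), as used by the accelerator
    grad  : back-propagated gradient for these weights
    accum : high-precision residual buffer, same shape as w_q
    """
    accum = accum - lr * grad              # accumulate the update in high precision
    delta = np.round(accum / STEP) * STEP  # the portion large enough to move w_q
    w_q = w_q + delta                      # commit whole quantization steps only
    accum = accum - delta                  # carry the sub-step remainder forward
    return w_q, accum

# Toy usage on a single 4-weight "layer"
w_q = np.round(np.random.randn(4) / STEP) * STEP
accum = np.zeros_like(w_q)
for _ in range(100):
    grad = 1e-3 * np.random.randn(4)       # stand-in for real back-propagated gradients
    w_q, accum = sgd_step_with_accum(w_q, grad, accum)
```

Second, a sketch of why a pruned (sparse) network produces different computations in the two phases: the forward pass multiplies by the sparse weight matrix directly, while back-propagation multiplies by its transpose, so an accelerator that exploits both inference and back-propagation sparsity must handle both traversal orders of the same compressed matrix. The example below uses SciPy's CSR format and a crude magnitude-pruning threshold purely for illustration; the shapes and threshold are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

W_dense = np.random.randn(256, 512)
W_dense[np.abs(W_dense) < 1.0] = 0.0   # crude magnitude pruning (illustrative)
W = csr_matrix(W_dense)                # compressed sparse row storage

x  = np.random.randn(512)              # layer input
y  = W @ x                             # forward: sparse matrix-vector product
dy = np.random.randn(256)              # error signal from the next layer
dx = W.T @ dy                          # backward: transposed sparse product
dW = np.outer(dy, x) * (W_dense != 0)  # weight gradient, masked to keep W sparse
```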



Information

Published In

FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2019
360 pages
ISBN: 9781450361378
DOI: 10.1145/3289602
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2019


Author Tags

  1. convolutional neural network
  2. fpga
  3. training

Qualifiers

  • Poster


Conference

FPGA '19

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%



Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 05 Mar 2025


Cited By
  • (2024) WinTA: An Efficient Reconfigurable CNN Training Accelerator With Decomposition Winograd. IEEE Transactions on Circuits and Systems I: Regular Papers, 71(2), 634-645. DOI: 10.1109/TCSI.2023.3338471
  • (2023) ETA: An Efficient Training Accelerator for DNNs Based on Hardware-Algorithm Co-Optimization. IEEE Transactions on Neural Networks and Learning Systems, 34(10), 7660-7674. DOI: 10.1109/TNNLS.2022.3145850
  • (2023) An On-Chip Fully Connected Neural Network Training Hardware Accelerator Based on Brain Float Point and Sparsity Awareness. IEEE Open Journal of Circuits and Systems, 4, 85-98. DOI: 10.1109/OJCAS.2023.3245061
  • (2023) An FPGA-based Mix-grained Sparse Training Accelerator. 2023 International Conference on Field Programmable Technology (ICFPT), 276-277. DOI: 10.1109/ICFPT59805.2023.00043
  • (2023) BOOST: Block Minifloat-Based On-Device CNN Training Accelerator with Transfer Learning. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 1-9. DOI: 10.1109/ICCAD57390.2023.10323638
  • (2022) THETA: A High-Efficiency Training Accelerator for DNNs With Triple-Side Sparsity Exploration. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 30(8), 1034-1046. DOI: 10.1109/TVLSI.2022.3175582
  • (2022) An Efficient CNN Training Accelerator Leveraging Transposable Block Sparsity. 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS), 230-233. DOI: 10.1109/AICAS54282.2022.9869938
  • (2021) Layer-Specific Optimization for Mixed Data Flow With Mixed Precision in FPGA Design for CNN-Based Object Detectors. IEEE Transactions on Circuits and Systems for Video Technology, 31(6), 2450-2464. DOI: 10.1109/TCSVT.2020.3020569
  • (2021) An FPGA-Based Reconfigurable Accelerator for Low-Bit DNN Training. 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 254-259. DOI: 10.1109/ISVLSI51109.2021.00054
  • (2020) GraphACT. Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 255-265. DOI: 10.1145/3373087.3375312
