DOI: 10.1145/3394885.3431532

Block-Circulant Neural Network Accelerator Featuring Fine-Grained Frequency-Domain Quantization and Reconfigurable FFT Modules

Published: 29 January 2021

Abstract

Block-circulant compression is a popular technique for accelerating neural network inference. Although transforming weights into block-circulant matrices reduces storage and computation costs, the method leads to an uneven data distribution in the frequency domain and an imbalanced workload. In this paper, we propose RAB, a Reconfigurable Architecture Block-Circulant Neural Network Accelerator, which addresses these problems with two techniques. First, a fine-grained frequency-domain quantization is proposed to accelerate MAC operations. Second, a reconfigurable architecture is designed that transforms FFT/IFFT modules into MAC modules, alleviating the imbalanced workload and further improving efficiency. Experimental results show that RAB achieves a 1.9x/1.8x improvement in area/energy efficiency over the state-of-the-art block-circulant-compression-based accelerator.
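
As background for the compression scheme the abstract refers to, the sketch below illustrates how a block-circulant weight matrix lets a matrix-vector product be computed with FFTs and element-wise multiplications in the frequency domain instead of a dense MAC array. This is a minimal NumPy illustration of the general block-circulant/FFT technique, not the RAB hardware or its quantization scheme; the block size and all function names are assumptions made for the example.

    # Minimal NumPy sketch of block-circulant inference in the frequency domain.
    # The block size b and all names here are illustrative, not taken from the paper.
    import numpy as np

    def circulant_matvec(c, x):
        """Multiply a b x b circulant matrix (given by its first column c) with x.

        By the circular convolution theorem, C @ x = IFFT(FFT(c) * FFT(x)),
        so only b weights are stored and O(b log b) work is done per block
        instead of O(b^2) multiply-accumulates.
        """
        return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

    def block_circulant_matvec(first_cols, x, b):
        """Compute y = W @ x where W consists of m x n circulant blocks of size b x b.

        first_cols[i][j] stores the first column of block (i, j); the dense
        (m*b) x (n*b) weight matrix is never materialized.
        """
        m, n = len(first_cols), len(first_cols[0])
        y = np.zeros(m * b)
        for i in range(m):
            acc = np.zeros(b)
            for j in range(n):
                acc += circulant_matvec(first_cols[i][j], x[j * b:(j + 1) * b])
            y[i * b:(i + 1) * b] = acc
        return y

    # Sanity check against the explicit dense block-circulant matrix.
    rng = np.random.default_rng(0)
    b, m, n = 4, 2, 3
    first_cols = [[rng.standard_normal(b) for _ in range(n)] for _ in range(m)]
    x = rng.standard_normal(n * b)
    dense = np.block([[np.stack([np.roll(c, k) for k in range(b)], axis=1)
                       for c in row] for row in first_cols])
    assert np.allclose(dense @ x, block_circulant_matvec(first_cols, x, b))

In hardware, the FFT/IFFT stages and the element-wise frequency-domain MAC stage of this pipeline have different throughputs; that mismatch is the workload imbalance which RAB's reconfigurable FFT-to-MAC modules are designed to alleviate.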

Cited By

  • (2023) FPGA-Based Accelerator for Rank-Enhanced and Highly-Pruned Block-Circulant Neural Networks. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6. DOI: 10.23919/DATE56975.2023.10137111. Online publication date: Apr-2023.
  • (2023) Accelerating Deep Convolutional Neural Networks Using Number Theoretic Transform. IEEE Transactions on Circuits and Systems I: Regular Papers, 70(1), pp. 315-326. DOI: 10.1109/TCSI.2022.3214528. Online publication date: Jan-2023.
  • (2023) Performance-Driven LSTM Accelerator Hardware Using Split-Matrix-Based MVM. Circuits, Systems, and Signal Processing, 42(11), pp. 6660-6683. DOI: 10.1007/s00034-023-02412-4. Online publication date: 8-Jun-2023.

Published In

ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference
January 2021
930 pages
ISBN:9781450379991
DOI:10.1145/3394885
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 January 2021

Author Tags

  1. Acceleration
  2. Artificial neural networks
  3. Energy Efficiency
  4. Quantization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • National Key R&D Program
  • NSFC

Conference

ASPDAC '21

Acceptance Rates

ASPDAC '21 Paper Acceptance Rate 111 of 368 submissions, 30%;
Overall Acceptance Rate 466 of 1,454 submissions, 32%
