DOI: 10.1145/3394885.3431532

Block-Circulant Neural Network Accelerator Featuring Fine-Grained Frequency-Domain Quantization and Reconfigurable FFT Modules

Published: 29 January 2021

Abstract

Block-circulant compression is a popular technique for accelerating neural network inference. Although transforming weights into block-circulant matrices reduces storage and computation costs, the method leads to an uneven data distribution in the frequency domain and an imbalanced workload. In this paper, we propose RAB, a Reconfigurable Architecture Block-Circulant Neural Network Accelerator, which addresses these problems with two techniques. First, a fine-grained frequency-domain quantization is proposed to accelerate MAC operations. Second, a reconfigurable architecture is designed that transforms FFT/IFFT modules into MAC modules, alleviating the imbalanced workload and further improving efficiency. Experimental results show that RAB achieves a 1.9x/1.8x improvement in area/energy efficiency over the state-of-the-art block-circulant-compression-based accelerator.
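
As background for the compression scheme the abstract refers to, the sketch below illustrates how a block-circulant weight matrix lets a matrix-vector product be computed with FFTs and element-wise multiplications in the frequency domain instead of a dense MAC array. This is a minimal NumPy illustration of the general block-circulant/FFT technique, not the RAB hardware or its quantization scheme; the block size and all function names are assumptions made for the example.

    # Minimal NumPy sketch of block-circulant inference in the frequency domain.
    # The block size b and all names here are illustrative, not taken from the paper.
    import numpy as np

    def circulant_matvec(c, x):
        """Multiply a b x b circulant matrix (given by its first column c) with x.

        By the circular convolution theorem, C @ x = IFFT(FFT(c) * FFT(x)),
        so only b weights are stored and O(b log b) work is done per block
        instead of O(b^2) multiply-accumulates.
        """
        return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

    def block_circulant_matvec(first_cols, x, b):
        """Compute y = W @ x where W consists of m x n circulant blocks of size b x b.

        first_cols[i][j] stores the first column of block (i, j); the dense
        (m*b) x (n*b) weight matrix is never materialized.
        """
        m, n = len(first_cols), len(first_cols[0])
        y = np.zeros(m * b)
        for i in range(m):
            acc = np.zeros(b)
            for j in range(n):
                acc += circulant_matvec(first_cols[i][j], x[j * b:(j + 1) * b])
            y[i * b:(i + 1) * b] = acc
        return y

    # Sanity check against the explicit dense block-circulant matrix.
    rng = np.random.default_rng(0)
    b, m, n = 4, 2, 3
    first_cols = [[rng.standard_normal(b) for _ in range(n)] for _ in range(m)]
    x = rng.standard_normal(n * b)
    dense = np.block([[np.stack([np.roll(c, k) for k in range(b)], axis=1)
                       for c in row] for row in first_cols])
    assert np.allclose(dense @ x, block_circulant_matvec(first_cols, x, b))

In hardware, the FFT/IFFT stages and the element-wise frequency-domain MAC stage of this pipeline have different throughputs; that mismatch is the workload imbalance which RAB's reconfigurable FFT-to-MAC modules are designed to alleviate.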

Cited By

  • (2023) FPGA-Based Accelerator for Rank-Enhanced and Highly-Pruned Block-Circulant Neural Networks. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6. DOI: 10.23919/DATE56975.2023.10137111. Online publication date: Apr-2023.
  • (2023) Accelerating Deep Convolutional Neural Networks Using Number Theoretic Transform. IEEE Transactions on Circuits and Systems I: Regular Papers, 70(1), pp. 315-326. DOI: 10.1109/TCSI.2022.3214528. Online publication date: Jan-2023.
  • (2023) Performance-Driven LSTM Accelerator Hardware Using Split-Matrix-Based MVM. Circuits, Systems, and Signal Processing, 42(11), pp. 6660-6683. DOI: 10.1007/s00034-023-02412-4. Online publication date: 8-Jun-2023.

Published In

ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference
January 2021
930 pages
ISBN:9781450379991
DOI:10.1145/3394885
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 January 2021

Author Tags

  1. Acceleration
  2. Artificial neural networks
  3. Energy Efficiency
  4. Quantization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • National Key R&D Program
  • NSFC

Conference

ASPDAC '21

Acceptance Rates

ASPDAC '21 Paper Acceptance Rate 111 of 368 submissions, 30%;
Overall Acceptance Rate 466 of 1,454 submissions, 32%
