DOI: 10.1145/3526241.3530318

Data Stream Oriented Fine-grained Sparse CNN Accelerator with Efficient Unstructured Pruning Strategy

Published: 06 June 2022

Abstract

Network pruning can effectively alleviate the excessive parameter and computation costs of CNNs. However, unstructured pruning is not hardware friendly, while structured pruning causes a significant loss of accuracy. In this paper, an unstructured fine-grained pruning strategy is proposed that achieves a 16X compression ratio with a top-1 accuracy loss of only 1.4% for VGG-16. Combined with the proposed hardware-oriented hyperparameter selection method, compression ratios of up to 64X can be obtained while fully meeting edge-side accuracy requirements. Furthermore, a lightweight, high-performance sparse CNN accelerator with a modified systolic array is proposed for the pruned VGG-16. Experimental results show that, compared with state-of-the-art designs, the proposed accelerator achieves 21 frames per second (FPS) with 3X better power efficiency and 2.19X better calculation density.
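As a concrete illustration of the kind of unstructured (element-wise) magnitude pruning the abstract refers to, the following NumPy sketch zeroes all but the largest-magnitude 1/16 of a weight matrix, matching the 16X compression ratio reported for VGG-16. This is an illustrative sketch only, not the paper's actual strategy (whose fine-grained criterion and hyperparameter selection are described in the full text); the function `magnitude_prune` and its global-threshold rule are assumptions for demonstration.

```python
import numpy as np

def magnitude_prune(weights, compression_ratio):
    """Keep only the largest-magnitude 1/compression_ratio of the entries
    and zero out the rest (unstructured, element-wise pruning)."""
    flat = np.abs(weights).ravel()
    k = flat.size // compression_ratio        # number of weights to keep
    if k == 0:
        return np.zeros_like(weights)
    threshold = np.partition(flat, -k)[-k]    # k-th largest magnitude
    mask = np.abs(weights) >= threshold       # assumes no exact magnitude ties
    return weights * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
pruned = magnitude_prune(w, 16)               # 16X compression, as in the abstract
sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size   # 0.9375
```

The resulting nonzero pattern is irregular, which is exactly why unstructured sparsity is hard to exploit on stock hardware and motivates the data-stream-oriented accelerator design described here.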

Supplementary Material

MP4 File (GLSVLSI22-modfp103.mp4)




    Published In

GLSVLSI '22: Proceedings of the Great Lakes Symposium on VLSI 2022
June 2022, 560 pages
ISBN: 9781450393225
DOI: 10.1145/3526241

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. sparse CNN accelerator and systolic array
    2. unstructured pruning

    Qualifiers

    • Research-article

    Conference

    GLSVLSI '22

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%

    Article Metrics

    • Downloads (last 12 months): 44
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 27 Feb 2025.

    Cited By

    • (2025) An efficient CNN accelerator for pattern-compressed sparse neural networks on FPGA. Neurocomputing 611, 128700 (Jan 2025). DOI: 10.1016/j.neucom.2024.128700
    • (2024) AttnACQ: Attentioned-AutoCorrelation-Based Query for Hyperdimensional Associative Memory. IEEE Transactions on Circuits and Systems II: Express Briefs 71, 12 (Dec 2024), 4984-4988. DOI: 10.1109/TCSII.2024.3434562
    • (2024) Edge-Side Fine-Grained Sparse CNN Accelerator With Efficient Dynamic Pruning Scheme. IEEE Transactions on Circuits and Systems I: Regular Papers 71, 3 (Mar 2024), 1285-1298. DOI: 10.1109/TCSI.2023.3347417
    • (2024) Memristor-Based Approximate Query Architecture for In-Memory Hyperdimensional Computing. IEEE Transactions on Computers 73, 11 (Nov 2024), 2605-2618. DOI: 10.1109/TC.2024.3441861
    • (2024) Fully Learnable Hyperdimensional Computing Framework With Ultratiny Accelerator for Edge-Side Applications. IEEE Transactions on Computers 73, 2 (Feb 2024), 574-585. DOI: 10.1109/TC.2023.3337316
    • (2023) Toward Spintronics Non-volatile Computing-in-Memory Architecture. In Design and Applications of Emerging Computer Systems (17 Aug 2023), 67-89. DOI: 10.1007/978-3-031-42478-6_3
