DOI: 10.1145/3526241.3530318

Data Stream Oriented Fine-grained Sparse CNN Accelerator with Efficient Unstructured Pruning Strategy

Published: 06 June 2022

Abstract

Network pruning can effectively alleviate the excessive parameter and computation costs of CNNs. However, unstructured pruning is not hardware friendly, while structured pruning causes a significant loss of accuracy. In this paper, an unstructured fine-grained pruning strategy is proposed that achieves a 16X compression ratio with a top-1 accuracy loss of only 1.4% for VGG-16. Combined with the proposed hardware-oriented hyperparameter selection method, compression ratios of up to 64X can be obtained while fully meeting edge-side accuracy requirements. Furthermore, a lightweight, high-performance sparse CNN accelerator with a modified systolic array is proposed for the pruned VGG-16. Experimental results show that, compared with state-of-the-art designs, the proposed accelerator achieves 21 frames per second (FPS) with 3X better power efficiency and 2.19X better calculation density.
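As a concrete illustration of the kind of unstructured (element-wise) magnitude pruning the abstract refers to, the following NumPy sketch zeroes all but the largest-magnitude 1/16 of a weight matrix, matching the 16X compression ratio reported for VGG-16. This is an illustrative sketch only, not the paper's actual strategy (whose fine-grained criterion and hyperparameter selection are described in the full text); the function `magnitude_prune` and its global-threshold rule are assumptions for demonstration.

```python
import numpy as np

def magnitude_prune(weights, compression_ratio):
    """Keep only the largest-magnitude 1/compression_ratio of the entries
    and zero out the rest (unstructured, element-wise pruning)."""
    flat = np.abs(weights).ravel()
    k = flat.size // compression_ratio        # number of weights to keep
    if k == 0:
        return np.zeros_like(weights)
    threshold = np.partition(flat, -k)[-k]    # k-th largest magnitude
    mask = np.abs(weights) >= threshold       # assumes no exact magnitude ties
    return weights * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
pruned = magnitude_prune(w, 16)               # 16X compression, as in the abstract
sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size   # 0.9375
```

The resulting nonzero pattern is irregular, which is exactly why unstructured sparsity is hard to exploit on stock hardware and motivates the data-stream-oriented accelerator design described here.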

Supplementary Material

MP4 File (GLSVLSI22-modfp103.mp4)




    Published In

GLSVLSI '22: Proceedings of the Great Lakes Symposium on VLSI 2022
June 2022, 560 pages
ISBN: 9781450393225
DOI: 10.1145/3526241

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. sparse CNN accelerator and systolic array
    2. unstructured pruning

    Qualifiers

    • Research-article

    Conference

    GLSVLSI '22

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%

    Article Metrics

    • Downloads (last 12 months): 44
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 27 Feb 2025.

    Cited By

    • (2025) An efficient CNN accelerator for pattern-compressed sparse neural networks on FPGA. Neurocomputing 611, 128700 (Jan 2025). DOI: 10.1016/j.neucom.2024.128700
    • (2024) AttnACQ: Attentioned-AutoCorrelation-Based Query for Hyperdimensional Associative Memory. IEEE Transactions on Circuits and Systems II: Express Briefs 71, 12 (Dec 2024), 4984-4988. DOI: 10.1109/TCSII.2024.3434562
    • (2024) Edge-Side Fine-Grained Sparse CNN Accelerator With Efficient Dynamic Pruning Scheme. IEEE Transactions on Circuits and Systems I: Regular Papers 71, 3 (Mar 2024), 1285-1298. DOI: 10.1109/TCSI.2023.3347417
    • (2024) Memristor-Based Approximate Query Architecture for In-Memory Hyperdimensional Computing. IEEE Transactions on Computers 73, 11 (Nov 2024), 2605-2618. DOI: 10.1109/TC.2024.3441861
    • (2024) Fully Learnable Hyperdimensional Computing Framework With Ultratiny Accelerator for Edge-Side Applications. IEEE Transactions on Computers 73, 2 (Feb 2024), 574-585. DOI: 10.1109/TC.2023.3337316
    • (2023) Toward Spintronics Non-volatile Computing-in-Memory Architecture. In Design and Applications of Emerging Computer Systems (17 Aug 2023), 67-89. DOI: 10.1007/978-3-031-42478-6_3
