DOI: 10.1145/3665314.3670813
Research Article

JointNF: Enhancing DNN Performance through Adaptive N:M Pruning across both Weight and Activation

Published: 09 September 2024

Abstract

Balancing accuracy and hardware efficiency remains a challenge for traditional pruning methods. N:M sparsity is a recent approach that offers a compromise, allowing up to N non-zero weights in each group of M consecutive weights. However, N:M pruning enforces a uniform sparsity level of N/M across all layers, which does not align well with the sparse nature of deep neural networks (DNNs). To achieve a more flexible sparsity pattern and a higher overall sparsity level, we present JointNF, a novel joint N:M and structured pruning algorithm that enables fine-grained structured pruning with adaptive sparsity levels across DNN layers. Moreover, we show for the first time that N:M pruning can also be applied to the input activations for further performance enhancement.
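To make the N:M pattern described above concrete, the NumPy sketch below builds a magnitude-based 2:4 mask that keeps at most 2 non-zero entries in every group of 4 consecutive weights. The helper name nm_prune_mask and the 2:4 setting are illustrative assumptions; this is a generic N:M masking example, not the JointNF algorithm, which additionally adapts the sparsity level per layer and extends the pattern to input activations.

```python
import numpy as np

def nm_prune_mask(x, n=2, m=4):
    """Binary mask keeping the n largest-magnitude entries in each group of
    m consecutive elements (generic N:M magnitude pruning, for illustration
    only; not the JointNF method)."""
    flat = x.reshape(-1, m)                               # group m consecutive elements
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]   # (m - n) smallest per group
    mask = np.ones_like(flat)
    np.put_along_axis(mask, drop, 0.0, axis=1)            # zero out pruned positions
    return mask.reshape(x.shape)

# Example: 2:4-prune a weight matrix whose width is a multiple of 4.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))
w_sparse = w * nm_prune_mask(w, n=2, m=4)                 # at most 2 non-zeros per 4 weights
```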



Published In

ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design
August 2024
384 pages
ISBN:9798400706882
DOI:10.1145/3665314
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hardware accelerator
  2. pruning
  3. transformer

Qualifiers

  • Research-article

Conference

ISLPED '24
Acceptance Rates

Overall Acceptance Rate 398 of 1,159 submissions, 34%
