DOI: 10.1145/3665314.3670813
Research Article

JointNF: Enhancing DNN Performance through Adaptive N:M Pruning across both Weight and Activation

Published: 09 September 2024

Abstract

Balancing accuracy and hardware efficiency remains a challenge for traditional pruning methods. N:M sparsity is a recent approach that offers a compromise, allowing up to N non-zero weights in each group of M consecutive weights. However, N:M pruning enforces a uniform sparsity level of N/M across all layers, which does not align well with the sparse nature of deep neural networks (DNNs). To achieve a more flexible sparsity pattern and a higher overall sparsity level, we present JointNF, a novel joint N:M and structured pruning algorithm that enables fine-grained structured pruning with adaptive sparsity levels across DNN layers. Moreover, we show for the first time that N:M pruning can also be applied to the input activations for further performance enhancement.
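To make the N:M pattern described above concrete, the NumPy sketch below builds a magnitude-based 2:4 mask that keeps at most 2 non-zero entries in every group of 4 consecutive weights. The helper name nm_prune_mask and the 2:4 setting are illustrative assumptions; this is a generic N:M masking example, not the JointNF algorithm, which additionally adapts the sparsity level per layer and extends the pattern to input activations.

```python
import numpy as np

def nm_prune_mask(x, n=2, m=4):
    """Binary mask keeping the n largest-magnitude entries in each group of
    m consecutive elements (generic N:M magnitude pruning, for illustration
    only; not the JointNF method)."""
    flat = x.reshape(-1, m)                               # group m consecutive elements
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]   # (m - n) smallest per group
    mask = np.ones_like(flat)
    np.put_along_axis(mask, drop, 0.0, axis=1)            # zero out pruned positions
    return mask.reshape(x.shape)

# Example: 2:4-prune a weight matrix whose width is a multiple of 4.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))
w_sparse = w * nm_prune_mask(w, n=2, m=4)                 # at most 2 non-zeros per 4 weights
```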



Published In

ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design
August 2024
384 pages
ISBN:9798400706882
DOI:10.1145/3665314
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hardware accelerator
  2. pruning
  3. transformer

Qualifiers

  • Research-article

Conference

ISLPED '24
Acceptance Rates

Overall Acceptance Rate 398 of 1,159 submissions, 34%
