
Non-Structured DNN Weight Pruning—Is It Beneficial in Any Platform?



Abstract:

Large deep neural network (DNN) models pose a key challenge to energy efficiency because off-chip DRAM accesses consume significantly more energy than arithmetic or SRAM operations. This motivates intensive research on model compression, with two main approaches. Weight pruning leverages the redundancy in the number of weights and can be performed in a non-structured manner, which offers higher flexibility and pruning rate but incurs index accesses due to the irregular positions of the remaining weights, or in a structured manner, which preserves the full matrix structure at a lower pruning rate. Weight quantization leverages the redundancy in the number of bits per weight. Compared to pruning, quantization is much more hardware-friendly and has become a "must-do" step for FPGA and ASIC implementations. Thus, any evaluation of the effectiveness of pruning should be performed on top of quantization. The key open question is: with quantization, what kind of pruning (non-structured versus structured) is most beneficial? This question is fundamental because the answer determines the design aspects that we should really focus on to avoid the diminishing return of certain optimizations. This article provides a definitive answer to the question for the first time. First, we build ADMM-NN-S by extending and enhancing ADMM-NN, a recently proposed joint weight pruning and quantization framework, with algorithmic support for structured pruning, dynamic ADMM regulation, and masked mapping and retraining. Second, we develop a methodology for a fair and fundamental comparison of non-structured and structured pruning in terms of both storage and computation efficiency. Our results show that ADMM-NN-S consistently outperforms the prior art: 1) it achieves 348×, 36×, and 8× overall weight pruning on LeNet-5, AlexNet, and ResNet-50, respectively, with (almost) zero accuracy loss and 2) it demonstrates, for the first time, fully binarized (for all layers) DNNs that are lossless in accuracy in many cases.
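To make the non-structured versus structured distinction concrete, below is a minimal NumPy sketch. It is not the ADMM-NN-S implementation; the function names, the magnitude-based selection criterion, the row-wise notion of "structured," and the toy 4×6 weight matrix are all illustrative assumptions. The binarization helper shows one common form of the 1-bit quantization that "fully binarized" refers to.

import numpy as np

def prune_non_structured(W, prune_rate):
    """Zero out the individually smallest-magnitude weights.
    Highly flexible, but the survivors are scattered, so sparse
    storage needs per-weight indices (e.g., CSR column indices)."""
    k = int(W.size * prune_rate)                # number of weights to remove
    thresh = np.sort(np.abs(W), axis=None)[k]   # magnitude cutoff
    mask = np.abs(W) >= thresh                  # irregular 0/1 mask
    return W * mask, mask

def prune_structured_rows(W, prune_rate):
    """Zero out whole rows (e.g., filters/output channels) with the
    smallest L2 norm. Typically a lower pruning rate for the same
    accuracy, but the result is a smaller dense matrix with no
    per-weight index overhead."""
    k = int(W.shape[0] * prune_rate)            # number of rows to remove
    norms = np.linalg.norm(W, axis=1)
    keep = norms >= np.sort(norms)[k]           # keep the strongest rows
    mask = np.repeat(keep[:, None], W.shape[1], axis=1)
    return W * mask, mask

def binarize(W):
    """Sign binarization with a per-layer scaling factor, one common
    1-bit quantization scheme (an illustrative choice, not necessarily
    the one used in the article)."""
    alpha = np.mean(np.abs(W))
    return alpha * np.sign(W)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
Wn, _ = prune_non_structured(W, 0.5)
Ws, _ = prune_structured_rows(W, 0.5)
print("non-structured survivors (scattered):\n", (Wn != 0).astype(int))
print("structured survivors (whole rows):\n", (Ws != 0).astype(int))
print("binarized weights:\n", binarize(W))

The printed masks illustrate the storage trade-off the article quantifies: the non-structured result needs per-weight indices to locate its scattered survivors, while the structured result collapses to a smaller dense matrix that standard hardware can process without indexing overhead.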
Published in: IEEE Transactions on Neural Networks and Learning Systems ( Volume: 33, Issue: 9, September 2022)
Page(s): 4930 - 4944
Date of Publication: 18 March 2021

PubMed ID: 33735086

