Research Article · Public Access
DOI: 10.1145/3508352.3549379

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

Published: 22 December 2022

Abstract

During the deployment of deep neural networks (DNNs) on edge devices, much research effort is devoted to coping with limited hardware resources, but little attention is paid to the influence of dynamic power management. Because edge devices typically operate on a limited battery energy budget (rather than the nearly unlimited energy supply of servers or workstations), their dynamic power management frequently changes the execution frequency, as in the widely used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed, especially for computation-intensive DNN models, which can harm the user experience and waste hardware resources. We first identify this problem and then propose All-in-One, a highly representative pruning framework designed to work with DVFS-based dynamic power management. The framework uses only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models of various pruning ratios. By re-configuring the model to the pruning ratio that corresponds to a specific execution frequency (and voltage), we achieve stable inference speed, i.e., we keep the difference in speed across execution frequencies as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces the variance of their inference latency across frequencies, with the minimal memory consumption of only one model and one soft mask.
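The core mechanism in the abstract (one set of dense weights plus one learned soft mask, binarized at runtime into the sub-model that matches the current DVFS frequency) can be illustrated with a minimal sketch. The code below is not the paper's implementation: the frequency-to-ratio table, the function names, and the top-k binarization rule are all illustrative assumptions.

```python
# Minimal sketch (NumPy, hypothetical names) of the "one model, many
# pruning ratios" idea: a single weight tensor and a single soft mask are
# stored once, and each DVFS frequency level is mapped to a pruning ratio
# whose sub-model is derived by binarizing the soft mask at runtime.
import numpy as np

# Hypothetical mapping from DVFS frequency levels (GHz) to pruning ratios:
# a lower clock gets a more aggressively pruned model, so latency stays stable.
FREQ_TO_RATIO = {2.0: 0.00, 1.5: 0.25, 1.0: 0.50, 0.6: 0.75}

def channel_mask(soft_mask: np.ndarray, ratio: float) -> np.ndarray:
    """Keep the (1 - ratio) fraction of channels with the largest soft-mask
    scores; return a binary {0, 1} mask over output channels."""
    n_keep = max(1, int(round(len(soft_mask) * (1.0 - ratio))))
    keep = np.argsort(soft_mask)[::-1][:n_keep]  # highest scores first
    mask = np.zeros_like(soft_mask)
    mask[keep] = 1.0
    return mask

def configure_layer(weight: np.ndarray, soft_mask: np.ndarray,
                    freq_ghz: float) -> np.ndarray:
    """Re-configure one conv layer (out_ch, in_ch, kH, kW) for the pruning
    ratio associated with the current execution frequency."""
    ratio = FREQ_TO_RATIO[freq_ghz]
    mask = channel_mask(soft_mask, ratio)
    # Zero out pruned output channels; every ratio is derived from the
    # same dense weights and the same soft mask.
    return weight * mask[:, None, None, None]

# Toy usage: one 64-channel conv layer, reconfigured as DVFS lowers the clock.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32, 3, 3)).astype(np.float32)
s = rng.random(64).astype(np.float32)  # stands in for a learned soft mask
for f in (2.0, 1.0, 0.6):
    active = int((configure_layer(w, s, f) != 0).any(axis=(1, 2, 3)).sum())
    print(f"{f:.1f} GHz -> {active} active channels")
```

The design point the abstract emphasizes is visible here: storage stays at roughly one dense model plus one score per channel, yet each frequency level still obtains a sub-model whose compute budget can be matched to the slower clock.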


Cited By

  • (2024) LOTUS: Learning-Based Online Thermal and Latency Variation Management for Two-Stage Detectors on Edge Devices. Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC '24), 1-6. DOI: 10.1145/3649329.3657310. Online publication date: 23 June 2024.
  • (2023) Condense: A Framework for Device and Frequency Adaptive Neural Network Models on the Edge. 2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6. DOI: 10.1109/DAC56929.2023.10247713. Online publication date: 9 July 2023.



Information

Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022
1467 pages
ISBN:9781450392174
DOI:10.1145/3508352

In-Cooperation

  • IEEE-EDS: Electronic Devices Society
  • IEEE CAS
  • IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article


Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

Acceptance Rates

Overall acceptance rate: 457 of 1,762 submissions (26%)


Article Metrics

  • Downloads (last 12 months): 134
  • Downloads (last 6 weeks): 10

Reflects downloads up to 28 February 2025.

