DOI: 10.1145/3508352.3549379
ICCAD '22 Conference Proceedings

Research Article
Public Access

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

Published: 22 December 2022

ABSTRACT

When deploying deep neural networks (DNNs) on edge devices, most research efforts focus on the limited hardware resources, while little attention is paid to the influence of dynamic power management. Because edge devices typically run on a limited battery energy budget (rather than the nearly unlimited energy available to servers or workstations), their dynamic power management frequently changes the execution frequency, as in the widely used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed, especially for computation-intensive DNN models, which harms user experience and wastes hardware resources. We first identify this problem and then propose All-in-One, a highly representative pruning framework that works with DVFS-based dynamic power management. The framework uses only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models of various pruning ratios. By reconfiguring the model to the pruning ratio that corresponds to a specific execution frequency (and voltage), we achieve stable inference speed, i.e., the difference in speed under various execution frequencies is kept as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces the variance of their inference latency across frequencies, with the minimal memory consumption of only one model and one soft mask.
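The abstract describes the mechanism only at a high level. The following is a minimal, hypothetical Python sketch of how one shared set of weights plus per-layer soft-mask scores could be specialized to a pruning ratio selected from the current DVFS frequency. The frequency-to-ratio table, the function names (select_pruning_ratio, channel_mask, configure_layer), and the top-k thresholding rule are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch: specialize shared weights to a pruning ratio chosen by frequency.
import numpy as np

# Illustrative mapping from execution frequency (MHz) to channel pruning ratio,
# so that lower clock frequencies run a more aggressively pruned sub-model.
FREQ_TO_RATIO = {2400: 0.0, 1800: 0.25, 1200: 0.50, 600: 0.75}

def select_pruning_ratio(freq_mhz: float) -> float:
    # Pick the ratio configured for the closest supported frequency step.
    nearest = min(FREQ_TO_RATIO, key=lambda f: abs(f - freq_mhz))
    return FREQ_TO_RATIO[nearest]

def channel_mask(soft_mask_scores: np.ndarray, ratio: float) -> np.ndarray:
    # Binarize a layer's shared soft mask: keep the top (1 - ratio) fraction of channels.
    n_keep = max(1, int(round(len(soft_mask_scores) * (1.0 - ratio))))
    keep_idx = np.argsort(soft_mask_scores)[::-1][:n_keep]
    mask = np.zeros_like(soft_mask_scores)
    mask[keep_idx] = 1.0
    return mask

def configure_layer(weights: np.ndarray, soft_mask_scores: np.ndarray, freq_mhz: float) -> np.ndarray:
    # Specialize one conv layer's shared weights (out_ch, in_ch, kh, kw) for the current frequency.
    ratio = select_pruning_ratio(freq_mhz)
    mask = channel_mask(soft_mask_scores, ratio)
    return weights * mask[:, None, None, None]

# Example: a layer with 64 output channels, reconfigured when DVFS lowers the clock.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32, 3, 3))
scores = rng.random(64)
w_low_freq = configure_layer(w, scores, freq_mhz=1150)   # ~50% of channels pruned
w_high_freq = configure_layer(w, scores, freq_mhz=2400)  # dense model
print(int((np.abs(w_low_freq).sum(axis=(1, 2, 3)) > 0).sum()), "channels active at low frequency")

In this sketch all configurations share a single weight tensor and a single score vector per layer; only the binarized mask changes at reconfiguration time, which mirrors the abstract's claim of storing just one model and one soft mask.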


Published in

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022, 1467 pages
ISBN: 9781450392174
DOI: 10.1145/3508352
Copyright © 2022 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Published: 22 December 2022
