Research Article

Finding Efficient Pruned Network via Refined Gradients for Pruned Weights

Published: 27 October 2023

Abstract

With the growth of deep neural networks (DNNs), the number of DNN parameters has increased drastically. This makes DNN models difficult to deploy on resource-limited embedded systems. To alleviate this problem, dynamic pruning methods have emerged, which seek diverse sparsity patterns during training by using the Straight-Through Estimator (STE) to approximate the gradients of pruned weights. STE allows pruned weights to revive while dynamic sparsity patterns are explored. However, these coarse gradients cause training instability and performance degradation, owing to the unreliable gradient signal of the STE approximation. In this work, to tackle this issue, we introduce refined gradients that update the pruned weights by forming dual forwarding paths from the two sets of weights (pruned and unpruned). We propose a novel Dynamic Collective Intelligence Learning (DCIL), which exploits the learning synergy between the collective intelligence of both weight sets. We verify the usefulness of the refined gradients by showing improvements in training stability and model performance on the CIFAR and ImageNet datasets. DCIL outperforms various previously proposed pruning schemes, including other dynamic pruning methods, with enhanced stability during training. The code is available on GitHub.
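The gradient mechanics the abstract contrasts can be made concrete with a short sketch. The following PyTorch snippet is a minimal illustration, not the authors' implementation: STEPrune shows the baseline scheme, in which pruned weights receive coarse straight-through gradients, and DualPathLinear shows the dual-forwarding idea, in which the pruned and unpruned weight sets each produce an output so that pruned entries are updated by exact gradients from the dense path. All names (magnitude_mask, STEPrune, DualPathLinear), the magnitude-based masking, and the simple summed loss are illustrative assumptions; the paper's released code is the authoritative reference.

import torch
import torch.nn as nn
import torch.nn.functional as F


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Keep the largest-magnitude entries; zero out a `sparsity` fraction.
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()


class STEPrune(torch.autograd.Function):
    # Baseline dynamic pruning: the forward pass uses masked weights, while
    # the backward pass copies the gradient to *all* weights unchanged --
    # the coarse STE signal the abstract identifies as a source of
    # training instability.

    @staticmethod
    def forward(ctx, weight, mask):
        return weight * mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: pruned weights get the same gradient as if they
        # had participated in the forward pass.
        return grad_output, None


class DualPathLinear(nn.Module):
    # Dual forwarding in the spirit of DCIL: the same input is propagated
    # through both the sparse (pruned) and dense (unpruned) weight sets, so
    # pruned entries are updated by real gradients from the dense path
    # rather than by STE approximations.

    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        self.sparsity = sparsity

    def forward(self, x):
        mask = magnitude_mask(self.weight.detach(), self.sparsity)
        sparse_out = F.linear(x, self.weight * mask)   # pruned path
        dense_out = F.linear(x, self.weight)           # unpruned path
        return sparse_out, dense_out


# Baseline usage would be F.linear(x, STEPrune.apply(w, mask)), which gives
# pruned entries the same coarse gradient as unpruned ones. The dual-path
# sketch below instead trains both paths against the labels so the two
# weight sets learn collectively; only the sparse path is kept at inference.
layer = DualPathLinear(64, 10, sparsity=0.9)
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
sparse_logits, dense_logits = layer(x)
loss = F.cross_entropy(sparse_logits, y) + F.cross_entropy(dense_logits, y)
loss.backward()  # pruned entries receive exact gradients via the dense path

How the two losses are combined (and whether, e.g., a distillation term couples the paths) is a design choice of the method; the summed cross-entropy above is only the simplest stand-in for the collective-learning objective.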




      Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023, 9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783


      Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. convolutional neural networks
      2. model compression
      3. model pruning
      4. refined gradients


      Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
