Gradient Coupled Flow: Performance Boosting on Network Pruning by Utilizing Implicit Loss Decrease

  • Conference paper
  • In: Neural Information Processing (ICONIP 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14448)

Abstract

Network pruning prior to training makes generalization more challenging than ever, yet recent studies focus mainly on the trainability of the pruned networks in isolation. This paper explores a new perspective: the implicit decrease in the loss on data that has yet to be trained, induced by training on a single batch in each round; we term the first-order approximation of this effect gradient coupled flow. We thus present a criterion sensitive to gradient coupled flow (GCS), hypothesized to capture the weights at initialization that are most sensitive to performance boosting. Interestingly, our explorations show a linear correlation between generalization and measurements based on implicit loss decrease, for previous methods as well as for GCS, which describes the causes of accuracy fluctuation in a fine-grained manner. Our code is made public at: https://github.com/kangxiatao/pruning_before_training.
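
To make the core idea concrete, here is a minimal, illustrative sketch that is not taken from the paper or its released code: it assumes that a single SGD step of learning rate lr on one batch A changes the weights by -lr * grad_A, so a first-order Taylor expansion predicts the loss on another, not-yet-trained batch B to change by approximately -lr * <grad_A, grad_B>, i.e. the coupling (inner product) of the two batch gradients. The model, data, and variable names below are placeholders; see the linked repository for the actual GCS implementation.

```python
# Illustrative sketch only (PyTorch); NOT the authors' released code.
# Estimates, to first order, the implicit loss change on batch B caused by
# one SGD step on batch A, which reduces to -lr * <grad_A, grad_B>.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()
lr = 0.1  # learning rate of the hypothetical one-batch step

def batch_grad(x, y):
    """Flattened gradient of the loss on batch (x, y) w.r.t. all weights."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()])

# Two toy batches: A is trained on; B stands for data yet to be trained.
xa, ya = torch.randn(32, 20), torch.randint(0, 10, (32,))
xb, yb = torch.randn(32, 20), torch.randint(0, 10, (32,))

g_a, g_b = batch_grad(xa, ya), batch_grad(xb, yb)

# First-order estimate: L_B(w - lr*g_A) - L_B(w) ≈ -lr * <g_A, g_B>.
implicit_change = -lr * torch.dot(g_a, g_b)
print(f"predicted implicit loss change on batch B: {implicit_change.item():.4f}")
```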



Author information

Corresponding author: Xiatao Kang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wu, J., Kang, X., Xiao, J., Yao, J. (2024). Gradient Coupled Flow: Performance Boosting on Network Pruning by Utilizing Implicit Loss Decrease. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_18

  • DOI: https://doi.org/10.1007/978-981-99-8082-6_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8081-9

  • Online ISBN: 978-981-99-8082-6

  • eBook Packages: Computer Science, Computer Science (R0)
