Gradient Coupled Flow: Performance Boosting on Network Pruning by Utilizing Implicit Loss Decrease

  • Conference paper
  • In: Neural Information Processing (ICONIP 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14448)

Abstract

Network pruning prior to training makes generalization more challenging than ever, yet recent studies focus mainly on the trainability of the pruned networks in isolation. This paper explores a new perspective: the implicit decrease in the loss on data that has yet to be trained, induced by training on a single batch in each round; we term the first-order approximation of this effect gradient coupled flow. We thus present a criterion sensitive to gradient coupled flow (GCS), hypothesized to capture the weights at initialization that are most sensitive to performance boosting. Interestingly, our explorations show a linear correlation between generalization and measurements based on implicit loss decrease, for previous methods as well as for GCS, which describes the causes of accuracy fluctuation in a fine-grained manner. Our code is made public at: https://github.com/kangxiatao/pruning_before_training.
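
To make the core idea concrete, here is a minimal, illustrative sketch that is not taken from the paper or its released code: it assumes that a single SGD step of learning rate lr on one batch A changes the weights by -lr * grad_A, so a first-order Taylor expansion predicts the loss on another, not-yet-trained batch B to change by approximately -lr * <grad_A, grad_B>, i.e. the coupling (inner product) of the two batch gradients. The model, data, and variable names below are placeholders; see the linked repository for the actual GCS implementation.

```python
# Illustrative sketch only (PyTorch); NOT the authors' released code.
# Estimates, to first order, the implicit loss change on batch B caused by
# one SGD step on batch A, which reduces to -lr * <grad_A, grad_B>.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()
lr = 0.1  # learning rate of the hypothetical one-batch step

def batch_grad(x, y):
    """Flattened gradient of the loss on batch (x, y) w.r.t. all weights."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()])

# Two toy batches: A is trained on; B stands for data yet to be trained.
xa, ya = torch.randn(32, 20), torch.randint(0, 10, (32,))
xb, yb = torch.randn(32, 20), torch.randint(0, 10, (32,))

g_a, g_b = batch_grad(xa, ya), batch_grad(xb, yb)

# First-order estimate: L_B(w - lr*g_A) - L_B(w) ≈ -lr * <g_A, g_B>.
implicit_change = -lr * torch.dot(g_a, g_b)
print(f"predicted implicit loss change on batch B: {implicit_change.item():.4f}")
```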



Author information

Corresponding author: Xiatao Kang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wu, J., Kang, X., Xiao, J., Yao, J. (2024). Gradient Coupled Flow: Performance Boosting on Network Pruning by Utilizing Implicit Loss Decrease. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_18

  • DOI: https://doi.org/10.1007/978-981-99-8082-6_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8081-9

  • Online ISBN: 978-981-99-8082-6

  • eBook Packages: Computer Science, Computer Science (R0)
