Abstract
Neural networks typically exhibit permutation symmetry: reordering the neurons within each layer does not change the function the network computes. These symmetries contribute to the non-convexity of the networks' loss landscapes, since linearly interpolating between two permuted versions of a trained network tends to encounter a high loss barrier. Recent work has argued that permutation symmetries are the only sources of non-convexity, meaning there are essentially no such barriers between trained networks if they are permuted appropriately. In this work, we refine these arguments into three distinct claims of increasing strength. We show that existing evidence only supports "weak linear connectivity": for each pair of networks drawn from a set of SGD solutions, there exists a permutation that linearly connects them, so a single network may need a different permutation for each of the other networks. In contrast, "strong linear connectivity", the claim that for each network there exists one permutation that simultaneously connects it with all the other networks, is both intuitively and practically more desirable. This stronger claim would imply that the loss landscape is convex after accounting for permutation, and would enable linear interpolation among three or more independently trained models without increased loss. We also introduce an intermediate claim: for certain sequences of networks, there exists one permutation that simultaneously aligns matching pairs of networks from these sequences. Specifically, we find that a single permutation aligns sequences of iteratively trained as well as iteratively pruned networks, meaning that the two networks exhibit low loss barriers at each step of their optimization and sparsification trajectories, respectively. Finally, we provide the first evidence that strong linear connectivity may be possible under certain conditions, by showing that barriers decrease with increasing network width when interpolating among three networks.
E. Sharma and D. Kwok—Equal contribution.
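To make the quantities in the abstract concrete, the following is a minimal, self-contained sketch (not the authors' code) of the two ingredients involved: applying a hidden-unit permutation to a toy two-layer MLP, which leaves its function unchanged, and measuring the loss barrier along the linear interpolation path between two parameter vectors. The toy data, architecture, and random (untrained) weights below are hypothetical stand-ins; in the paper's setting the permutation would be chosen to align two trained networks (for example by matching weights or activations), not drawn at random.

```python
# Minimal sketch (not the authors' code): a toy two-layer MLP in NumPy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))        # toy inputs
y = rng.integers(0, 2, size=256)      # toy binary labels

def init_params(width=32):
    return {
        "W1": rng.normal(scale=0.1, size=(width, 10)),
        "b1": np.zeros(width),
        "W2": rng.normal(scale=0.1, size=(2, width)),
        "b2": np.zeros(2),
    }

def loss(p):
    # Forward pass + cross-entropy on the toy data.
    h = np.maximum(X @ p["W1"].T + p["b1"], 0.0)   # ReLU hidden layer
    logits = h @ p["W2"].T + p["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

def permute_hidden(p, perm):
    # Reordering hidden units leaves the function unchanged:
    # permute the rows of W1/b1 and the matching columns of W2.
    return {"W1": p["W1"][perm], "b1": p["b1"][perm],
            "W2": p["W2"][:, perm], "b2": p["b2"]}

def barrier(p_a, p_b, steps=25):
    # Loss barrier along the linear path: max over alpha of the loss of the
    # interpolated weights, minus the linearly interpolated endpoint losses.
    alphas = np.linspace(0.0, 1.0, steps)
    la, lb = loss(p_a), loss(p_b)
    mix = lambda a: {k: (1 - a) * p_a[k] + a * p_b[k] for k in p_a}
    return max(loss(mix(a)) - ((1 - a) * la + a * lb) for a in alphas)

theta1, theta2 = init_params(), init_params()
perm = rng.permutation(theta1["W1"].shape[0])
print("barrier(theta1, theta2)       =", barrier(theta1, theta2))
print("barrier(perm(theta1), theta2) =", barrier(permute_hidden(theta1, perm), theta2))
print("loss unchanged by permutation?", np.isclose(loss(theta1), loss(permute_hidden(theta1, perm))))
```

In these terms, weak linear connectivity asks that some permutation drive this barrier to roughly zero for each pair of trained networks, while strong linear connectivity asks that one permutation per network do so against all the other networks at once.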
Notes
- 1. See Appendix A for details on handling other types of layers.
- 2.
- 3. These conditions include sufficient width and the use of layer normalization.
- 4. In fact, a strictly stronger claim is made: for a certain class of networks \(\mathcal{F}\), for every \(\theta_1 \in \mathcal{F}\), there is a single permutation that can be applied to \(\theta_1\) to remove the error barrier between the permuted \(\theta_1\) and any other network in \(\mathcal{F}\). This also means that the networks in \(\mathcal{F}\) are piecewise linearly connected before permuting (see the sketch below).
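For concreteness, the weak and strong claims contrasted in the abstract and in note 4 can be written as follows; the loss-barrier notation \(B\) and the permutation symbols are ours, intended only as a sketch of those statements.

```latex
% Loss barrier along the linear path between two parameter vectors (notation ours):
B(\theta_a, \theta_b) \;=\; \max_{\alpha \in [0,1]}
  L\bigl(\alpha \theta_a + (1-\alpha)\,\theta_b\bigr)
  \;-\; \bigl[\alpha L(\theta_a) + (1-\alpha) L(\theta_b)\bigr].
% Weak linear connectivity: the permutation may depend on the pair.
\forall\, \theta_1, \theta_2 \in \mathcal{F}\ \ \exists\, \pi_{12} :\quad
  B\bigl(\pi_{12}(\theta_1), \theta_2\bigr) \approx 0.
% Strong linear connectivity (note 4): one permutation per network works
% against every other network in \mathcal{F} simultaneously.
\forall\, \theta_1 \in \mathcal{F}\ \ \exists\, \pi_1\ \ \forall\, \theta_2 \in \mathcal{F} :\quad
  B\bigl(\pi_1(\theta_1), \theta_2\bigr) \approx 0.
```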
Acknowledgements
The authors would like to thank Tiffany Vlaar and Utku Evci for feedback on a draft and various ideas, as well as Udbhav Bamba for preliminary implementation work. DMR and DR are supported by Canada CIFAR AI Chairs and NSERC Discovery Grants. The authors also acknowledge material support from NVIDIA in the form of computational resources, and are grateful for technical support from the Mila IDT and Vector teams in maintaining the Mila and Vector Compute Clusters. Resources used to prepare this research were provided, in part, by Mila (mila.quebec), the Vector Institute (vectorinstitute.ai), the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sharma, E., Kwok, D., Denton, T., Roy, D.M., Rolnick, D., Dziugaite, G.K. (2024). Simultaneous Linear Connectivity of Neural Networks Modulo Permutation. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14947. Springer, Cham. https://doi.org/10.1007/978-3-031-70368-3_16
DOI: https://doi.org/10.1007/978-3-031-70368-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70367-6
Online ISBN: 978-3-031-70368-3