Abstract
A typical assumption for the convergence of first-order optimization methods is the Lipschitz continuity of the gradient of the objective function. However, for many practical applications this assumption is violated. To overcome this issue, extensions based on generalized proximity measures, known as Bregman distances, were introduced. This initiated the development of Bregman Proximal Gradient (BPG) algorithms, which, however, rely on problem-dependent Bregman distances. In this paper, we develop Bregman distances for deep matrix factorization problems, which yield a BPG algorithm with theoretical convergence guarantees while allowing for a constant step-size strategy. Moreover, we demonstrate that the algorithms based on the developed Bregman distance outperform their Euclidean counterparts as well as alternating-minimization-based approaches.
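To illustrate the kind of update a BPG method performs, the sketch below carries out one BPG step for the plain two-factor problem min_{U,V} 0.5*||U V^T - A||_F^2 under a quartic-plus-quadratic kernel of the form used in earlier BPG work on matrix factorization. The kernel, the constants c1 and c2, and the step size tau are illustrative assumptions; they are not the deep-factorization Bregman distance developed in the paper.

import numpy as np

# Minimal sketch (under the assumptions stated above) of one BPG step for
#   min_{U,V} f(U,V) = 0.5 * ||U V^T - A||_F^2
# relative to the kernel h(x) = (c1/4)*||x||^4 + (c2/2)*||x||^2, x = (U, V).

def bpg_step(U, V, A, tau, c1=3.0, c2=1.0):
    """Perform one BPG step with constant step size tau (illustrative constants)."""
    R = U @ V.T - A                      # residual at the current iterate
    grad_U, grad_V = R @ V, R.T @ U      # gradients of f w.r.t. U and V
    sq = np.sum(U**2) + np.sum(V**2)     # ||x||^2 for the stacked variable x = (U, V)
    # p = grad h(x) - tau * grad f(x); the next iterate x+ solves grad h(x+) = p.
    pU = (c1 * sq + c2) * U - tau * grad_U
    pV = (c1 * sq + c2) * V - tau * grad_V
    p_norm = np.sqrt(np.sum(pU**2) + np.sum(pV**2))
    if p_norm == 0.0:
        return np.zeros_like(U), np.zeros_like(V)
    # Inverting grad h reduces to the scalar cubic c1*r^3 + c2*r - ||p|| = 0,
    # which has a unique nonnegative root r = ||x+|| since c1, c2 > 0.
    roots = np.roots([c1, 0.0, c2, -p_norm])
    r = float(np.max(roots[np.abs(roots.imag) < 1e-8].real))
    scale = r / p_norm                   # x+ = (r / ||p||) * p
    return scale * pU, scale * pV

Iterating this step with a fixed tau, chosen from the relative-smoothness constant of f with respect to h, gives the constant step-size behaviour referred to in the abstract; for deep (more than two-factor) problems, the Bregman distance constructed in the paper would replace the kernel h assumed here.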
Cite this paper
Mukkamala, M.C., Westerkamp, F., Laude, E., Cremers, D., Ochs, P. (2021). Bregman Proximal Gradient Algorithms for Deep Matrix Factorization. In: Elmoataz, A., Fadili, J., Quéau, Y., Rabin, J., Simon, L. (eds) Scale Space and Variational Methods in Computer Vision. SSVM 2021. Lecture Notes in Computer Science, vol. 12679. Springer, Cham. https://doi.org/10.1007/978-3-030-75549-2_17