
Bregman Proximal Gradient Algorithms for Deep Matrix Factorization

Conference paper in: Scale Space and Variational Methods in Computer Vision (SSVM 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12679)


Abstract

A typical assumption for the convergence of first-order optimization methods is Lipschitz continuity of the gradient of the objective function. However, in many practical applications this assumption is violated. To overcome this issue, extensions based on generalized proximity measures, known as Bregman distances, were introduced. This initiated the development of Bregman Proximal Gradient (BPG) algorithms, which, however, rely on problem-dependent Bregman distances. In this paper, we develop Bregman distances for deep matrix factorization problems, which yield a BPG algorithm with theoretical convergence guarantees while allowing for a constant step size strategy. Moreover, we demonstrate that algorithms based on the developed Bregman distances outperform their Euclidean counterparts as well as alternating-minimization-based approaches.
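
To make the mechanism concrete, here is a minimal sketch of a BPG iteration for the shallow (two-layer) factorization min 0.5·||X − UV||_F², not the paper's deep multi-layer setting. It assumes the quartic kernel h(U, V) = (c1/4)(||U||_F² + ||V||_F²)² + (c2/2)(||U||_F² + ||V||_F²), a standard choice in the BPG matrix-factorization literature, under which the mirror map ∇h can be inverted by solving a scalar cubic equation. The constants c1, c2, the step size, and all names below are illustrative assumptions, not the paper's exact kernel or parameters.

```python
import numpy as np

def bpg_factorize(X, rank, tau=0.9, iters=500, seed=0):
    """Sketch of a Bregman Proximal Gradient loop for min 0.5*||X - U @ V||_F^2.

    Assumed kernel: h(U, V) = c1/4 * s^2 + c2/2 * s with s = ||U||^2 + ||V||^2,
    so grad h(U, V) = (c1*s + c2) * (U, V), and the BPG step
        grad h(x_next) = grad h(x) - tau * grad f(x)
    is solved by scaling the right-hand side with the positive root of a cubic.
    """
    m, n = X.shape
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((rank, n))
    c1, c2 = 3.0, np.linalg.norm(X)  # illustrative constants

    for _ in range(iters):
        R = U @ V - X                      # residual
        gU, gV = R @ V.T, U.T @ R          # grad f w.r.t. U and V
        s = np.sum(U * U) + np.sum(V * V)  # ||(U, V)||^2
        PU = (c1 * s + c2) * U - tau * gU  # P = grad h(x) - tau * grad f(x)
        PV = (c1 * s + c2) * V - tau * gV
        # Invert grad h: find t > 0 with (c1 * t^2 * ||P||^2 + c2) * t = 1,
        # i.e. the unique real root of c1*||P||^2 * t^3 + c2*t - 1 = 0
        # (the cubic is strictly increasing, so the root exists and is positive).
        p2 = np.sum(PU * PU) + np.sum(PV * PV)
        roots = np.roots([c1 * p2, 0.0, c2, -1.0])
        t = float(roots[np.abs(roots.imag) < 1e-8].real.max())
        U, V = t * PU, t * PV
    return U, V
```

Because h is chosen so that the objective is smooth relative to it (in the relative-smoothness sense of Bauschke, Bolte, and Teboulle), the step size tau can stay constant throughout, with no line search — the constant step size property highlighted in the abstract, in contrast to Euclidean proximal gradient methods whose Lipschitz constant grows with the iterates.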



Author information

Corresponding author: Mahesh Chandra Mukkamala.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 430 KB)


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mukkamala, M.C., Westerkamp, F., Laude, E., Cremers, D., Ochs, P. (2021). Bregman Proximal Gradient Algorithms for Deep Matrix Factorization. In: Elmoataz, A., Fadili, J., Quéau, Y., Rabin, J., Simon, L. (eds) Scale Space and Variational Methods in Computer Vision. SSVM 2021. Lecture Notes in Computer Science, vol 12679. Springer, Cham. https://doi.org/10.1007/978-3-030-75549-2_17

  • DOI: https://doi.org/10.1007/978-3-030-75549-2_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75548-5

  • Online ISBN: 978-3-030-75549-2

  • eBook Packages: Computer Science, Computer Science (R0)
