
Nonmonotone feasible arc search algorithm for minimization on Stiefel manifold

Published in: Computational and Applied Mathematics

Abstract

We devise a new numerical method for solving the minimization problem over the Stiefel manifold, that is, the set of \(n \times p\) matrices (with \(p \le n\)) with orthonormal columns. Our approach consists of a nonmonotone feasible arc search along a sufficient descent direction, which ensures convergence to stationary points regardless of the initial point. The feasibility of the iterates is maintained through a variation of the Cayley transform, so our scheme can be seen as a retraction-based algorithm for minimization with orthogonality constraints. We emphasize that our scheme solves a \(p\times p\) linear system at each iteration and has computational complexity \(O(np^2) + O(p^3)\), which is attractive when \(p \ll n\). We present a general algorithmic framework for minimization on the Stiefel manifold, establish its global convergence properties, and report numerical experiments on relevant applications.
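To illustrate the kind of feasible update the abstract describes, here is a minimal Python sketch of a Cayley-transform step in the low-rank form of Wen and Yin (2013), which solves a small \(2p \times 2p\) system (the variant proposed in the paper further reduces this to a \(p \times p\) system). This is not the paper's exact scheme; all names below are illustrative.

```python
import numpy as np

def cayley_step(X, G, t):
    """One feasible Cayley-transform step on the Stiefel manifold
    (Wen-Yin low-rank form): Y(t) = X - t U (I + (t/2) V^T U)^{-1} V^T X,
    with U = [G, X] and V = [X, -G]. Since the underlying Cayley
    transform is orthogonal, Y(t)^T Y(t) = I up to rounding error.

    X : (n, p) matrix with orthonormal columns
    G : (n, p) Euclidean gradient of the objective at X
    t : step size along the feasible arc
    """
    n, p = X.shape
    U = np.hstack([G, X])                       # n x 2p
    V = np.hstack([X, -G])                      # n x 2p
    # Only a small 2p x 2p system is solved, instead of inverting
    # the n x n matrix (I + (t/2) W) with W = G X^T - X G^T.
    M = np.eye(2 * p) + (t / 2.0) * (V.T @ U)
    return X - t * U @ np.linalg.solve(M, V.T @ X)
```

The cost is dominated by the products \(V^\textrm{T}U\) and \(U W\), i.e. \(O(np^2)\) plus an \(O(p^3)\) small solve, which matches the complexity regime discussed in the abstract when \(p \ll n\).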


Data availability statement

The data sets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. Available at https://github.com/optsuite/OptM.

  2. http://www.manopt.org.

  3. Using the last 10 iterations kept in memory.

  4. Certified by the Matlab routine eigs.

References

  • Abrudan T, Eriksson J, Koivunen V (2008) Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Trans Signal Process 56(3):1134–1147

  • Abrudan T, Eriksson J, Koivunen V (2009) Conjugate gradient algorithm for optimization under unitary matrix constraint. Signal Process 89:1704–1714

  • Absil P-A, Malick J (2012) Projection-like retractions on matrix manifolds. SIAM J Optim 22(1):135–158

  • Absil P-A, Mahony R, Sepulchre R (2004) Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Appl Math 80:199–220

  • Absil P-A, Baker CG, Gallivan KA (2007) Trust-region methods on Riemannian manifolds. Found Comput Math 7(3):303–330

  • Absil P-A, Mahony R, Sepulchre R (2008) Optimization algorithms on matrix manifolds. Princeton University Press, Princeton

  • Barzilai J, Borwein JM (1988) Two-point step size gradient methods. IMA J Numer Anal 8:141–148

  • Bendokat T, Zimmermann R (2021) Efficient quasi-geodesics on the Stiefel manifold. In: Nielsen F, Barbaresco F (eds) Geometric science of information. Springer International Publishing, New York, pp 763–771

  • Bertsekas DP (2003) Constrained optimization and Lagrange multiplier methods. Massachusetts Institute of Technology, Cambridge

  • Boumal N, Mishra B, Absil P-A, Sepulchre R (2014) Manopt, a Matlab toolbox for optimization on manifolds. J Mach Learn Res 15:1455–1459

  • Cancès E, Chakir R, Maday Y (2010) Numerical analysis of nonlinear eigenvalue problems. J Sci Comput 45:90–117

  • Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1–25

  • Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Progr 91:201–213

  • Edelman A, Arias TA, Smith ST (1998) The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal Appl 20(2):303–353

  • Francisco JB, Viloche Bazán FS (2012) Nonmonotone algorithm for minimization on closed sets with application to minimization on Stiefel manifolds. J Comput Appl Math 236(10):2717–2727

  • Francisco JB, Bazán FSV, Weber Mendonça M (2017) Non-monotone algorithm for minimization on arbitrary domains with applications to large-scale orthogonal Procrustes problem. Appl Numer Math 112:51–64

  • Francisco JB, Gonçalves DS, Bazán FSV, Paredes LLT (2020) Non-monotone inexact restoration method for nonlinear programming. Comput Optim Appl 76:867–888

  • Francisco JB, Gonçalves DS, Bazán FSV, Paredes LLT (2021) Nonmonotone inexact restoration approach for minimization with orthogonality constraints. Numer Algorithms 86:1651–1684

  • Gao B, Liu X, Chen X, Yuan Y (2018) A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM J Optim 28(1):302–332

  • Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. The Johns Hopkins University Press, London

  • Grippo L, Lampariello F, Lucidi S (1986) A nonmonotone line search technique for Newton’s method. SIAM J Numer Anal 23:707–716

  • Helgaker T, Jørgensen P, Olsen J (2000) Molecular electronic-structure theory. Wiley, Chichester

  • Hu J, Liu X, Wen Z, Yuan Y (2020) A brief introduction to manifold optimization. J Oper Res Soc China 8:199–248

  • Huang W, Absil P-A, Gallivan KA (2016) A Riemannian BFGS method for nonconvex optimization problems. Springer International Publishing, Cham, pp 627–634

  • Iannazzo B, Porcelli M (2018) The Riemannian Barzilai–Borwein method with nonmonotone line search and the matrix geometric mean computation. IMA J Numer Anal 38:495–517

  • Janin R (1984) Directional derivative of the marginal function in nonlinear programming. Math Progr Study 21:127–138

  • Jiang B, Dai YH (2015) A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math Progr 153(2):535–575

  • Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J Mach Learn Res 11:517–553

  • Kohn W (1999) Nobel Lecture: Electronic structure of matter-wave functions and density functionals. Rev Mod Phys 71(5):1253–1266

  • Nishimori Y, Akaho S (2005) Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing 67:106–135

  • Oviedo H, Dalmau O, Lara H (2021) Two adaptive scaled gradient projection methods for Stiefel manifold constrained optimization. Numer Algorithms 87:1107–1127

  • Raydan M (1997) The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J Optim 7:26–33

  • Shariff M (1995) A constrained conjugate gradient method and the solution of linear equations. Comput Math Appl 30(11):25–37

  • Trendafilov N, Gallo M (2021) Multivariate data analysis on matrix manifolds. Springer Series in the Data Sciences. Springer, Cham

  • Turaga P, Veeraraghavan A, Chellappa R (2008) Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision. In: IEEE conference on computer vision and pattern recognition, pp 1–8

  • Wen Z, Yin W (2013) A feasible method for optimization with orthogonality constraints. Math Progr 142:397–434

  • Zhang H, Hager WW (2004) A nonmonotone line search technique and its application to unconstrained optimization. SIAM J Optim 14(4):1043–1056

  • Zhao Z, Bai Z-J, Jin X-Q (2015) A Riemannian Newton algorithm for nonlinear eigenvalue problems. SIAM J Matrix Anal Appl 36(2):752–774

  • Zhu X (2015) A feasible filter method for the nearest low-rank correlation matrix problem. Numer Algorithms 69:763–784


Funding

This work was partially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq - Brasil, Grant no. 305213/2021-0.

Author information


Corresponding author

Correspondence to Douglas S. Gonçalves.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests that are relevant to the content of this article.

Additional information

Communicated by Orizon Pereira Ferreira.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Floating point arithmetic considerations


We remark that in Theorem 1 the computation of \({{\mathcal {R}}}_T\), U and V, as well as the solution of the linear system (A2), is assumed to be carried out with high precision, so that \(X^\textrm{T}X = I\) holds throughout the proof. This assumption was indeed observed in our numerical experiments. Nevertheless, such precision can occasionally be lost, and then the assumption no longer holds. For this reason, we also provide a result analogous to Theorem 1 that takes the feasibility residual \({{{\mathcal {R}}}}_V = I - X^\textrm{T}X\) into account. In this context of loss of numerical precision, we therefore recommend replacing \({{\mathcal {R}}}_1\) and \({{\mathcal {R}}}_T\) in (20) with \({{\mathcal {R}}}_1^\textrm{Full}\) and \({{\mathcal {R}}}_T^\textrm{Full}\), respectively, as defined in the next theorem.

Theorem 3

Let \(X \in \Gamma \), \(Z \in {\mathbb {R}}^{n\times p}\) and define \(W(t)\in {\mathbb {R}}^{2p \times p}\), \({{\mathcal {R}}}_1\) and \({{\mathcal {R}}}_T\) as in Theorem 1. For all \(t\in {\mathbb {R}}\), if we consider \({{\mathcal {R}}}_V = I - X^\textrm{T}X\),

$$\begin{aligned} {{\mathcal {R}}}_1^\textrm{Full} = \left( {{\mathcal {R}}}_1 - \frac{1}{2}{{\mathcal {R}}}_T{{\mathcal {R}}}_VX^\textrm{T}Z \right) \left( I+\frac{t}{2}{{\mathcal {R}}}_VX^\textrm{T}Z\right) X^\textrm{T}X \end{aligned}$$

and \({{\mathcal {R}}}_T^\textrm{Full} = {{\mathcal {R}}}_T X^\textrm{T}X\), we have that

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle W_1(t) = \left( I + \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z\right) X^\textrm{T}X \left( I + \frac{t}{2}W_2(t)\right) + O(\Vert {{\mathcal {R}}}_V\Vert ^2), \quad \text{ and } \\ \displaystyle W_2(t) = \left( I - \frac{t}{2}\left( -Z^\textrm{T}X - \frac{t}{2}{{\mathcal {R}}}_1^\textrm{Full} + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2}\right) \right) ^{-1} \left( -Z^\textrm{T}X - \frac{t}{2}{{\mathcal {R}}}_1^\textrm{Full} + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2}\right) \end{array} \right. \end{aligned}$$
(A1)

Proof

From Theorem 1, we have that \(X^{+}(t) = X + tUW(t)\), with W(t) solution of

$$\begin{aligned} \left( I-\frac{t}{2}V^\textrm{T}U\right) W(t) = V^\textrm{T}X. \end{aligned}$$
(A2)

Now, using the definitions of \({{\mathcal {R}}}_V\), \({{\mathcal {R}}}_T\) and \({{\mathcal {R}}}_1^\textrm{Full}\), we have that

$$\begin{aligned} \small V^\textrm{T} U = \left( \begin{array}{cc} (I-X^\textrm{T}X)X^\textrm{T}Z &{} X^\textrm{T}X \\ \displaystyle (-Z+\frac{1}{2}X{{\mathcal {R}}}_T)^\textrm{T}(Z-XX^\textrm{T}Z) &{}\displaystyle -Z^\textrm{T}X+\frac{{{\mathcal {R}}}_T^\textrm{Full}}{2} \end{array}\right) = \left( \begin{array}{cc} {{\mathcal {R}}}_VX^\textrm{T}Z &{} X^\textrm{T}X \\ \displaystyle -{{\mathcal {R}}}_1^\textrm{Full} &{} \displaystyle -Z^\textrm{T}X+\frac{{{\mathcal {R}}}_T^\textrm{Full}}{2} \end{array}\right) \end{aligned}$$

and

$$\begin{aligned} V^\textrm{T} X = \left( \begin{array}{cc} X^\textrm{T}X \\ \displaystyle -Z^\textrm{T}X + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2} \end{array}\right) . \end{aligned}$$

Thus, the linear system (A2) leads to

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle W_1(t) = \left( I - \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z\right) ^{-1} X^\textrm{T}X\left( I + \frac{t}{2}W_2(t)\right) , \quad \text{ and } \\ \displaystyle \frac{t}{2} {{\mathcal {R}}}_1^\textrm{Full} W_1(t) + \left( I - \frac{t}{2}\left( -Z^\textrm{T}X + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2}\right) \right) W_2(t) = -Z^\textrm{T}X + \frac{1}{2} {{\mathcal {R}}}_T^\textrm{Full} \end{array} \right. \end{aligned}$$

Now, assuming that \(\Vert \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z\Vert < 1\), Banach’s Lemma (Golub and Van Loan 1996) gives \((I - \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z)^{-1} = I + \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z + O(\Vert {{\mathcal {R}}}_V\Vert ^2)\). Then the previous linear system becomes

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle W_1(t) = \left( I + \frac{t}{2}{{\mathcal {R}}}_VX^\textrm{T}Z\right) X^\textrm{T}X\left( I + \frac{t}{2}W_2(t)\right) + O(\Vert {{\mathcal {R}}}_V\Vert ^2), \\ \displaystyle W_2(t) = \left( I - \frac{t}{2}\left( -Z^\textrm{T}X - \frac{t}{2}{{\mathcal {R}}}_1^\textrm{Full} + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2}\right) \right) ^{-1}\left( -Z^\textrm{T}X - \frac{t}{2}{{\mathcal {R}}}_1^\textrm{Full} + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2}\right) \end{array} \right. \end{aligned}$$

from where the proof follows. \(\square \)

It is worth observing that when \({{\mathcal {R}}}_V = 0\), Theorem 3 reduces to Theorem 1. In addition, since \({{\mathcal {R}}}_V \approx 0\) (X is practically feasible), the term \(O(\Vert {{\mathcal {R}}}_V\Vert ^2)\) can be disregarded in (A1).
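The first-order expansion of \((I - \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z)^{-1}\) invoked via Banach’s Lemma in the proof can be checked numerically. The Python sketch below is illustrative: a generic small matrix A stands in for \(\frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z\), and we verify that truncating the Neumann series after the linear term leaves an error of order \(\Vert A\Vert ^2\).

```python
import numpy as np

# A plays the role of (t/2) R_V X^T Z, which is small whenever X is
# nearly feasible (R_V = I - X^T X close to zero).
rng = np.random.default_rng(1)
p = 4
A = 1e-4 * rng.standard_normal((p, p))

exact = np.linalg.inv(np.eye(p) - A)    # (I - A)^{-1}
first_order = np.eye(p) + A             # Neumann series truncated at order 1
trunc_error = np.linalg.norm(exact - first_order, 2)

# Banach's Lemma: since (I - A)^{-1} - I - A = A^2 (I - A)^{-1}, the
# truncation error is bounded by ||A||^2 / (1 - ||A||) = O(||A||^2).
normA = np.linalg.norm(A, 2)
bound = normA ** 2 / (1.0 - normA)
```

This is why, with \({{\mathcal {R}}}_V \approx 0\), dropping the \(O(\Vert {{\mathcal {R}}}_V\Vert ^2)\) term in (A1) is harmless in practice.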

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Francisco, J.B., Gonçalves, D.S. Nonmonotone feasible arc search algorithm for minimization on Stiefel manifold. Comp. Appl. Math. 42, 175 (2023). https://doi.org/10.1007/s40314-023-02310-0
