
Nonmonotone feasible arc search algorithm for minimization on Stiefel manifold

Published in: Computational and Applied Mathematics

Abstract

We devise a new numerical method for solving the minimization problem over the Stiefel manifold, that is, the set of \(n \times p\) matrices (with \(p \le n\)) with orthonormal columns. Our approach consists of a nonmonotone feasible arc search along a sufficient descent direction, which ensures convergence to stationary points regardless of the initial point. The feasibility of the iterates is maintained through a variation of the Cayley transform, so our scheme can be seen as a retraction-based algorithm for minimization with orthogonality constraints. We emphasize that our scheme solves a \(p\times p\) linear system at each iteration and has computational complexity \(O(np^2) + O(p^3)\), which is attractive when \(p \ll n\). We present a general algorithmic framework for minimization on the Stiefel manifold, establish its global convergence properties, and report numerical experiments on relevant applications.
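To illustrate the kind of feasible update the abstract describes, here is a minimal Python sketch of a Cayley-transform step in the low-rank form of Wen and Yin (2013), which solves a small \(2p \times 2p\) system (the variant proposed in the paper further reduces this to a \(p \times p\) system). This is not the paper's exact scheme; all names below are illustrative.

```python
import numpy as np

def cayley_step(X, G, t):
    """One feasible Cayley-transform step on the Stiefel manifold
    (Wen-Yin low-rank form): Y(t) = X - t U (I + (t/2) V^T U)^{-1} V^T X,
    with U = [G, X] and V = [X, -G]. Since the underlying Cayley
    transform is orthogonal, Y(t)^T Y(t) = I up to rounding error.

    X : (n, p) matrix with orthonormal columns
    G : (n, p) Euclidean gradient of the objective at X
    t : step size along the feasible arc
    """
    n, p = X.shape
    U = np.hstack([G, X])                       # n x 2p
    V = np.hstack([X, -G])                      # n x 2p
    # Only a small 2p x 2p system is solved, instead of inverting
    # the n x n matrix (I + (t/2) W) with W = G X^T - X G^T.
    M = np.eye(2 * p) + (t / 2.0) * (V.T @ U)
    return X - t * U @ np.linalg.solve(M, V.T @ X)
```

The cost is dominated by the products \(V^\textrm{T}U\) and \(U W\), i.e. \(O(np^2)\) plus an \(O(p^3)\) small solve, which matches the complexity regime discussed in the abstract when \(p \ll n\).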


Data availability statement

The data sets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. Available at https://github.com/optsuite/OptM.

  2. http://www.manopt.org.

  3. Using the last 10 iterations kept in memory.

  4. Certified by the Matlab routine eigs.

References

  • Abrudan T, Eriksson J, Koivunen V (2008) Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Trans Signal Process 56(3):1134–1147

  • Abrudan T, Eriksson J, Koivunen V (2009) Conjugate gradient algorithm for optimization under unitary matrix constraint. Signal Process 89:1704–1714

  • Absil P-A, Malick J (2012) Projection-like retractions on matrix manifolds. SIAM J Optim 22(1):135–158

  • Absil P-A, Mahony R, Sepulchre R (2004) Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Appl Math 80:199–220

  • Absil P-A, Baker CG, Gallivan KA (2007) Trust-region methods on Riemannian manifolds. Found Comput Math 7(3):303–330

  • Absil P-A, Mahony R, Sepulchre R (2008) Optimization algorithms on matrix manifolds. Princeton University Press, Princeton

  • Barzilai J, Borwein JM (1988) Two-point step size gradient methods. IMA J Numer Anal 8:141–148

  • Bendokat T, Zimmermann R (2021) Efficient quasi-geodesics on the Stiefel manifold. In: Nielsen F, Barbaresco F (eds) Geometric science of information. Springer International Publishing, New York, pp 763–771

  • Bertsekas DP (2003) Constrained optimization and Lagrange multiplier methods. Massachusetts Institute of Technology, Cambridge

  • Boumal N, Mishra B, Absil P-A, Sepulchre R (2014) Manopt, a Matlab toolbox for optimization on manifolds. J Mach Learn Res 15:1455–1459

  • Cancès E, Chakir R, Maday Y (2010) Numerical analysis of nonlinear eigenvalue problems. J Sci Comput 45:90–117

  • Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1–25

  • Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Progr 91:201–213

  • Edelman A, Arias TA, Smith ST (1998) The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal Appl 20(2):303–353

  • Francisco JB, Viloche Bazán FS (2012) Nonmonotone algorithm for minimization on closed sets with application to minimization on Stiefel manifolds. J Comput Appl Math 236(10):2717–2727

  • Francisco JB, Bazán FSV, Weber Mendonça M (2017) Non-monotone algorithm for minimization on arbitrary domains with applications to large-scale orthogonal Procrustes problem. Appl Numer Math 112:51–64

  • Francisco JB, Gonçalves DS, Bazán FSV, Paredes LLT (2020) Non-monotone inexact restoration method for nonlinear programming. Comput Optim Appl 76:867–888

  • Francisco JB, Gonçalves DS, Bazán FSV, Paredes LLT (2021) Nonmonotone inexact restoration approach for minimization with orthogonality constraints. Numer Algorithms 86:1651–1684

  • Gao B, Liu X, Chen X, Yuan Y (2018) A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM J Optim 28(1):302–332

  • Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. The Johns Hopkins University Press, London

  • Grippo L, Lampariello F, Lucidi S (1986) A nonmonotone line search technique for Newton’s method. SIAM J Numer Anal 23:707–716

  • Helgaker T, Jørgensen P, Olsen J (2000) Molecular electronic-structure theory. Wiley, Chichester

  • Hu J, Liu X, Wen Z, Yuan Y (2020) A brief introduction to manifold optimization. J Oper Res Soc China 8:199–248

  • Huang W, Absil P-A, Gallivan KA (2016) A Riemannian BFGS method for nonconvex optimization problems. Springer International Publishing, Cham, pp 627–634

  • Iannazzo B, Porcelli M (2018) The Riemannian Barzilai–Borwein method with nonmonotone line search and the matrix geometric mean computation. IMA J Numer Anal 38:495–517

  • Janin R (1984) Directional derivative of the marginal function in nonlinear programming. Math Progr Study 21:127–138

  • Jiang B, Dai YH (2015) A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math Progr 153(2):535–575

  • Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J Mach Learn Res 11:517–553

  • Kohn W (1999) Nobel Lecture: Electronic structure of matter-wave functions and density functionals. Rev Mod Phys 71(5):1253–1266

  • Nishimori Y, Akaho S (2005) Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing 67:106–135

  • Oviedo H, Dalmau O, Lara H (2021) Two adaptive scaled gradient projection methods for Stiefel manifold constrained optimization. Numer Algorithms 87:1107–1127

  • Raydan M (1997) The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J Optim 7:26–33

  • Shariff M (1995) A constrained conjugate gradient method and the solution of linear equations. Comput Math Appl 30(11):25–37

  • Trendafilov N, Gallo M (2021) Multivariate data analysis on matrix manifolds. Springer Series in the Data Sciences. Springer, Cham

  • Turaga P, Veeraraghavan A, Chellappa R (2008) Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision. In: IEEE conference on computer vision and pattern recognition, pp 1–8

  • Wen Z, Yin W (2013) A feasible method for optimization with orthogonality constraints. Math Progr 142:397–434

  • Zhang H, Hager WW (2004) A nonmonotone line search technique and its application to unconstrained optimization. SIAM J Optim 14(4):1043–1056

  • Zhao Z, Bai Z-J, Jin X-Q (2015) A Riemannian Newton algorithm for nonlinear eigenvalue problems. SIAM J Matrix Anal Appl 36(2):752–774

  • Zhu X (2015) A feasible filter method for the nearest low-rank correlation matrix problem. Numer Algorithms 69:763–784


Funding

This work was partially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq - Brasil, Grant no. 305213/2021-0.

Author information


Corresponding author

Correspondence to Douglas S. Gonçalves.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests that are relevant to the content of this article.

Additional information

Communicated by Orizon Pereira Ferreira.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Floating point arithmetic considerations


We remark that in Theorem 1 the computation of \({{\mathcal {R}}}_T\), U and V, as well as the solution of the linear system (A2), is assumed to be carried out with high precision, so that \(X^\textrm{T}X = I\) holds throughout the proof. This assumption was indeed observed in our numerical experiments. Nevertheless, such precision can occasionally be lost, and then the assumption no longer holds. For this reason, we also provide a result analogous to Theorem 1 that takes the feasibility residual \({{{\mathcal {R}}}}_V = I - X^\textrm{T}X\) into account. In this context of loss of numerical precision, we therefore recommend replacing \({{\mathcal {R}}}_1\) and \({{\mathcal {R}}}_T\) in (20) with \({{\mathcal {R}}}_1^\textrm{Full}\) and \({{\mathcal {R}}}_T^\textrm{Full}\), respectively, as defined in the next theorem.

Theorem 3

Let \(X \in \Gamma \), \(Z \in {\mathbb {R}}^{n\times p}\) and define \(W(t)\in {\mathbb {R}}^{2p \times p}\), \({{\mathcal {R}}}_1\) and \({{\mathcal {R}}}_T\) as in Theorem 1. For all \(t\in {\mathbb {R}}\), if we consider \({{\mathcal {R}}}_V = I - X^\textrm{T}X\),

$$\begin{aligned} {{\mathcal {R}}}_1^\textrm{Full} = \left( {{\mathcal {R}}}_1 - \frac{1}{2}{{\mathcal {R}}}_T{{\mathcal {R}}}_VX^\textrm{T}Z \right) \left( I+\frac{t}{2}{{\mathcal {R}}}_VX^\textrm{T}Z\right) X^\textrm{T}X \end{aligned}$$

and \({{\mathcal {R}}}_T^\textrm{Full} = {{\mathcal {R}}}_T X^\textrm{T}X\), we have that

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle W_1(t) = \left( I + \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z\right) X^\textrm{T}X \left( I + \frac{t}{2}W_2(t)\right) + O(\Vert {{\mathcal {R}}}_V\Vert ^2), \quad \text{ and } \\ \displaystyle W_2(t) = \left( I - \frac{t}{2}\left( -Z^\textrm{T}X - \frac{t}{2}{{\mathcal {R}}}_1^\textrm{Full} + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2}\right) \right) ^{-1} \left( -Z^\textrm{T}X - \frac{t}{2}{{\mathcal {R}}}_1^\textrm{Full} + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2}\right) \end{array} \right. \end{aligned}$$
(A1)

Proof

From Theorem 1, we have that \(X^{+}(t) = X + tUW(t)\), with W(t) solution of

$$\begin{aligned} \left( I-\frac{t}{2}V^\textrm{T}U\right) W(t) = V^\textrm{T}X. \end{aligned}$$
(A2)

Now, using the definitions of \({{\mathcal {R}}}_V\), \({{\mathcal {R}}}_T\) and \({{\mathcal {R}}}_1^\textrm{Full}\), we have that

$$\begin{aligned} \small V^\textrm{T} U = \left( \begin{array}{cc} (I-X^\textrm{T}X)X^\textrm{T}Z &{} X^\textrm{T}X \\ \displaystyle (-Z+\frac{1}{2}X{{\mathcal {R}}}_T)^\textrm{T}(Z-XX^\textrm{T}Z) &{}\displaystyle -Z^\textrm{T}X+\frac{{{\mathcal {R}}}_T^\textrm{Full}}{2} \end{array}\right) = \left( \begin{array}{cc} {{\mathcal {R}}}_VX^\textrm{T}Z &{} X^\textrm{T}X \\ \displaystyle -{{\mathcal {R}}}_1^\textrm{Full} &{} \displaystyle -Z^\textrm{T}X+\frac{{{\mathcal {R}}}_T^\textrm{Full}}{2} \end{array}\right) \end{aligned}$$

and

$$\begin{aligned} V^\textrm{T} X = \left( \begin{array}{cc} X^\textrm{T}X \\ \displaystyle -Z^\textrm{T}X + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2} \end{array}\right) . \end{aligned}$$

Thus, the linear system (A2) leads to

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle W_1(t) = \left( I - \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z\right) ^{-1} X^\textrm{T}X\left( I + \frac{t}{2}W_2(t)\right) , \quad \text{ and } \\ \displaystyle \frac{t}{2} {{\mathcal {R}}}_1^\textrm{Full} W_1(t) + \left( I - \frac{t}{2}\left( -Z^\textrm{T}X + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2}\right) \right) W_2(t) = -Z^\textrm{T}X + \frac{1}{2} {{\mathcal {R}}}_T^\textrm{Full} \end{array} \right. \end{aligned}$$

Now, assuming that \(\Vert \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z\Vert < 1\), Banach’s Lemma (Golub and Van Loan 1996) gives \((I - \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z)^{-1} = I + \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z + O(\Vert {{\mathcal {R}}}_V\Vert ^2)\). Then the previous linear system becomes

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle W_1(t) = \left( I + \frac{t}{2}{{\mathcal {R}}}_VX^\textrm{T}Z\right) X^\textrm{T}X\left( I + \frac{t}{2}W_2(t)\right) + O(\Vert {{\mathcal {R}}}_V\Vert ^2), \\ \displaystyle W_2(t) = \left( I - \frac{t}{2}\left( -Z^\textrm{T}X - \frac{t}{2}{{\mathcal {R}}}_1^\textrm{Full} + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2}\right) \right) ^{-1}\left( -Z^\textrm{T}X - \frac{t}{2}{{\mathcal {R}}}_1^\textrm{Full} + \frac{{{\mathcal {R}}}_T^\textrm{Full}}{2}\right) \end{array} \right. \end{aligned}$$

from where the proof follows. \(\square \)

It is worth observing that when \({{\mathcal {R}}}_V = 0\), Theorem 3 reduces to Theorem 1. In addition, since \({{\mathcal {R}}}_V \approx 0\) (X is practically feasible), the term \(O(\Vert {{\mathcal {R}}}_V\Vert ^2)\) can be disregarded in (A1).
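The first-order expansion of \((I - \frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z)^{-1}\) invoked via Banach’s Lemma in the proof can be checked numerically. The Python sketch below is illustrative: a generic small matrix A stands in for \(\frac{t}{2}{{\mathcal {R}}}_V X^\textrm{T}Z\), and we verify that truncating the Neumann series after the linear term leaves an error of order \(\Vert A\Vert ^2\).

```python
import numpy as np

# A plays the role of (t/2) R_V X^T Z, which is small whenever X is
# nearly feasible (R_V = I - X^T X close to zero).
rng = np.random.default_rng(1)
p = 4
A = 1e-4 * rng.standard_normal((p, p))

exact = np.linalg.inv(np.eye(p) - A)    # (I - A)^{-1}
first_order = np.eye(p) + A             # Neumann series truncated at order 1
trunc_error = np.linalg.norm(exact - first_order, 2)

# Banach's Lemma: since (I - A)^{-1} - I - A = A^2 (I - A)^{-1}, the
# truncation error is bounded by ||A||^2 / (1 - ||A||) = O(||A||^2).
normA = np.linalg.norm(A, 2)
bound = normA ** 2 / (1.0 - normA)
```

This is why, with \({{\mathcal {R}}}_V \approx 0\), dropping the \(O(\Vert {{\mathcal {R}}}_V\Vert ^2)\) term in (A1) is harmless in practice.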

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Francisco, J.B., Gonçalves, D.S. Nonmonotone feasible arc search algorithm for minimization on Stiefel manifold. Comp. Appl. Math. 42, 175 (2023). https://doi.org/10.1007/s40314-023-02310-0
