
Preconditioning and globalizing conjugate gradients in dual space for quadratically penalized nonlinear-least squares problems

Computational Optimization and Applications

Abstract

When solving nonlinear least-squares problems, it is often useful to regularize the problem with a quadratic term, a practice that is especially common in inverse calculations. A solution method derived from a trust-region Gauss-Newton algorithm is analyzed for such applications, where, contrary to the standard algorithm, the least-squares subproblem solved at each iteration is rewritten as a quadratic minimization subject to linear equality constraints. This reformulation allows the duality properties of the associated linearized problems to be exploited. This paper considers a recent conjugate-gradient-like method which performs the quadratic minimization in the dual space and produces, in exact arithmetic, the same iterates as a standard conjugate-gradient method in the primal space. This dual algorithm is computationally attractive whenever the dimension of the dual space is significantly smaller than that of the primal space, yielding gains in both memory usage and computational cost. The relation between this dual-space solver and PSAS (Physical-space Statistical Analysis System), another well-known dual-space technique used in data assimilation problems, is explained. An effective preconditioning technique is proposed and refined convergence bounds are derived, which results in a practical solution method. Finally, stopping rules suitable for a trust-region solver are proposed in the dual space, providing iterates equivalent to those obtained with a Steihaug-Toint truncated conjugate-gradient method in the primal space.
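
To make the primal-dual correspondence described above concrete, here is a minimal numerical sketch contrasting the two linear systems that arise at a single (inner) Gauss-Newton iteration. It illustrates only the underlying duality identity, not the paper's Algorithms 2.1-3.2; the names B, R, H and d follow the usual data-assimilation notation, and the dense solves stand in for the CG iterations used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 15                      # primal (state) dim n, dual (observation) dim m << n

H = rng.standard_normal((m, n))     # linearized observation operator
B = np.eye(n)                       # background-error covariance (identity for simplicity)
R = 0.1 * np.eye(m)                 # observation-error covariance
d = rng.standard_normal(m)          # innovation vector

# Primal normal equations, an n x n system:
#   (B^{-1} + H^T R^{-1} H) x = H^T R^{-1} d
x_primal = np.linalg.solve(np.linalg.inv(B) + H.T @ np.linalg.solve(R, H),
                           H.T @ np.linalg.solve(R, d))

# Dual (PSAS-like) system, only m x m:
#   (R + H B H^T) lam = d,   then   x = B H^T lam
lam = np.linalg.solve(R + H @ B @ H.T, d)
x_dual = B @ H.T @ lam

# Both routes give the same minimizer (Sherman-Morrison-Woodbury identity).
print(np.allclose(x_primal, x_dual))   # True
```

When n is much larger than m, the dual system is only m-by-m, which is exactly the memory and cost advantage exploited by the dual-space conjugate-gradient method analyzed in the paper.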


[Algorithms 2.1–2.4, 3.1 and 3.2 are displayed in the full text.]



Acknowledgements

The third author gratefully acknowledges partial support from CERFACS.

Author information


Correspondence to Philippe L. Toint.

Appendix: Proof of Lemma 2.4

Proof

If the singular value decomposition of \(H\) is given by

$$ H = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix}, $$
(A.1)

then a possible theoretical choice for \(\check{H}\) is

$$ \check{H} = \Sigma_r V_1^T. $$
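
With this choice, \(\check{H}\) reproduces \(H\) through \(U_1\), an identity that underlies the computations below:

$$ U_1\check{H} = U_1\Sigma_r V_1^T = H. $$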

Denoting \(\check{R}= ( U_{1}^{T}R^{-1}U_{1} )^{-1}\), direct computations show that

$$B^{-1}+H^TR^{-1}H=B^{-1}+\check{H}^T\check{R}^{-1}\check{H}.$$
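
Indeed, substituting \(H = U_1\Sigma_r V_1^T\) and \(\check{R}^{-1} = U_1^TR^{-1}U_1\) gives

$$ \check{H}^T\check{R}^{-1}\check{H} = V_1\Sigma_r\bigl(U_1^TR^{-1}U_1\bigr)\Sigma_r V_1^T = H^TR^{-1}H. $$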

Using the assumption (2.23) and denoting \(\check{G}=U_{1}^{T} G U_{1}\), we also obtain that \(F\check{H}^{T} = B \check{H}^{T} \check{G}\).

The matrix \(\check{H}\) is now an \(r\times n\) matrix of rank \(r\), and Lemma 2.3 can then be applied using \(r\), \(\check{R}\), \(\check{H}\) and \(\check{G}\) instead of \(m\), \(R\), \(H\) and \(G\), yielding the desired result where \(\nu_1,\dots,\nu_r\), the eigenvalues of \(\check{G}(I_r+\check{R}^{-1}\check{H}B\check{H}^T)\), replace those of \(G(I_m+R^{-1}HBH^T)\) in (2.48). We next investigate the relations between these two sets of eigenvalues.

Using the relations on \(\check{H}\), \(\check{R}\) and \(\check{G}\), and writing \(\check{A}=I_r+\check{R}^{-1}\check{H}B\check{H}^T\) for the dual-space analogue of \(\widehat{A}\), it can be shown that

$$ \bigl[U_1U_1^TG\widehat{A}\bigr] U_1 = U_1[\check{G}\check{A}]$$
(A.2)

where \(\widehat{A}\) is defined in (2.21). This says that the range of \(U_1\) is an invariant subspace of \(U_1U_1^TG\widehat{A}\) and that every eigenvalue of \(\check{G}\check{A}\) is an eigenvalue of \(U_1U_1^TG\widehat{A}\). Since \(U_1U_1^TG\widehat{A}\) has \(m-r\) null eigenvalues, its nonzero eigenvalues are therefore equal to eigenvalues of \(\check{G}\check{A}\). We now consider the relations between the eigenvalues of \(U_1U_1^TG\widehat{A}\) and \(G\widehat{A}\), and start by rewriting these matrices blockwise.
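
To see the containment explicitly: if \(\check{G}\check{A}w=\nu w\) with \(w\neq0\), then multiplying (A.2) on the right by \(w\) gives

$$ \bigl[U_1U_1^TG\widehat{A}\bigr](U_1w) = U_1\check{G}\check{A}w = \nu\,(U_1w), $$

and \(U_1w\neq0\) because \(U_1\) has orthonormal columns, so \(\nu\) is indeed an eigenvalue of \(U_1U_1^TG\widehat{A}\).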

Using the relation (A.1), it can be shown that

$$HBH^T = U_1\Sigma_rV_1^TBV_1\Sigma_rU_1^T.$$
(A.3)

Defining

$$ U_1 = \begin{bmatrix} U_r \\ 0 \end{bmatrix}, $$
(A.4)
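
Note in passing that, since \(U_1\) has orthonormal columns, (A.4) forces \(U_r\) to be an orthogonal \(r\times r\) matrix, so that

$$ U_1U_1^T = \begin{bmatrix} U_rU_r^T & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}, $$

an identity used for the second matrix in (A.12) below.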

\(HBH^T\) can thus be rewritten in block matrix form as

$$ HBH^T = \begin{bmatrix} M_r & 0 \\ 0 & 0 \end{bmatrix}, $$
(A.5)

where \(M_r = U_r\Sigma_rV_1^TBV_1\Sigma_rU_r^T\) has full rank \(r\). Using the equality (2.23), we can write that

$$HPH^T = HBH^TG.$$
(A.6)

Hence, \(HBH^TG\) is symmetric due to the symmetry of \(HPH^T\). Using the relation (A.5) and defining

$$ G = \begin{bmatrix} G_r & G_2 \\ G_3 & G_{m-r} \end{bmatrix}, $$
(A.7)

where \(G_r\) is an \(r\times r\) matrix and \(G_{m-r}\) is an \((m-r)\times(m-r)\) matrix, we can write \(HBH^TG\) as

$$ HBH^TG = \begin{bmatrix} M_rG_r & M_rG_2 \\ 0 & 0 \end{bmatrix}. $$
(A.8)

From the symmetry of \(HBH^TG\) given by (A.8), the block \(M_rG_2\) must equal the transpose of the zero block below the diagonal, so \(M_rG_2=0\), which implies that \(G_2=0\) since \(M_r\) has full rank. Thus, \(G\) has the form

$$ G = \begin{bmatrix} G_r & 0 \\ G_3 & G_{m-r} \end{bmatrix}. $$
(A.9)

We next derive a block matrix form of \(\widehat{A}\). Defining \(R^{-1}\) as

$$ R^{-1} = \begin{bmatrix} R^{-1}_r & R^{-1}_2 \\ R^{-1}_3 & R^{-1}_{m-r} \end{bmatrix}, $$
(A.10)

and using (A.5) and the definition of \(\widehat{A}\), we can write \(\widehat{A}\) as

$$ \widehat{A} = I_m + R^{-1}HBH^T = \begin{bmatrix} \widehat{A}_r & 0 \\ R^{-1}_3M_r & I_{m-r} \end{bmatrix}, $$
(A.11)

where \(\widehat{A}_r = I_r + R^{-1}_rM_r\). From (A.4), (A.9) and (A.11), we deduce that

$$ G\widehat{A} = \begin{bmatrix} G_r\widehat{A}_r & 0 \\ G_3\widehat{A}_r + G_{m-r}R^{-1}_3M_r & G_{m-r} \end{bmatrix} \quad \mbox{and} \quad U_1U_1^TG\widehat{A} = \begin{bmatrix} G_r\widehat{A}_r & 0 \\ 0 & 0 \end{bmatrix}. $$
(A.12)
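
Since both matrices in (A.12) are block lower triangular, their spectra are read off from the diagonal blocks; for instance,

$$ \det\bigl(G\widehat{A}-\lambda I_m\bigr) = \det\bigl(G_r\widehat{A}_r-\lambda I_r\bigr)\,\det\bigl(G_{m-r}-\lambda I_{m-r}\bigr). $$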

From (A.12), the eigenvalues of \(G\widehat{A}\) are the eigenvalues of \(G_r\widehat{A}_r\) together with the eigenvalues of \(G_{m-r}\). Also, the eigenvalues of \(U_1U_1^TG\widehat{A}\) are the eigenvalues of \(G_r\widehat{A}_r\) together with \(m-r\) null eigenvalues. Therefore, the nonzero eigenvalues of \(U_1U_1^TG\widehat{A}\), which are equal to the eigenvalues of \(\check{G}\check{A}\), form a subset of the eigenvalues of \(G\widehat{A}\). As a result, the eigenvalues of \(\check{G}(I_r+\check{R}^{-1}\check{H}B\check{H}^T)\) can be used in (2.47) instead of those of \(G(I_m+R^{-1}HBH^T)\), which completes the proof. □



Cite this article

Gratton, S., Gürol, S. & Toint, P.L. Preconditioning and globalizing conjugate gradients in dual space for quadratically penalized nonlinear-least squares problems. Comput Optim Appl 54, 1–25 (2013). https://doi.org/10.1007/s10589-012-9478-7

