Parameter Choice Rules for Discrete Ill-Posed Problems Based on Extrapolation Methods

Azzarelli, Andrea; Brezinski, Claude; Fenu, Caterina; Redivo-Zaglia, Michela; Rodriguez, Giuseppe

doi:10.1007/s10915-025-02822-3

Parameter Choice Rules for Discrete Ill-Posed Problems Based on Extrapolation Methods

Open access
Published: 13 February 2025

Volume 103, article number 9, (2025)
Cite this article

Download PDF

You have full access to this open access article

Journal of Scientific Computing Aims and scope Submit manuscript

Parameter Choice Rules for Discrete Ill-Posed Problems Based on Extrapolation Methods

Download PDF

406 Accesses
Explore all metrics

Abstract

Linear discrete inverse problems are common in many applicative fields. Regularization consists of substituting to the original ill-conditioned problem an approximated formulation depending on a parameter, which has to be chosen so that the new problem is well-conditioned and its solution is close enough to the ideal solution. When the parameter is discrete, like in the truncated singular value decomposition (TSVD) and in the generalized TSVD (TGSVD), one has to choose a vector out of a sequence. In this paper we explore the possibility to employ a sequence of extrapolated solutions to estimate the best parameter, as well as substituting to the regularized solution an extrapolated one. We investigate the use of three classical vector extrapolation methods, MPE (minimal polynomial extrapolation), RRE (reduced rank extrapolation), and VEA (vector epsilon algorithm). For the VEA method we also develop a new computational scheme which reduces memory storage and computing time. Numerical experiments compare the performance of the newly introduced approaches with other well-known methods.

Minimal polynomial and reduced rank extrapolation methods are related

Article 14 September 2016

Hosvd-tmpe: an extrapolation method for multidimensional sequences

Article 03 June 2024

A convergence study for reduced rank extrapolation on nonlinear systems

Article 20 August 2019

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

A linear discrete ill-posed problem, arising in many applications, consists of the solution of the optimization problem

$$\begin{aligned} \min _{\textbf{x}\in {\mathbb R}^n}\Vert A\textbf{x}- {\textbf{b}}\Vert , \end{aligned}$$

(1)

where $\Vert \cdot \Vert $ denotes the usual 2-norm, $A\in {\mathbb R}^{m\times n}$ is a (often large) matrix whose singular values exhibits a fast decaying to zero, so that the norm of the Moore–Penrose pseudoinverse $A^\dag $ of A is considerably large. The data vector ${\textbf{b}}\in {\mathbb R}^m$ often contains field measurements affected by experimental errors, so it can be represented by

$$\begin{aligned} {\textbf{b}}= \widehat{{\textbf{b}}} + \textbf{e}, \qquad \text {with } \textbf{e}=\frac{\eta \Vert \widehat{{\textbf{b}}}\Vert }{\sqrt{m}} \varvec{\epsilon }, \end{aligned}$$

where $\varvec{\epsilon }\in {\mathbb R}^m$ is a random vector, usually normally distributed with zero mean value and unitary variance and $\eta $ is the noise level, often unknown. The unknown vector $\widehat{{\textbf{b}}}$ is the right-hand side which corresponds to the exact solution $\widehat{\textbf{x}}$.

Due to the numerical unboundedness of the pseudo-inverse matrix, the approximation of the usual least-squares solution $\widehat{\textbf{x}}:=A^\dag \widehat{{\textbf{b}}}$ of (1) by $A^\dag {\textbf{b}}$ leads to a severe propagation of the error $\textbf{e}$ in ${\textbf{b}}$, which makes the computed solution useless.

A regularized solution is obtained by replacing (1) by a nearby optimization problem, which is less sensitive to error propagation. Such solutions always depend upon one or more regularization parameters, whose tuning greatly influence the quality of the reconstruction. The accurate choice of these parameters is a critical step in every regularization method.

The first attempt to apply extrapolation techniques to discrete ill-posed problems is probably [14], where the Tikhonov solution is represented as a rational function, whose coefficients are determined by interpolation, and the regularized solution is obtained by extrapolating at $\lambda =0$, being $\lambda $ the regularization parameter. Further contribution to this idea are contained in [5, 34].

Following an idea developed in [1, 8], in [17] an extrapolation idea lead to a new criterion for estimating the parameter in various regularization methods. This technique, initially developed for square ill-conditioned linear systems, was later applied to least-squares [18] and large-scale problems [44], exploiting the connection between the Lanczos process and Gauss quadrature rules [27].

In this paper, inspired by [6], we investigate the application of different vector extrapolation methods to the sequence of regularized solution generated by TSVD, in the case of a problem in standard form, or TGSVD, for general-form regularization problems. In the case of the VEA method, we also develop a computational scheme to optimize performance. The analysis of the discussed extrapolated solution methods suggested both the introduction of a new criterion for the choice of the regularization parameter in TSVD/TGSVD, and the possibility to define another regularized solution, consisting in the vector extrapolated from a sequence of solutions. This new approach improves the accuracy of the results in a significant number of cases, as our numerical experiments show.

The paper is organized as follows. In Sect. 2 some notions about the truncated (generalized) singular value decomposition are recalled, and the notation is fixed. The basic idea behind the methods discussed in the paper is explained in Sect. 3. Section 4 describes three standard vector extrapolation methods and introduces a new computational scheme to implement the vector epsilon algorithm in the particular setting considered. The extrapolation approach introduced in the paper for computing a regularized solution to a discrete ill-posed problem is exposed in Sect. 5, and Sect. 6 presents several numerical experiments to ascertain its performance. Finally, Sect. 7 draws some conclusions on the paper.

2 TSVD and TGSVD

Let $A=U\Sigma V^T$ be the singular value decomposition (SVD) [4] of the matrix A, where U and V are orthogonal matrices of size m and n, respectively, $\Sigma ={{\,\textrm{diag}\,}}(\sigma _1,\ldots ,\sigma _p,0,\ldots ,0) \in {\mathbb R}^{m \times n}$ is the diagonal matrix of the singular values, and p is the rank of A.

The best rank $\ell $ approximation ($\ell \le p$) to the matrix A according to the Euclidean norm, i.e., the matrix $A_\ell $ which minimizes $\Vert A-R\Vert $ over all the matrices R of rank $\ell $, can be easily obtained by the SVD decomposition of A. This procedure allows one to replace the ill-conditioned matrix by a well-conditioned rank-deficient matrix $A_\ell $. The corresponding solution to (1) is known as the truncated SVD (TSVD) solution [30] and can be expressed as

$$\begin{aligned} \textbf{x}_{\ell } = A_\ell ^\dag {\textbf{b}}= \sum _{i=1}^\ell \frac{\textbf{u}_i^T{\textbf{b}}}{\sigma _i} \textbf{v}_i, \end{aligned}$$

(2)

where $\ell =1,\ldots ,p$, is the regularization parameter, $\sigma _i$ are the singular values, and the singular vectors $\textbf{u}_i$ and $\textbf{v}_i$ are the orthonormal columns of U and V, respectively. To standardize the notation, we extend the sequence (2) by adding the over-regularized vector $\textbf{x}_0=(0,\ldots ,0)^T$.

To introduce a regularization matrix $L\in {\mathbb R}^{t\times n}$ ($t\le n$), problem (1) is usually replaced by

$$\begin{aligned} \min _{\textbf{x}\in \mathcal {S}} \Vert L\textbf{x}\Vert , \qquad \mathcal {S} = \{ \textbf{x}\in {\mathbb R}^n: A^TA\textbf{x}= A^T{\textbf{b}}\}, \end{aligned}$$

(3)

under the assumption $\mathcal {N}(A) \cap \mathcal {N}(L) = \{0\}$, where $\mathcal {N}(A)$ denotes the null-space of the matrix A. The generalized singular value decomposition (GSVD) [40] of the matrix pair (A, L) is the factorization

$$\begin{aligned} A = U \Sigma _A Z^{-1}, \qquad L = V \Sigma _L Z^{-1}, \end{aligned}$$

(4)

where $U \in {\mathbb R}^{m \times m}$ and $V \in {\mathbb R}^{t \times t}$ are orthogonal matrices, $Z \in {\mathbb R}^{n \times n}$ is nonsingular, and $\Sigma _A$ and $\Sigma _L$ have the same size of A and L, respectively.

In the case $m\ge n\ge p$, the two matrices are given by

$$\begin{aligned} \Sigma _A = \begin{bmatrix} 0_{n-p} & 0 & 0 \\ 0 & C & 0 \\ 0 & 0 & I_{d} \\ 0 & 0 & 0 \end{bmatrix}, \qquad \Sigma _L = \begin{bmatrix} I_{n-p} & 0 & 0 \\ 0 & S & 0 \end{bmatrix}, \end{aligned}$$

where $d = n-t$. When $p\le m<n$, we have

$$\begin{aligned} \Sigma _A = \begin{bmatrix} 0 & 0_{m-p} & 0 & 0\\ 0 & 0 & C & 0 \\ 0 & 0 & 0 & I_{d} \end{bmatrix}, \qquad \Sigma _L = \begin{bmatrix} I_{n-p} & 0 & 0 \\ 0 & S & 0 \end{bmatrix}, \end{aligned}$$

where

$$\begin{aligned} C = {{\,\textrm{diag}\,}}(c_1,\ldots ,c_{p-d}), \qquad S = {{\,\textrm{diag}\,}}(s_1,\ldots ,s_{p-d}), \end{aligned}$$

with $c_i^2+s_i^2=1$, $i=1,\dots ,p-d$. The diagonal elements are ordered so that the generalized singular values $c_i/s_i$ are nondecreasing with $i=1,\ldots ,p-d$, i.e.,

$$\begin{aligned} 0< c_1 \le c_2 \le \dots \le c_{p-d} < 1 \qquad \text {and} \qquad 1> s_1 \ge s_2 \ge \dots \ge s_{p-d} > 0. \end{aligned}$$

If a block has a zero index in the above matrices, like $I_d$ with $d=0$, the corresponding rows and columns are removed.

The truncated GSVD (TGSVD) solution $\textbf{x}_{\ell }$ to (3) is then defined as

$$\begin{aligned} \textbf{x}_{\ell } = \sum _{i=t-\ell +1}^{t} \frac{\textbf{u}_i^T{\textbf{b}}}{c_{i-n+p}}\, \textbf{z}_i + \sum _{i=t+1}^n (\textbf{u}_i^T{\textbf{b}})\, \textbf{z}_i, \end{aligned}$$

(5)

where $\textbf{u}_i$ and $\textbf{z}_i$ are the columns of U and Z, respectively, and $\ell =0,1,\ldots ,p-d$ is the regularization parameter.

3 Extrapolation and Regularization

Let us assume that, chosen an initial vector $\textbf{x}_0\in {\mathbb R}^n$, a vector sequence is generated by the following rule

$$\begin{aligned} \textbf{x}_{i+1} = M \textbf{x}_i + \textbf{g}, \qquad i=0,1,\ldots , \end{aligned}$$

(6)

with $\textbf{x}_i\in {\mathbb R}^n$, for chosen matrix M and vector $\textbf{g}$. Regardless of the convergence of the sequence, its limit/antilimit $\textbf{x}$ [46] is the fixed point of transformation (6), assuming $I-M$ is invertible. In general, M and $\textbf{g}$ are unknown.

An extrapolation formula gives an approximation of the limit of the vector sequence. We will denote by $\textbf{y}_{k}^{(0)}$ the extrapolated vector obtained from the sequence $\textbf{x}_0,\dots ,\textbf{x}_{k}$.

In T(G)SVD regularization, one constructs the finite sequence of truncated (generalized) SVD solutions $\{\textbf{x}_\ell \}_{\ell =0,1,\ldots }$, according to either formula (2) or (5). Such a sequence, as $\ell $ increases, approaches the least-squares solution of (1), which is strongly contaminated by propagation error, so using an extrapolation method to accelerate convergence is not usually desirable.

In discrete ill-posed problems, a sequence of regularized solutions usually exhibits a semiconvergence behaviour. First, low frequencies, associated to large singular values, are added to the solution. In this phase, the regularized solution approaches the solution $\widehat{\textbf{x}}$ corresponding to the error-free right-hand side $\widehat{{\textbf{b}}}$. Then, high frequencies, coupled to small singular values, enter the numerical solution, making it diverge. The first process is slow and clearly convergent to $\widehat{\textbf{x}}$, whereas the second is abrupt and produces fast-changing successive solutions.

At the same time, since the matrix characterizing a linear discrete ill-posed problem typically has a small number of dominant singular values, the physical solution $\widehat{\textbf{x}}$ can be approximated with a reasonable accuracy by a vector belonging to a space of small dimension, compared to the size of the problem n. For this reason, the sought solution can be thought of as the limit of a sequence of the kind (6), for an unknown matrix M of small rank.

In this paper, we use a vector extrapolation technique to forecast the limit of the sequence and compare it to the next regularized solution. In particular, we will investigate the behaviour of the MPE, RRE and VEA methods. If the successive regularized solution and the extrapolated one are similar, then the approximation is accepted. If they are not, this means that the next vector in the sequence is contaminated by propagation error, enlarged by ill-conditioning, so we decide to stop the iteration and accept the previous approximation.

This extrapolation process can be applied to the full sequence $\textbf{x}_0,\dots ,\textbf{x}_{k}$, for increasing values of the index k. An approach similar to this was described in [6]. An alternative idea, which does not require an increasing computational complexity, consists of keeping fixed the number $k+1$ of vectors to be extrapolated as the process progresses, “forgetting” the first batch of solutions. We will denote by $\textbf{y}_{k}^{(r)}$ the extrapolated vector obtained from the sequence $\textbf{x}_r,\dots ,\textbf{x}_{r+k}$. A further variation, which will be investigated in the paper and compared to the previous ones, consists of accepting as a solution the extrapolated vector, rather than the regularized one.

This procedure result in a method for choosing the regularization parameter that can be coupled to any regularization process depending on a discrete parameter, like T(G)SVD or, in principle, any iterative regularization method; see [26] on the capability of Krylov spaces generated by the Lanczos and the Golub–Kahan methods to approximate the spaces spanned by singular vectors. The same idea might be applied to regularization methods for nonlinear problems; see, e.g., the one discussed in [22, 23].

4 Extrapolation Techniques

In this section we briefly review the minimal polynomial extrapolation (MPE) method [20], reduced rank extrapolation (RRE) [24, 38], and the vector epsilon algorithm (VEA) [50]; see [10] for a complete treatment of the theory and practice of extrapolation.

Many scalar sequences $(u_n)$ used in numerical analysis (for example the Rayleigh quotient in the power method, for computing the dominant eigenvalue of a matrix) satisfy, for all n, a linear homogeneous difference equation of the form

$$\begin{aligned} a_0(u_n-u)+\cdots +a_k(u_{n+k}-u)=0, \end{aligned}$$

(7)

with $a_0a_k \ne 0$ and $a_0+\cdots +a_k \ne 0$, a condition that can be replaced without loss of generality by $a_0+\cdots +a_k=1$. The set of sequences satisfying (7) is named the kernel of Shanks transformation. If the sequence $(u_n)$ converges, u is its limit. If it does not converge, u is named its antilimit. Shanks transformation [46] provides a way of computing u as a ratio of two determinants, valid for all n. If $(u_n)$ does not satisfy (7), Shanks transformation can still be applied to it, but the values corresponding to u it delivers now depend on n, and obviously on k, and they are denoted by $e_k(u_n)$. Thus, Shanks transformation transforms the sequence $(u_n)$ into a set of sequences $\{e_k(u_n)\}$, whose members are expressed by ratio of determinants.

Since ratios of determinants are difficult to compute, due to large complexity and propagation of rounding errors, a recursive algorithm, the scalar $\varepsilon $-algorithm, was proposed by Wynn [49]. Obviously, for vector sequences, the scalar $\varepsilon $-algorithm could be applied componentwise, but this is not satisfactory for many reasons. Thus, by defining the inverse of a vector (in fact, using the pseudo-inverse of a rectangular matrix with only one column), Wynn extended his algorithm to the vector case by obtaining the vector scalar $\varepsilon $-algorithm (VEA).

VEA lacks an algebraic framework, since it starts from a modification of a scalar algorithm. This is the reason why the construction of a vector extension of the scalar Shanks transformation had to start from (7), where now the $u_n$ and u are vectors or, more generally, elements of a vector space. This approach led to a new transformation and its corresponding recursive algorithm for its implementation, the topological $\varepsilon $-algorithm (TEA) [7], and its later simplification [11, 12]. Other vector extensions with the same kernel of Shanks transformation are the MPE, the modified MPE (MMPE), and the RRE. Let us discuss them.

4.1 Minimal Polynomial Extrapolation

Let $\Delta \textbf{x}_i=\textbf{x}_{i+1}-\textbf{x}_i$ and $\Delta \textbf{x}=\textbf{x}-\textbf{x}_0$. The MPE formula is induced by the polynomials

$$\begin{aligned} p(y) = \sum _{j=0}^{s+1} p_j y^j, \end{aligned}$$

of degree $s+1$, with $p(1)\ne 0$ and $p_{s+1}=1$, and

$$\begin{aligned} S_p(y) = \sum _{i=0}^s q_i y^i, \end{aligned}$$

of degree s, with coefficients $q_s=p_{s+1}$, $q_{i-1}=q_i+p_i$, $i=s,s-1,\ldots ,1$.

From the relation

$$\begin{aligned} (I-M)S_p(M) = p(I)-p(M), \end{aligned}$$

by requiring that p(M) “annihilates” $\Delta \textbf{x}_0$, that is,

$$\begin{aligned} p(M)\Delta \textbf{x}_0=0, \end{aligned}$$

(8)

one obtains

$$\begin{aligned} p(1)\Delta \textbf{x}= S_p(M)\Delta \textbf{x}_0 = \sum _{i=0}^s q_i \Delta \textbf{x}_i, \end{aligned}$$

from which it follows

$$\begin{aligned} \textbf{x}= \sum _{j=0}^{s+1} \frac{p_j}{p(1)} \textbf{x}_j. \end{aligned}$$

(9)

A monic polynomial p(y) which satisfies (8) and such that $p(1)\ne 0$ does not always exist. Firstly, it may be impossible so satisfy the equality in (8). For this reason, the problem is usually formulated in the least-squares sense

$$\begin{aligned} \min _{\textbf{p}\in {\mathbb R}^{s+1}} \Vert K_s\textbf{p}+\Delta \textbf{x}_{s+1}\Vert , \end{aligned}$$

(10)

where $K_s=[\Delta \textbf{x}_0,\ldots ,\Delta \textbf{x}_s]\in {\mathbb R}^{n\times (s+1)}$ and $\textbf{p}=(p_0,\ldots ,p_s)^T$. Secondly, the polynomial satisfying (10) may not verify $p(1)\ne 0$, that is, $\textbf{p}^T\textbf{1}\ne -1$, with $\textbf{1}=(1,\ldots ,1)^T$. If $p(1)=0$, s has to be replaced by $s-1$. So, the condition $p(1)\ne 0$ is not a restriction and can always be assumed.

In the following, we consider subsequences of length $s+3$ starting at vector $\textbf{x}_r$, so the above problem (10) becomes

$$\begin{aligned} \min _{\textbf{p}\in {\mathbb R}^{s+1}} \Vert K_{r,s}\textbf{p}+\Delta \textbf{x}_{r+s+1}\Vert , \end{aligned}$$

(11)

with $K_{r,s}=[\Delta \textbf{x}_r,\ldots ,\Delta \textbf{x}_{r+s}]\in {\mathbb R}^{n\times (s+1)}$, transforming (9) into the extrapolated solution

$$\begin{aligned} \textbf{y}_{s+2}^{(r)} = \sum _{j=0}^{s+1} \gamma _j \textbf{x}_{r+j}, \qquad \text {with } \gamma _j=\frac{p_j}{p(1)}. \end{aligned}$$

(12)

Solving (11) requires the availability of $s+3$ vectors $\textbf{x}_r,\textbf{x}_{r+1},\ldots ,\textbf{x}_{r+s+2}$.

When problem (11) is solved by the normal equations approach, assuming the matrix $K_{r,s}$ is full rank, one has to solve the square linear system

$$\begin{aligned} K_{r,s}^T K_{r,s} \textbf{p} = -K_{r,s}^T \Delta \textbf{x}_{r+s+1}. \end{aligned}$$

(13)

The MMPE method [48] consists of solving the linear system

$$\begin{aligned} B_s^T K_{r,s} \textbf{p} = -B_s^T \Delta \textbf{x}_{r+s+1}, \end{aligned}$$

(14)

for a chosen full-rank matrix $B_s\in {\mathbb R}^{n\times (s+1)}$. The columns of $B_s$ are sometimes set to be the canonical basis for ${\mathbb R}^{s+1}$, i.e., $B=[\textbf{e}_1,\ldots ,\textbf{e}_{s+1}]$, where the components of $\textbf{e}_i$ are all zero except the ith, which is 1; this amounts to requiring that the first $s+1$ equations in (11) are exactly verified. Another possible choice, with the aim of reducing the condition number of the coefficient matrix in (14), is $B=[\textbf{q}_1,\ldots ,\textbf{q}_{s+1}]$, with $\textbf{q}_i$ orthonormal vectors in ${\mathbb R}^n$ [9].

4.2 Reduced Rank Extrapolation

An alternative approach is the RRE method [24, 38]. It can be defined in various different ways (see, e.g., [34, 47]), for example, by expressing the extrapolated vector as

$$\begin{aligned} \textbf{y}_{s+2}^{(r)} = \textbf{x}_r + \sum _{i=0}^{s} \xi _i \Delta \textbf{x}_{r+i}, \end{aligned}$$

whose coefficients vector ${\varvec{\xi }}=(\xi _0,\ldots ,\xi _s)^T$ is obtained by solving the least-squares problem

$$\begin{aligned} \min _{{\varvec{\xi }}\in {\mathbb R}^{s+1}} \Vert W_{r,s} {\varvec{\xi }}+ \Delta \textbf{x}_r\Vert , \end{aligned}$$

(15)

where $W_{r,s}=[\Delta ^2\textbf{x}_r,\ldots ,\Delta ^2\textbf{x}_{r+s}]\in {\mathbb R}^{n\times (s+1)}$ and $\Delta ^2\textbf{x}_i=\Delta \textbf{x}_{i+1}-\Delta \textbf{x}_i$. Equivalently, the extrapolated vector can be expressed like in (12), with coefficients obtained by solving the following constrained problem

$$\begin{aligned} \min _{{\varvec{\gamma }}\in {\mathbb R}^{s+2}} \Vert K_{r,s+1} {\varvec{\gamma }}\Vert , \qquad \text {subject to } \sum _{j=0}^{s+1} \gamma _j = 1, \end{aligned}$$

(16)

with ${\varvec{\gamma }}=(\gamma _0,\ldots ,\gamma _{s+1})^T$. The solutions of (15) and (16) are related as follows

$$\begin{aligned} \gamma _0=1-\xi _0,\qquad \gamma _i=\xi _{i-1}-\xi _i,\ i=1,\ldots ,s, \qquad \gamma _{s+1}=\xi _s. \end{aligned}$$

The MPE, the RRE and the VEA are extensions to the non-scalar Shanks scalar sequence transformation. They are described in [15] and in [13], The MMPE also enters into the same type of methods and has the same type of property for its kernel. Moreover, it can be recursively implemented by the $S\beta $-algorithm due to Jbilou [33, 35].

4.3 Vector Epsilon Algorithm

Another possibility is Wynn’s VEA method [50], the very first vector extrapolation algorithm to appear.

The algorithm is the vectorial generalization of Shanks’ $\varepsilon $-algorithm [46]. It consists of an initialization step

$$\begin{aligned} {\varvec{\varepsilon }}_{-1}^{(r)}=\textbf{0},\quad {\varvec{\varepsilon }}_{0}^{(r)}=\textbf{x}_r, \qquad r=0,1,\dots , \end{aligned}$$

followed by the iteration

$$\begin{aligned} {\varvec{\varepsilon }}_{k+1}^{(r)}={\varvec{\varepsilon }}_{k-1}^{(r+1)}+[{\varvec{\varepsilon }}_{k}^{(r+1)}-{\varvec{\varepsilon }}_{k}^{(r)}]^{-1}, \qquad r,k=0,1,\dots , \end{aligned}$$

(17)

where the inverse of a vector is actually its Moore–Penrose pseudoinverse, that is,

$$\begin{aligned} \textbf{x}^{-1}:=\frac{\textbf{x}}{\textbf{x}^T\textbf{x}}=(\textbf{x}^\dagger )^T. \end{aligned}$$

(18)

Table 1 Vector epsilon algorithm computing table

Full size table

The vectors ${\varvec{\varepsilon }}_{k}^{(r)}$ computed by the algorithm are displayed in Table 1 as a double entry table according to the following computational scheme

We are interested in even columns of Table 1. Indeed, ${\varvec{\varepsilon }}_{k}^{(r)}$ is an extrapolated solution, obtained using the $k+1$ vectors $\textbf{x}_r,\dots ,\textbf{x}_{r+k}$, only if k is even. Odd columns of the table contain temporary vectors needed for the computation. In order to preserve our notation we set

$$\begin{aligned} \textbf{y}_{k}^{(r)}={\left\{ \begin{array}{ll} {\varvec{\varepsilon }}_{k}^{(r)}, & \text {if } { k} \text { even},\\ {\varvec{\varepsilon }}_{k-1}^{(r+1)}, \quad & \text {if } k \text { odd}. \end{array}\right. } \end{aligned}$$

It was proved in [37] that the kernel of the vector $\varepsilon $-algorithm contains the set of vector sequences satisfying (7). It was shown in [16], at least for the first step of the algorithm, that it also contains other sequences. However, due to the complexity of the algorithm (the vectors it produces can be expressed as ratios of designants, a noncommutative extension of determinants [45]), it was not possible to characterize them further. Numerous numerical experiments had shown that VEA performs much better than MPE and RRE. This will be also documented in the numerical experiments section of this paper.

In the following, we will consider vectors expressed as linear combinations of independent vectors $\{{\textbf{v}}_1,{\textbf{v}}_2,\ldots ,{\textbf{v}}_n\}$ with generic coefficients $\delta _i$, $i=1,\ldots ,n$. In this paper, such linear combinations coincide with (2) in the case of TSVD, and with (5) for TGSVD. Anyway, the following results are quite general and hold in the general case.

When VEA is applied to a sequence produced by TSVD or TGSVD, the extrapolated vectors retain a similar structure. To prove this property, it is useful to consider the following lemma.

Lemma 1

If a vector $\textbf{x}$ is a linear combination of linear independent vectors $\{{\textbf{v}}_1\dots , {\textbf{v}}_r\}$, the vector $\textbf{x}^{-1}$ defined in (18) is a linear combination of the same set of vectors.

Proof

If $\textbf{x}=\sum _{i=1}^r\delta _i{\textbf{v}}_i$, then

$$\begin{aligned} \textbf{x}^{-1} = \frac{\textbf{x}}{\textbf{x}^T\textbf{x}} = \frac{1}{\Vert \textbf{x}\Vert ^2_2}\sum _{i=1}^r\delta _i{\textbf{v}}_i = \sum _{i=1}^r\tilde{\delta _i}{\textbf{v}}_i, \qquad \end{aligned}$$

with $\tilde{\delta _i}=\delta _i/\Vert \textbf{x}\Vert ^2_2$. $\square $

Theorem 1

Let $\{{\textbf{v}}_1,\dots ,{\textbf{v}}_n\}$ be linearly independent in ${\mathbb R}^n$ and let the vector epsilon algorithm be applied to the sequence

$$\begin{aligned} \textbf{x}_r=\sum _{i=1}^r\delta _i{\textbf{v}}_i, \qquad r=0,1,\dots , n. \end{aligned}$$

(19)

Then, for $k=1,\dots ,\lfloor \frac{n}{2}\rfloor $, the extrapolated vectors can be written in the form

$$\begin{aligned} \begin{aligned} {\varvec{\varepsilon }}_{2k-1}^{(r)}&=\sum _{i=r+1}^{r+2k-1}\alpha _{2k-1,i}^{(r)}{\textbf{v}}_i,&r&=0,\dots ,n-2k+1, \\ {\varvec{\varepsilon }}_{2k}^{(r)}&=\textbf{x}_r+\sum _{i=r+1}^{r+2k}\alpha _{2k,i}^{(r)}{\textbf{v}}_i,&r&=0,\dots ,n-2k, \end{aligned} \end{aligned}$$

(20)

for certain coefficients $\alpha _{2k-1,i}^{(r)}$ and $\alpha _{2k,i}^{(r)}$ whose explicit expression will be given in the proof.

Proof

To simplify notation, we set $\alpha _{k,i}^{(r)} = 0$ for $k\le 0$ or $i<r+1$, that is, outside the index ranges in summations (20).

First, we observe that from (19) it follows

$$\begin{aligned} \textbf{x}_{r+1}-\textbf{x}_r=\delta _{r+1}{\textbf{v}}_{r+1}. \end{aligned}$$

(21)

The proof proceeds by induction. In the case $k=0$, both sums in (20) are empty, so we obtain the two initialization columns of the algorithm

$$\begin{aligned} {\varvec{\varepsilon }}_{-1}^{(r)}=\textbf{0}, \qquad {\varvec{\varepsilon }}_{0}^{(r)}=\textbf{x}_r; \end{aligned}$$

see Table 1.

Now, let us assume that (20) are valid for a given k, We are going to prove that ${\varvec{\varepsilon }}_{2k+1}^{(r)}$ and ${\varvec{\varepsilon }}_{2k+2}^{(r)}$ have a similar structure. We observe that, from (21),

$$\begin{aligned} \begin{aligned} {\varvec{\varepsilon }}_{2k}^{(r+1)}-{\varvec{\varepsilon }}_{2k}^{(r)}&= \textbf{x}_{r+1}+\sum _{i=r+2}^{r+2k+1}\alpha _{2k,i}^{(r+1)}{\textbf{v}}_i- \textbf{x}_r-\sum _{i=r+1}^{r+2k}\alpha _{2k,i}^{(r)}{\textbf{v}}_i \\&= (\delta _{r+1}-\alpha _{2k,r+1}^{(r)}){\textbf{v}}_{r+1} + \sum _{i=r+2}^{r+2k}(\alpha _{2k,i}^{(r+1)}-\alpha _{2k,i}^{(r)}){\textbf{v}}_i + \alpha _{2k,r+2k+1}^{(r+1)}{\textbf{v}}_{r+2k+1} \\&= \sum _{i=r+1}^{r+2k+1}\beta _{2k+1,i}^{(r)}{\textbf{v}}_i \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \beta _{2k+1,i}^{(r)}= {\left\{ \begin{array}{ll} \delta _{r+1}-\alpha _{2k,r+1}^{(r)}, \qquad & i=r+1, \\ \alpha _{2k,i}^{(r+1)}-\alpha _{2k,i}^{(r)}, & i=r+2,\dots ,r+2k, \\ \alpha _{2k,r+2k+1}^{(r+1)}, & i=r+2k+1. \end{array}\right. } \end{aligned}$$

Note that, for $k=0$ this reduces to

$$\begin{aligned} \beta _{1,r+1}^{(r)}=\delta _{r+1}. \end{aligned}$$

By Lemma 1, setting

$$\begin{aligned} \varvec{\beta }_{2k+1}^{(r)} =(\beta _{2k+1,r+1}^{(r)},\beta _{2k+1,r+2}^{(r)},\ldots ,\beta _{2k+1,r+2k+1}^{(r)})^T \in {\mathbb R}^{2k+1}, \end{aligned}$$

we obtain

$$\begin{aligned} {[}{\varvec{\varepsilon }}_{2k}^{(r+1)}-{\varvec{\varepsilon }}_{2k}^{(r)}]^{-1} = \sum _{i=r+1}^{r+2k+1}\tilde{\beta }_{2k+1,i}^{(r)}{\textbf{v}}_i, \qquad \text {with } \tilde{\beta }_{2k+1,i}^{(r)}= \frac{\beta _{2k+1,i}^{(r)}}{\Vert \varvec{\beta }_{2k+1}^{(r)}\Vert ^2}. \end{aligned}$$

The expression we are looking for is then

$$\begin{aligned} {\varvec{\varepsilon }}_{2k+1}^{(r)}&={\varvec{\varepsilon }}_{2k-1}^{(r+1)}+[{\varvec{\varepsilon }}_{2k}^{(r+1)}-{\varvec{\varepsilon }}_{2k}^{(r)}]^{-1} \\&=\sum _{i=r+2}^{r+2k}\alpha _{2k-1,i}^{(r+1)}{\textbf{v}}_i +\sum _{i=r+1}^{r+2k+1}\tilde{\beta }_{2k+1,i}^{(r)}{\textbf{v}}_i =\sum _{i=r+1}^{r+2k+1}\alpha _{2k+1,i}^{(r)}{\textbf{v}}_i, \end{aligned}$$

where

$$\begin{aligned} \alpha _{2k+1,i}^{(r)}= {\left\{ \begin{array}{ll} \alpha _{2k-1,i}^{(r)}+\tilde{\beta }_{2k+1,i}^{(r)}, \qquad & i=r+2,\dots ,r+2k,\\ \tilde{\beta }_{2k+1,i}^{(r)}, \qquad & i=r+1,r+2k+1. \end{array}\right. } \end{aligned}$$

In particular, for $k=0$ we have

$$\begin{aligned} \alpha _{1,r+1}^{(r)}= \tilde{\beta }_{1,r+1}^{(r)} =\frac{1}{\delta _{r+1}}. \end{aligned}$$

Similar arguments lead to

$$\begin{aligned} {\varvec{\varepsilon }}_{2k+2}^{(r)}=\textbf{x}_r+\sum _{i=r+1}^{r+2k+2}\alpha _{2k+2,i}^{(r)}{\textbf{v}}_i, \end{aligned}$$

with coefficients obtained by setting

$$\begin{aligned} \begin{aligned} \beta _{2k+2,i}^{(r)}&= {\left\{ \begin{array}{ll} -\alpha _{2k+1,r+1}^{(r)}, \qquad & i=r+1, \\ \alpha _{2k+1,i}^{(r+1)}-\alpha _{2k+1,i}^{(r)}, \qquad & i=r+2,\dots ,r+2k+1, \\ \alpha _{2k+1,r+2k+2}^{(r+1)}, & i=r+2k+2, \end{array}\right. } \\ \varvec{\beta }_{2k+2}^{(r)}&= (\beta _{2k+2,r+1}^{(r)},\beta _{2k+2,r+2}^{(r)},\ldots ,\beta _{2k+2,r+2k+2}^{(r)})^T \in {\mathbb R}^{2k+2}. \end{aligned} \end{aligned}$$

With $\tilde{\beta }_{2k+2,i}^{(r)}=\beta _{2k+2,i}^{(r)}/\Vert \varvec{\beta }_{2k+2}^{(r)}\Vert ^2$, it finally holds

$$\begin{aligned} \alpha _{2k+2,i}^{(r)}= {\left\{ \begin{array}{ll} \delta _{r+1}+\tilde{\beta }_{2k+2,r+1}^{(r)}, \qquad & i=r+1,\\ \alpha _{2k,i}^{(r+1)}+\tilde{\beta }_{2k+2,i}^{(r)}, \qquad & i=r+2,\dots ,r+2k+1,\\ \tilde{\beta }_{2k+2,r+2k+2}^{(r)}, \qquad & i=r+2k+2. \end{array}\right. } \end{aligned}$$

The above formula, for $k=0$, simplify to

$$\begin{aligned} \beta _{2,i}^{(r)}= {\left\{ \begin{array}{ll} -\alpha _{1,r+1}^{(r)}=-\delta _{r+1}, \qquad & i=r+1, \\ \alpha _{1,r+2}^{(r+1)}=\delta _{r+2}, & i=r+2, \end{array}\right. } \end{aligned}$$

so that $\Vert \varvec{\beta }_{2}^{(r)}\Vert ^2 = \delta _{r+1}^2+\delta _{r+2}^2$, and

$$\begin{aligned} \alpha _{2,i}^{(r)}= {\left\{ \begin{array}{ll} \delta _{r+1}+\tilde{\beta }_{2,r+1}^{(r)}=\delta _{r+1}-\delta _{r+1}/\big (\delta _{r+1}^2+\delta _{r+2}^2\big ), \qquad & i=r+1,\\ \tilde{\beta }_{2,r+2}^{(r)}=\delta _{r+2}/\big (\delta _{r+1}^2+\delta _{r+2}^2\big ), \qquad & i=r+2. \end{array}\right. } \end{aligned}$$

This completes the induction. $\square $

The above theorem can be applied to extrapolate sequences of vectors generated by either the TSVD or the the GSVD. It also suggests an alternative algorithm for computing extrapolated vectors, which performs a recursion on the coefficients, rather than on the sequence vectors. The new approach requires less storage space than the standard application of (17) and, in our experiments, is faster than that. We believe that it might also be more stable in computations involving long extrapolation sequences. In our case, where the length of the vector sequence is deliberately kept small, we did not observe any stability issues. We defer to future work the investigation of properties and performance of the new computational scheme for VEA, under the assumptions of Theorem 1.

5 Comparison of Regularized and Extrapolated Solutions

Following an approach similar to [6], our aim is to determine a criterion to estimate the value of the parameter of a regularization method and, possibly, improve the accuracy of the selected regularized solution.

In [6], the authors apply the RRE method to the sequence of solutions produced by either TSVD and LSQR [41, 42]. They propose to keep $r=0$ fixed in (15) and let s vary in order to obtain different extrapolated vectors. According to our notation, they compute the extrapolated vectors $\textbf{t}_k=\textbf{y}_{k}^{(0)}$, $k=1,2,\dots $, by using the solutions $\{\textbf{x}_\ell \}_{0\le \ell \le k}$ obtained by the two regularization methods. In the case of TSVD, the vectors $\textbf{t}_k$ are expressed as sums of singular vectors by introducing suitable filter factors.

The algorithms, called RRE-TSVD or RRE-LSQR, respectively, choose the regularization parameter as the index k for which either the relative difference between two successive vectors is below the machine precision $\text {eps}=2.22\cdot 10^{-15}$, that is,

$$\begin{aligned} \frac{\Vert \textbf{t}_k-\textbf{t}_{k-1}\Vert }{\Vert \textbf{t}_{k-1}\Vert } < \text {eps}, \end{aligned}$$

or when the difference $\Vert \textbf{t}_k-\textbf{x}_k\Vert $ is the smallest; see Algorithm 1 and Example 4.1 in [6].

Our approach is different. Indeed, we compute extrapolated solutions $\textbf{y}_{k}^{(r)}$ by firstly increasing the number k of vectors to extrapolate until it reaches a chosen value s, and then letting the index r of the first vector of the sequence vary. The corresponding extrapolated solutions are

$$\begin{aligned} \textbf{t}^{(s)}_{\ell } = {\left\{ \begin{array}{ll} \textbf{x}_\ell , & \ell =0,1,2, \\ \textbf{y}_{\ell -1}^{(0)}, & \ell =3,\ldots ,s, \\ \textbf{y}_{s}^{(\ell -s-1)}, \quad & \ell >s. \end{array}\right. } \end{aligned}$$

(22)

The reason for this choice of notation is that in order to apply one of the extrapolation methods described in Sect. 4 at least three vectors are needed. This explains why the first three extrapolated vectors are equal to those of the original sequence.

When $\ell >s$, the extrapolated sequence is actually $\textbf{x}_{\ell -s-1},\ldots ,\textbf{x}_{\ell -1}$, and $\textbf{t}^{(s)}_{\ell }$ is expected to approximate $\textbf{x}_\ell $. Keeping the number of vectors in the sequence fixed at a relatively small value has two positive effects. Firstly, in discrete ill-posed problems A is typically dominated by few large singular values, so the solution is in a space of small dimension and its behaviour is usually well represented by the last few regularized solutions before noise propagation makes the sequence diverge. Secondly, a small s is attractive because it reduces the computational load for constructing each extrapolated solution.

Another difference in our approach is that we investigate the performance of different extrapolation techniques than RRE, such as MPE and VEA. Indeed, the kernel of the extrapolation method, that is the set of vectors that can be reproduced by the method, determines the variety of solutions that can be recovered. Moreover, this methods will be applied to sequences of TSVD and LSQR solutions, like it was done in [6], but also to TGSVD solutions, when a regularization matrix is involved in regularization.

We would like to choose as a regularization parameter the index $\ell _\text {extr}$ for which the quantity

$$\begin{aligned} \rho _\ell = \frac{\Vert \textbf{t}^{(s)}_\ell - \textbf{x}_\ell \Vert _\infty }{\Vert \textbf{t}^{(s)}_\ell + \textbf{x}_\ell \Vert _\infty }, \qquad \ell =3,4,\ldots , \end{aligned}$$

(23)

is the smallest.

Since the behaviour of the regularized solution $\textbf{x}_\ell $ is erratic for large values of $\ell $, choosing the absolute minimum of $\rho _\ell $ often leads to an overestimation of the parameter. So, we look for some sort of relative minimum, i.e., for a chosen $\bar{k}\in {\mathbb N}$, we select as a regularization parameter the first index $\ell _\text {extr}$ such that

$$\begin{aligned} \rho _{\ell _\text {extr}}\le \rho _\ell , \qquad \text {for }\ell =1,\ldots ,\ell _\text {extr}+\bar{k}. \end{aligned}$$

In our experiments we set $\bar{k}=s$.

We denote this parameter estimation procedure regularized and extrapolated solutions comparison (RESC). When it is applied to the extrapolation methods of Sect. 4, it will be called RESC-MPE, RESC-RRE, and RESC-VEA, respectively.

Once the parameter is estimated, we may take as an approximation of the solution either the regularized vector $\textbf{x}_{\ell _\text {extr}}$ or the extrapolated vector $\textbf{t}^{(s)}_{\ell _\text {extr}}$. We will show that the latter is often numerically more accurate than the former.

We remark that when MPE is applied to a TSVD sequence $\textbf{x}_r,\dots ,\textbf{x}_{r+s+2}$ computed by (2), we have that

$$\begin{aligned} \Delta \textbf{x}_{\ell } = \textbf{x}_{\ell +1}-\textbf{x}_{\ell } = \frac{\textbf{u}_{\ell +1}^T{\textbf{b}}}{\sigma _{\ell +1}} \textbf{v}_{\ell +1}, \end{aligned}$$

so that $K_{r,s}$ in (11) is an orthogonal matrix, and the vector $\Delta \textbf{x}_{r+s+1}$ is orthogonal to all its columns. Then, normal equations (13) lead to $\textbf{p}=\textbf{0}$, that is, $p(y)=y^{s+1}$. This means that the extrapolated solution $\textbf{y}_{s+2}^{(r)}$ is simply $\textbf{x}_{r+s+1}$. This shows that the MPE method exactly reproduces the TSVD solutions, so it does not add any information to the sequence.

If the TGSVD sequence (5) is considered, we have

$$\begin{aligned} \Delta \textbf{x}_{\ell }:= \delta _{t-\ell }\, \textbf{z}_{t-\ell }, \qquad \text {with } \delta _{t-\ell } = \frac{\textbf{u}_{t-\ell }^T{\textbf{b}}}{c_{t-\ell -n+p}}, \end{aligned}$$

and the matrix in (11) becomes

$$\begin{aligned} K_{r,s} = [\textbf{z}_{t-r},\ldots ,\textbf{z}_{t-r-s-1}] {{\,\textrm{diag}\,}}(\delta _{t-r},\ldots ,\delta _{t-r-s-1}). \end{aligned}$$

Since the matrix Z in (4) is not orthogonal, in general, the MPE method yields extrapolated solutions which are different from the vectors in the starting sequence.

6 Computed Examples

To assess the performance of the proposed extrapolation methods, even in comparison to other competitors, we tested them on a large number of model problems with known solution. Each one of them is constructed by selecting an ill-conditioned matrix $A\in {\mathbb R}^{n\times n}$ and a model solution $\widehat{\textbf{x}}\in {\mathbb R}^n$. After computing the exact right-hand side $\widehat{{\textbf{b}}}=A\widehat{\textbf{x}}$, we add a noise vector $\textbf{e}$, sampled from a normal distribution, scaled so that $\Vert \textbf{e}\Vert =\eta \Vert \widehat{\textbf{b}}\Vert $ for a fixed noise level $\eta $, to obtain a data vector ${\textbf{b}}=\widehat{{\textbf{b}}}+\textbf{e}$.

Combining the following possibilities we obtain 23040 problems in total:

8 test matrices (see Table 2);
8 model solutions (see Table 3);
2 sizes: $36\times 36$ and $100\times 100$;
3 regularization matrices $L=I,D_1,D_2$, that is, the identity matrix and the discretization of the first and second derivatives, respectively;
6 noise levels: $\eta =10^{-1},10^{-2},\dots ,10^{-6}$;
10 realizations of the random vector $\textbf{e}$ for each noise level.

We denote this set of examples by the symbol ${\mathcal {E}}$.

Table 2 Test matrices and condition numbers estimated by the cond function of Matlab; the examples are taken from Regularization Tools [31], except Hilbert and Lotkin, which are provided by Matlab

Full size table

Table 3 Model solutions $\widehat{\textbf{x}}=(\widehat{x}_1,\widehat{x}_2,\ldots ,\widehat{x}_n)$; each row shows the value of $\widehat{x}_k$, $k=1,\ldots ,n.$

Full size table

For each example, we compute the T(G)SVD sequence $\textbf{x}_\ell $ and choose the truncation parameter $\ell _{\text {extr}}$ as described in Sect. 5. In order to ascertain the accuracy of the results, we consider the following quality index for an approximated solution vector ${\textbf{w}}$ computed by one of the method considered

$$\begin{aligned} Q({\textbf{w}}) = \frac{\Vert \widehat{\textbf{x}}-{\textbf{w}}\Vert }{\Vert \widehat{\textbf{x}}-\textbf{x}_{k_\text {opt}}\Vert }, \end{aligned}$$

where

$$\begin{aligned} k_{\text {opt}}=\arg \min _\ell \Vert {\widehat{\textbf{x}}-\textbf{x}_\ell }\Vert \end{aligned}$$

(24)

denotes the truncation index corresponding to the best solution in the sequence $\textbf{x}_\ell $. A value $Q({\textbf{w}})=1$ means that the solution $\textbf{w}$ is as good as $\textbf{x}_{k_{\text {opt}}}$, while large values of Q characterize inaccurate solutions. The situation $Q({\textbf{w}})<1$ is unlikely but possible, since ${\textbf{w}}=\textbf{t}^{(s)}_{\ell _\text {extr}}$ is an extrapolated solution which is not contained in the sequence $\textbf{x}_\ell $.

We consider a test a success if $Q({\textbf{w}})\le 10$, a failure if $10<Q(\textbf{w})\le 10^2$, a severe failure if $Q(\textbf{w})>10^2$.

6.1 Comparing RESC-MPE, RESC-RRE, and RESC-VEA

In the first experiment we compare the accuracy of the different extrapolation techniques adopted in the RESC method. We let the parameter s, introduced in Eq. (22), vary in the range $s=2,4,\ldots ,12$. This parameter determines the number of vectors in the sequence that are used to compute an extrapolated solution.

For each value of s, we apply to the test set ${\mathcal {E}}$ the three extrapolation approaches RESC-MPE, RESC-RRE, and RESC-VEA, for estimating the truncation parameter k in TSVD or TGSVD, depending on being $L=I$ or $L\ne I$. This produces, for each model problem, three different solutions $\textbf{x}_{\ell _\text {extr}}$ and the corresponding quality indices $Q(\textbf{x}_{\ell _\text {extr}})$. The three continuous lines in Fig. 1 show the percentage of failures and of severe failures produced by the three methods, with respect to the variation of s. The three methods are roughly equivalent, with the MPE extrapolation method producing slightly less trustable results, since it exhibits the worse performance in the right-hand side graph, which reports the severe failures.

The same procedure is applied to obtain the extrapolated vectors $\textbf{t}^{(s)}_{\ell _\text {extr}}$ and the corresponding quality indices $Q(\textbf{t}^{(s)}_{\ell _\text {extr}}$). The results are reported as dashed line in Fig. 1. It is evident that the extrapolated solutions perform much better than the T(G)SVD solution with the truncation index determined by criterion (23). Also in this case, RRE and VEA perform better than MPE, with a slight improvement for VEA, whose severe failures fall below $1\%$ for $s=12$. In any case, a value of s around 8–10 seems more than sufficient to obtain a trustable regularized solution. Larger values of s may produce a small improvement in the results, but slow down the execution time. What often happens is that the curves in Fig. 1 exhibit a plateau where the results are substantially equivalent. It must also be noticed that for those model problem where the optimal truncation parameter is very small, the number of sequence vectors used to compute the extrapolated solution is always smaller than s. This is due to definition (22).

Given the success of extrapolated solutions, it is natural to wonder if it may happen that such solution is more accurate than the “optimal” solution, whose truncation index is defined by Eq. (24), that is, if $Q(\textbf{t}^{(s)}_{\ell _\text {extr}})<1$ for some numerical experiment. Figure 2 displays the percentage of tests for which this happens, showing that in a sensible number of experiments the extrapolated solution performs better than the optimal T(G)SVD solution. Also in this case, RESC-VEA produces the best results; for example, in $1\%$ of the experiments $Q(\textbf{w})<0.85$.

To further illustrate this aspect, we plot in Fig. 3 the regularized solutions obtained in two experiments where it results $Q(\textbf{t}^{(s)}_{\ell _\text {extr}})\in [0.5,1]$. They regard the Phillips and the Foxgood test matrices with two particular solutions and noise level $\eta =0.1$. In both cases, the exact solution is compared to the optimal solution $\textbf{x}_{k_{\text {opt}}}$, the solution $\textbf{x}_{\ell _\text {extr}}$ identified by the truncation parameter estimated by RESC-VEA with $s=10$, and the corresponding extrapolated solution $\textbf{t}^{(s)}_{\ell _\text {extr}}$.

6.2 Comparing RESC-VEA to Other Estimation Methods

Since RESC-VEA is the method which performs better on the test set ${\mathcal {E}}$, here we compare it to other well-known regularization parameter estimation methods.

In general, the most widely used method for estimating a regularization parameter is Morozov’s discrepancy principle [39]. It has a sound theoretical framework, and requires the knowledge of the noise level; see [25, Chapter 4] for a complete discussion. In particular, it has been proved that, under suitable assumptions, the regularized solution identified by the discrepancy principle converges to the least-squares solution when the noise level decreases to zero.

It is known that estimation methods, like the ones we discuss here, which do not make use of information on the noise will fail on specific examples [2]. Nevertheless, there are interesting results on the analysis of their convergence [36]. Moreover, in many important applications only generic information on the distribution of the noise is available, and its level is often unknown. For this reason, we only compare RESC-VEA to other such “heuristic” estimation methods.

Many methods exist for the estimation of a discrete regularization parameter; see [43] for a review. We report here a comparison to the two methods which proved to be most effective in our experimentation, namely, the quasi-optimality criterion and the L-curve method.

The quasi-optimality criterion was introduced by Morozov in [39]; see also [2, 3]. In our setting, the index of the regularized solution is chosen in order to minimize the norm of the difference between two successive T(G)SVD solutions, that is,

$$\begin{aligned} k_\text {qo} = \arg \min _k \Vert \textbf{x}_{k+1}-\textbf{x}_k\Vert . \end{aligned}$$

The L-curve method is based on considering the curve obtained by joining the points

$$\begin{aligned} (\log \Vert \textbf{x}_k\Vert ,\log \Vert \textbf{b}-A\textbf{x}_k\Vert ), \quad k=1,\ldots ,n. \end{aligned}$$

The shape of such curve often resembles a letter “L”, that is, it presents two almost linear parts joining at a point with high curvature. The method chooses the index of the solution $\textbf{x}_{k_\text {lc}}$ corresponding to this “corner” [28, 29]. There are different numerical procedures to accomplish this task, we used the “corner” algorithm, described in [32].

Table 4 Comparison of 4 estimation methods on the test set ${\mathcal {E}}$

Full size table

The results of the comparison are displayed in Table 4. The first column contains the name of the test matrices used in the experiments. Each of the following columns reports, for each method, the number of failures ($Q>10^1$) and severe failures ($Q>10^2$) observed in the 23040 tests. The last row shows the cumulative results in terms of both number of tests and percentage. For all methods, we show the results produced by the regularized solutions $\textbf{x}_k$, whose index k has been estimated by the methods. In the case of RESC-VEA, we also display the results obtained by the extrapolated solution $\textbf{t}^{(s)}_k$, with $s=10$.

Table 5 Comparison of 4 estimation methods on the test set ${\mathcal {E}}$

Full size table

It is clearly seen that the extrapolated solution identified by RESC-VEA (fifth column) is the only one which can compete with the results produced by the quasi-optimality criterion, in the second column. The other two methods considered, in the third and fourth columns, give good results, but of lesser accuracy. It is also remarkable that the extrapolated solution from RESC-VEA is quite robust, as the number of severe failures does not vary much with respect to different test matrices.

Table 6 Comparison of 4 estimation methods on the test set ${\mathcal {E}}$

Full size table

The same numerical results can be observed in Tables 5, 6 and 7, grouped in terms of model solution, regularization matrix, and noise level, respectively. They confirm the previous remarks made on the four methods. The second and fifth columns of Table 7 show a substantial increase in the number of failures for the smallest noise level. This is a known problem, as some heuristic methods are able to effectively estimate a regularization parameter only when the effect of the noise is enough sensible in the data.

Table 7 Comparison of 4 estimation methods on the test set ${\mathcal {E}}$

Full size table

Finally, we compare RESC-VEA to the RRE-TSVD method described in [6]. Since the method was introduced for problems in standard form ($L=I$) we consider the test set ${\mathcal {E}}_0\subset {\mathcal {E}}$, which limits the experiments to this case.

In [6], the RRE-TSVD method is mostly considered as a parameter estimation criterion. The possibility to use the estimated parameter k to identify both the regularized solution $\textbf{x}_k$, taken from the TSVD sequence, and the kth extrapolated vector $\textbf{t}_k$ is not discussed there. Since it is a trivial extension, we consider both implementations.

Table 8 reports the results similarly to Table 4, that is, grouped according to test matrix. The two implementations of RRE-TSVD are compared to the corresponding implementations of RESC-VEA. For the last method, we set $s=6$ to show that even extrapolating a rather short sequence of vectors the results are quite good. In this case, we obtained slightly best results setting $s=10$.

From the table it is clear that, on this particular set of examples, RRE-TSVD does not performs well as a parameter estimation method, anyway its results are highly competitive if the RRE extrapolated solution is used instead. The results produced by RESC-VEA are sensibly better, confirming the superiority of the new approach which couples the VEA extrapolation scheme to a short sequence of vectors.

Table 8 Comparison of the RRE-TSVD method from [6] and RESC-VEA (with $s=6$) on the test set ${\mathcal {E}}_0$

Full size table

7 Conclusion

In this paper, a new method for estimating the regularization parameter in TSVD and TGSVD has been introduced, based on the analysis of a sequence of extrapolated solutions derived from the original T(G)SVD sequence. It is similar to the method introduced in [6], with some important differences. In the new approach, the length of the extrapolating sequence is kept fixed and extrapolation methods different from RRE are considered. Moreover, we also propose to replace the identified T(G)SVD solution by its extrapolated version, and numerical tests show that this approach is particularly successful. Experiments also show that the VEA extrapolation method performs better than MPE and RRE, and we propose a new implementation of the VEA scheme, under the particular assumptions of our problem, that reduces memory storage and complexity. A complete study of this scheme is deferred to future work.

A limitation of the proposed approach is the use of TSVD and TGSVD, which limit the applicability of the method to small size problems. It is our intention to apply the investigated extrapolation methods to regularization techniques suitable for large scale problems, e.g., hybrid methods based on projection in Krylov subspaces [19, 21].

In conclusion, the paper confirms the great potential of extrapolation in the solution of discrete ill-posed problems, already observed in various other references.

Data Availability

Synthetic data used in numerical experiments can be reproduced by following the instructions given in the paper.

References

Auchmuty, G.: A posteriori error estimates for linear equations. Numer. Math. 61, 1–6 (1992)
Article MathSciNet MATH Google Scholar
Bakushinskii, A.B.: Remarks on choosing a regularization parameter using quasi-optimality and ratio criterion. USSR Comp. Math. Math. Phys. 24, 181–182 (1984)
Article MATH Google Scholar
Bauer, F., Kindermann, S.: Recent results on the quasi-optimality principle. J. Inv. Ill-Posed Probl. 17, 5–18 (2009)
MathSciNet MATH Google Scholar
Björck, A.: Numerical methods for least squares problems. SIAM, Philadelphia (1996)
Book MATH Google Scholar
Bouhamidi, A., Jbilou, K., Reichel, L., Sadok, H.: An extrapolated TSVD method for linear discrete ill-posed problems with kronecker structure. Linear Algebra Appl. 434, 1677–1688 (2011)
Article MathSciNet MATH Google Scholar
Bouhamidi, A., Jbilou, K., Reichel, L., Sadok, H., Wang, Z.: Vector extrapolation applied to truncated singular value decomposition and truncated iteration. J. Eng. Math. 93(1), 99–112 (2015)
Article MathSciNet MATH Google Scholar
Brezinski, C.: Généralisations de la transformation de Shanks, de la table de Padé et de l’$\varepsilon $-algorithme. Calcolo 12, 317–360 (1975)
Article MathSciNet MATH Google Scholar
Brezinski, C.: Error estimates for the solution of linear systems. SIAM J. Sci. Comput. 21, 764–781 (1999)
Article MathSciNet MATH Google Scholar
Brezinski, C., Cipolla, S., Redivo-Zaglia, M., Saad, Y.: Shanks and Anderson-type acceleration techniques for systems of nonlinear equations. IMA J. Numer. Anal. 42(4), 3058–3093 (2022)
Article MathSciNet MATH Google Scholar
Brezinski, C., Redivo-Zaglia, M.: Extrapolation methods: theory and practice. Elsevier, Amsterdam (2013)
MATH Google Scholar
Brezinski, C., Redivo-Zaglia, M.: The simplified topological $\varepsilon $-algorithms for accelerating sequences in a vector space. SIAM J. Sci. Comput. 36(5), A2227–A2247 (2014)
Article MathSciNet MATH Google Scholar
Brezinski, C., Redivo-Zaglia, M.: The simplified topological $\varepsilon $-algorithms: software and applications. Numer. Algorithms 74, 1237–1260 (2017)
Article MathSciNet MATH Google Scholar
Brezinski, C., Redivo-Zaglia, M.: The genesis and early developments of Aitken’s process, Shanks’ transformation, the $\varepsilon $-algorithm, and related fixed point methods. Numer. Algorithms 80, 11–133 (2019)
Article MathSciNet MATH Google Scholar
Brezinski, C., Redivo-Zaglia, M., Rodriguez, G., Seatzu, S.: Extrapolation techniques for ill-conditioned linear systems. Numer. Math. 81(1), 1–29 (1998)
Article MathSciNet MATH Google Scholar
Brezinski, C., Redivo-Zaglia, M., Saad, Y.: Shanks sequence transformations and Anderson acceleration. SIAM Rev. 60(3), 646–669 (2018)
Article MathSciNet MATH Google Scholar
Brezinski, C., Redivo-Zaglia, M., Salam, A.: On the kernel of vector $\varvec {\varepsilon }$-algorithm and related topics. Numer. Algorithms 92(1), 207–221 (2023)
Article MathSciNet MATH Google Scholar
Brezinski, C., Rodriguez, G., Seatzu, S.: Error estimates for linear systems with applications to regularization. Numer. Algorithms 49, 85–104 (2008)
Article MathSciNet MATH Google Scholar
Brezinski, C., Rodriguez, G., Seatzu, S.: Error estimates for the regularization of least squares problems. Numer. Algorithms 51, 61–76 (2009)
Article MathSciNet MATH Google Scholar
Buccini, A., Reichel, L.: Software for limited memory restarted $\ell ^p$-$\ell ^q$ minimization methods using generalized Krylov subspaces. Electron. Trans. Numer. Anal. 61, 66–91 (2024)
Article MathSciNet MATH Google Scholar
Cabay, S., Jackson, L.W.: A polynomial extrapolation method for finding limits and antilimits of vector sequences. SIAM J. Numer. Anal. 13(5), 734–752 (1976)
Article MathSciNet MATH Google Scholar
Chung, J., Gazzola, S.: Computational methods for large-scale inverse problems: A survey on hybrid projection methods. SIAM Rev. 66(2), 205–284 (2024)
Article MathSciNet MATH Google Scholar
Deidda, G.P., Díaz de Alba, P., Rodriguez, G.: Identifying the magnetic permeability in multi-frequency EM data inversion. Electron. Trans. Numer. Anal. 47, 1–17 (2017)
Deidda, G.P., Fenu, C., Rodriguez, G.: Regularized solution of a nonlinear problem in electromagnetic sounding. Inverse Probl. 30, 125014 (2014)
Article MathSciNet MATH Google Scholar
Eddy, R.P.: Extrapolating to the limit of a vector sequence. In: Wang, P.C.C. (ed.) Information linkage between applied mathematics and industry, pp. 387–396. Academic Press, New York (1979)
Chapter MATH Google Scholar
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of inverse problems. Kluwer, Dordrecht (1996)
Book MATH Google Scholar
Gazzola, S., Onunwor, E., Reichel, L., Rodriguez, G.: On the Lanczos and Golub-Kahan reduction methods applied to discrete ill-posed problems. Numer. Linear Algebra Appl. 23, 187–204 (2016)
Article MathSciNet MATH Google Scholar
Golub, G.H., Meurant, G.: Matrices. Moments and quadrature with applications. Princeton University Press, Princeton (2010)
MATH Google Scholar
Hansen, P.: Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review 34, 561–580 (1992)
Article MathSciNet MATH Google Scholar
Hansen, P., O’Leary, D.: The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J. Sci. Comput. 14, 1487–1503 (1993)
Article MathSciNet MATH Google Scholar
Hansen, P.C.: The truncated SVD as a method for regularization. BIT 27, 543–553 (1987)
Article MathSciNet MATH Google Scholar
Hansen, P.C.: REGULARIZATION TOOLS: A Matlab package for analysis and solution of discrete ill-posed problems. Numer. Algorithms 6, 1–35 (1994)
Article MathSciNet MATH Google Scholar
Hansen, P.C., Jensen, T.K., Rodriguez, G.: An adaptive pruning algorithm for the discrete L-curve criterion. J. Comput. Appl. Math. 198(2), 483–492 (2007)
Article MathSciNet MATH Google Scholar
Jbilou, K.: Méthodes d’extrapolation et de projection. applications aux suites de vecteurs (1988). Thèse de 3ème cycle, Université des Sciences et Techniques de Lille
Jbilou, K., Reichel, L., Sadok, H.: Vector extrapolation enhanced TSVD for linear discrete ill-posed problems. Numer. Algorithms 51(2), 195–208 (2009)
Article MathSciNet MATH Google Scholar
Jbilou, K., Sadok, H.: Some results about vector extrapolation methods and related fixed point iteration. J. Comp. Appl. Math. 36, 385–398 (1991)
Article MathSciNet MATH Google Scholar
Kindermann, S.: Convergence analysis of minimization-based noise level-free parameter choice rules for linear ill-posed problems. Electron. Trans. Numer. Anal. 38, 233–257 (2012)
MathSciNet MATH Google Scholar
McLeod, J.: A note on the $\varepsilon $-algorithm. Computing 7(1–2), 17–24 (1971)
Article MathSciNet MATH Google Scholar
Mešina, M.: Convergence acceleration for the iterative solution of the equations $X=AX+f$. Comput. Meth. Appl. Mech. Eng. 10(2), 165–173 (1977)
Article MathSciNet MATH Google Scholar
Morozov, V.A.: Methods for solving incorrectly posed problems. Springer Verlag, New York (1984)
Book MATH Google Scholar
Paige, C.C., Saunders, M.A.: Towards a generalized singular value decomposition. SIAM J. Numer. Anal. 18(3), 398–405 (1981)
Article MathSciNet MATH Google Scholar
Paige, C.C., Saunders, M.A.: Algorithm 583: LSQR: Sparse linear equations and least squares problems. ACM Trans. Math. Softw. 8(2), 195–209 (1982)
Article MATH Google Scholar
Paige, C.C., Saunders, M.A.: LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Softw. 8(1), 43–71 (1982)
Article MathSciNet MATH Google Scholar
Reichel, L., Rodriguez, G.: Old and new parameter choice rules for discrete ill-posed problems. Numer. Algorithms 63, 65–87 (2013)
Article MathSciNet MATH Google Scholar
Reichel, L., Rodriguez, G., Seatzu, S.: Error estimates for large-scale ill-posed problems. Numer. Algorithms 51, 341–361 (2009)
Article MathSciNet MATH Google Scholar
Salam, A.: Non-commutative extrapolation algorithms. Numer. Algorithms 7, 225–251 (1994)
Article MathSciNet MATH Google Scholar
Shanks, D.: Non-linear transformations of divergent and slowly convergent sequences. J. Math. Phys. 34(1–4), 1–42 (1955)
Article MathSciNet MATH Google Scholar
Sidi, A.: Efficient implementation of minimal polynomial and reduced rank extrapolation methods. J. Comp. Appl. Math. 36(3), 305–337 (1991)
Article MathSciNet MATH Google Scholar
Sidi, A., Ford, W.F., Smith, D.A.: Acceleration of convergence of vector sequences. SIAM J. Numer. Anal. 23(1), 178–196 (1986)
Article MathSciNet MATH Google Scholar
Wynn, P.: On a device for computing the $e_m(s_n)$ tranformation. Math. Comput. 10(54), 91–96 (1956)
Article MathSciNet MATH Google Scholar
Wynn, P.: Acceleration techniqes for iterated vector and matrix problems. Math. Comp. 16, 301–322 (1962)
Article MathSciNet MATH Google Scholar

Download references

Funding

Open access funding provided by Università degli Studi di Cagliari within the CRUI-CARE Agreement. CB acknowledges support from Labex CEMPI (ANR-11-LABX-0007-01). The research of CF and GR is partially supported by Fondazione di Sardegna, Progetto biennale bando 2021, “Computational Methods and Networks in Civil Engineering (COMANCHE)”; this project also funded a visit by CB and MRZ to the University of Cagliari, during which part of this work was developed. GR is partially supported by the PRIN 2022 project “Inverse Problems in the Imaging Sciences (IPIS)” (2022ANC8HL) and by the PRIN-PNRR 2022 project “AQuAInt - Approximation and Quadrature for Applicative Integral Models” (P20229RMLB). AA and CF are partially supported by the PRIN-PNRR 2022 project “MATH4CHARM - MATHematics for Cultural HeritAge preseRvation disseMination” (P2022PMEN2). AA, CF, MRZ, and GR are members of the GNCS group of INdAM; AA, CF, and GR are partially supported by INdAM-GNCS 2024 Project “Algebra lineare numerica per problemi di grandi dimensioni: aspetti teorici e applicazioni”. The work of CF is partially supported by PNRR Project “FIATLUCS - Favorire l’Inclusione e l’Accessibilitá per i Trasferimenti Locali Urbani a Cagliari e Sobborghi” (F23C24000240006).

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
Andrea Azzarelli, Caterina Fenu & Giuseppe Rodriguez
Department of Mathematics, University of Lille, Lille, France
Claude Brezinski
University of Padua, Padua, Italy
Michela Redivo-Zaglia

Authors

Andrea Azzarelli
View author publications
You can also search for this author inPubMed Google Scholar
Claude Brezinski
View author publications
You can also search for this author inPubMed Google Scholar
Caterina Fenu
View author publications
You can also search for this author inPubMed Google Scholar
Michela Redivo-Zaglia
View author publications
You can also search for this author inPubMed Google Scholar
Giuseppe Rodriguez
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Giuseppe Rodriguez.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Azzarelli, A., Brezinski, C., Fenu, C. et al. Parameter Choice Rules for Discrete Ill-Posed Problems Based on Extrapolation Methods. J Sci Comput 103, 9 (2025). https://doi.org/10.1007/s10915-025-02822-3

Download citation

Received: 16 August 2024
Revised: 16 January 2025
Accepted: 24 January 2025
Published: 13 February 2025
DOI: https://doi.org/10.1007/s10915-025-02822-3

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Parameter Choice Rules for Discrete Ill-Posed Problems Based on Extrapolation Methods

Abstract

Similar content being viewed by others

Minimal polynomial and reduced rank extrapolation methods are related

Hosvd-tmpe: an extrapolation method for multidimensional sequences

A convergence study for reduced rank extrapolation on nonlinear systems

1 Introduction

2 TSVD and TGSVD

3 Extrapolation and Regularization

4 Extrapolation Techniques

4.1 Minimal Polynomial Extrapolation

4.2 Reduced Rank Extrapolation

4.3 Vector Epsilon Algorithm

Lemma 1

Proof

Theorem 1

Proof

5 Comparison of Regularized and Extrapolated Solutions

6 Computed Examples

6.1 Comparing RESC-MPE, RESC-RRE, and RESC-VEA

6.2 Comparing RESC-VEA to Other Estimation Methods

7 Conclusion

Data Availability

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification