Nonlinear Kaczmarz algorithms and their convergence

https://doi.org/10.1016/j.cam.2021.113720

Abstract

This paper proposes a class of randomized Kaczmarz algorithms for obtaining isolated solutions of large-scale well-posed or overdetermined nonlinear systems of equations. This type of algorithm improves on the classical Newton method: each iteration needs only one row of the Jacobian instead of the entire matrix, which greatly reduces the amount of computation and storage, so these algorithms are called matrix-free. According to the different probability patterns for choosing a row of the Jacobian matrix, the nonlinear Kaczmarz (NK) algorithm, the nonlinear randomized Kaczmarz (NRK) algorithm, and the nonlinear uniformly randomized Kaczmarz (NURK) algorithm are proposed. In addition, the NURK algorithm is similar to the stochastic gradient descent (SGD) algorithm for nonlinear optimization problems; the only difference is the choice of step size. In the noise-free case, theoretical analysis and numerical results based on the classical tangential cone conditions show that the algorithms proposed in this paper are superior to the SGD algorithm in terms of iteration count and computing time.

Introduction

Consider solving the following nonlinear system of equations $$f(x)=0,\tag{1}$$ where $f:\mathbb{R}^n\to\mathbb{R}^m$ is a nonlinear vector-valued function and $x\in\mathbb{R}^n$ is an unknown vector. Eq. (1) can also be written componentwise as $$f_i(x)=0,\quad i=1,2,\dots,m,\tag{2}$$ where at least one $f_i:D(f_i)\subseteq\mathbb{R}^n\to\mathbb{R}$ $(i=1,2,\dots,m)$ is a nonlinear operator. If $x^*\in\mathbb{R}^n$ exists such that $f(x^*)=0$ holds, then $x^*$ is called a solution of the nonlinear system of Eqs. (1).

Nonlinear problems widely exist in many important fields such as national defense, economy, engineering, and management. For example, nonlinear finite element problems, nonlinear fracture problems, circuit problems, and nonlinear programming problems can all be reduced to nonlinear problems [1], [2], [3]. As long as a differential equation contains a nonlinear term in the unknown function or its derivatives, discretization by either a difference method or a finite element method yields a nonlinear system of equations. Furthermore, with the advent of the big-data era, large-scale nonlinear problems appear widely in applications. Therefore, solving large-scale nonlinear systems of equations of the form (1) is a common problem.

Assuming that the above equations have a unique solution in a certain region D (i.e., the equations are consistent), the Newton method is the most commonly used method for this problem, but when the system is very large or the Jacobian matrix is singular, the Newton method consumes a lot of resources. At present, algorithms for solving large-scale overdetermined nonlinear systems of equations are not well developed, but the Kaczmarz algorithm for solving large-scale overdetermined linear systems has been well developed. The classical Kaczmarz iteration, composed of cyclic orthogonal projections, was designed in 1937 by the Polish mathematician Stefan Kaczmarz [4]. The Kaczmarz algorithm mainly solves the following large-scale overdetermined consistent linear system: $$Ax=b,\tag{3}$$ where $A\in\mathbb{R}^{m\times n}$ $(m\ge n)$, $b\in\mathbb{R}^m$, and $x$ is an $n$-dimensional unknown vector. The Kaczmarz method can be formulated as $$x^{k+1}=x^{k}+\frac{b_i-\langle a_i,x^{k}\rangle}{\|a_i\|_2^2}\,a_i,\quad k=0,1,\dots,$$ where $a_i$ denotes the $i$th row of $A$, $b_i$ denotes the $i$th component of $b$, and $i=(k \bmod m)+1$.
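As a concrete illustration of the projection step above, the following is a minimal sketch of the cyclic Kaczmarz iteration for a small consistent system (the function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def kaczmarz(A, b, x0, sweeps=200):
    """Classical cyclic Kaczmarz for a consistent system Ax = b.
    Each step projects the iterate onto the hyperplane <a_i, x> = b_i,
    sweeping the rows i = 1, ..., m in fixed order."""
    x = np.asarray(x0, dtype=float).copy()
    m = A.shape[0]
    for k in range(sweeps * m):
        i = k % m                                  # cyclic row choice i = k (mod m)
        a_i = A[i]
        x += (b[i] - a_i @ x) / (a_i @ a_i) * a_i  # orthogonal projection step
    return x
```

Note that each step touches only one row of $A$; no factorization or full matrix-vector product is needed.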

Kaczmarz proved the convergence of the Kaczmarz method for (3) with an invertible coefficient matrix ($m=n$) [4], but the convergence rate of the general Kaczmarz method is not easy to obtain; see [5], [6]. When the method sweeps through the rows of $A$ in a random manner rather than in the given order, Strohmer and Vershynin [7] proved that this randomized Kaczmarz (RK) method converges at an expected exponential rate when the row index $i$ is chosen at random from $\{1,2,\dots,m\}$ with probability proportional to $\|a_i\|_2^2$.
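The randomized variant changes only how the row index is drawn; a sketch under the same illustrative naming, sampling row $i$ with probability proportional to $\|a_i\|_2^2$:

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, iters=3000, seed=None):
    """Strohmer-Vershynin RK: row i is sampled with probability
    ||a_i||_2^2 / ||A||_F^2; the projection step itself is unchanged."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    row_norms_sq = np.einsum('ij,ij->i', A, A)      # ||a_i||_2^2 for every row
    probs = row_norms_sq / row_norms_sq.sum()
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        a_i = A[i]
        x += (b[i] - a_i @ x) / row_norms_sq[i] * a_i
    return x
```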

Using the idea of the RK method, a nonlinear randomized orthogonal projection method, the NRK method, is proposed. The iterative formula of the NRK algorithm and its derivation are given in Section 2. To avoid calculating the entire Jacobian matrix, the NRK method does not directly adopt the RK algorithm's random strategy of selecting projection rows, but instead uses $p_i=|f_i(x)|^2/\|f(x)\|^2$ as the probability criterion to select the projection row at each step; we then prove linear convergence in expectation. In addition, we find, somewhat surprisingly, that the NK algorithm, which selects rows of the Jacobian matrix in order, performs better in numerical experiments; moreover, the NK algorithm does not need to compute selection probabilities at each iteration, thereby reducing the amount of computation and the computing time (CPU). Similarly to the NK algorithm, the NURK algorithm is proposed; it selects the projection row at each step according to a uniform distribution. Linear convergence in expectation of the NURK algorithm is also proved.

The nonlinear successive over-relaxation (NSOR) algorithm for solving nonlinear optimization problems was proposed as early as the 1960s; its specific iterative formula can be found in [8]. It actually solves the nonlinear system of equations derived from minimizing a strictly convex functional. In 1984, Brewster et al. presented a convergence analysis of the NSOR algorithm [9], [10]. To accelerate its convergence, Brewster et al. proposed different selection strategies for the over-relaxation parameter $\omega$ [11], [12]. In addition, Ecker [13], Vrahatis [14], and others applied this method to nonlinear problems arising in various applications. The NSOR algorithm is quite different from the NRK algorithm proposed in this paper, and the two can be compared in the following aspects. (1) Iteration formulas: the former uses only the diagonal elements of the Hessian matrix (i.e., the Jacobian matrix in this paper) and changes only one component of the unknown vector in each iteration, whereas the latter uses one row of the Jacobian matrix to change all components of the unknown vector. (2) Scope of application: the former assumes that the Hessian (Jacobian) matrix of the nonlinear problem is positive definite or positive semidefinite with nonzero diagonal elements, whereas the latter only requires the Jacobian to have full column rank. Furthermore, the former can only solve square nonlinear systems of equations, while the latter can solve both square and overdetermined nonlinear systems. (3) Selection strategies for the row index: the former usually chooses the row index in the order of the equations, while the latter introduces a randomized selection strategy. In general, the NRK algorithm is more widely applicable, and the introduction of randomized strategies yields a high convergence rate.

Before this paper, Kowar et al. [15] introduced the idea of the Kaczmarz algorithm into Landweber iteration [16] and constructed the Landweber–Kaczmarz (LK) algorithm in 2007. Both the Landweber iteration and the LK algorithm are designed for nonlinear ill-posed problems, but the Landweber iteration is not suitable for solving large-scale overdetermined nonlinear problems, and the LK algorithm's step-size strategy is quite different from ours. The stochastic gradient descent (SGD) algorithm recently proposed by Bangti Jin et al. [17] can be regarded as a randomized version of the LK algorithm. The SGD algorithm selects Jacobian rows with uniform probability at each step, the same as the NURK algorithm; the difference between the two is that the NURK algorithm uses an exact step size at each iteration. Moreover, even without Assumption 2.1(ii)–(iv) in [17], the NURK algorithm retains linear convergence. Needell et al. also compared the randomized Kaczmarz algorithm with the SGD algorithm in [18] and improved the RK algorithm to construct a randomized Kaczmarz algorithm with partially biased sampling. However, if that algorithm is applied to solving nonlinear systems of equations along the lines of the nonlinear Kaczmarz algorithms described later, the entire Jacobian matrix is needed, which undoubtedly increases the amount of computation. To avoid this problem, the NRK algorithm changes the probability of selecting rows of the Jacobian matrix, thereby accelerating convergence.

A brief overview of this paper is given below: in Section 2, we introduce the NRK algorithm and prove its convergence; in Section 3, the NK and NURK algorithms and the corresponding convergence proofs are given; in Section 4, we compare this kind of nonlinear Kaczmarz algorithm with the SGD algorithm; in Section 5, we present numerical experiments using the three nonlinear Kaczmarz algorithms to solve classical nonlinear systems of equations. The final Section 6 gives a summary and conclusions.

The following notation is used in this paper. On the Euclidean space $\mathbb{R}^n$, let $\|\cdot\|$ be the Euclidean norm. If $A\in\mathbb{R}^{m\times n}$, $\{a_1^T,\dots,a_m^T\}$ denotes the set of rows of $A$. The entry in the $i$th row and $j$th column of $A$ is denoted $A_{ij}$ $(i\in\{1,\dots,m\},\,j\in\{1,\dots,n\})$. $\|A(i,:)\|$ $(i=1,2,\dots,m)$ denotes the norm of the $i$th row of $A$. $\|A\|_F=\sqrt{\sum_{i,j}A_{ij}^2}$ is the Frobenius norm. $A^+$ is the generalized inverse of $A$, $\|A^+\|_2=1/\sigma_{\min}$, where $\sigma_{\min}$ is the smallest non-zero singular value of $A$, and $\kappa_F(A)=\|A\|_F\|A^+\|_2$.


Nonlinear randomized Kaczmarz algorithm

Note that $f'(x)$ is the Jacobian matrix at $x$ and $\nabla f_i(x)^T$ is its $i$th row, where $f(x)=(f_1(x),\dots,f_m(x))^T$. A linear approximation is obtained from the Taylor expansion at $x^{(k)}$, truncated after the first-derivative term: $$f(x)\approx f(x^{(k)})+f'(x^{(k)})(x-x^{(k)}).$$ The approximate solution $x^{(k+1)}$ of $f(x)=0$ can be obtained from the series of hyperplanes $$f_i(x^{(k)})+\langle\nabla f_i(x^{(k)}),\,x-x^{(k)}\rangle=0,\quad i=1,2,\dots,m,$$ i.e., $$\langle\nabla f_i(x^{(k)}),\,x\rangle=-f_i(x^{(k)})+\langle\nabla f_i(x^{(k)}),\,x^{(k)}\rangle,\quad i=1,2,\dots,m,$$ where the normal vector of the $i$th hyperplane is $\nabla f_i(x^{(k)})$, and …
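A single NRK step following this derivation can be sketched as follows; the row index is drawn with probability $p_i=|f_i(x)|^2/\|f(x)\|^2$ and only one Jacobian row is evaluated per iteration (the helper names `f` and `jac_row` are illustrative assumptions, not the paper's code):

```python
import numpy as np

def nrk_step(f, jac_row, x, rng):
    """One NRK iteration: draw row i with probability
    p_i = |f_i(x)|^2 / ||f(x)||^2, then project x onto the linearized
    hyperplane f_i(x) + <grad f_i(x), y - x> = 0."""
    fx = f(x)                          # residual vector (f_1(x), ..., f_m(x))
    r2 = fx @ fx
    if r2 == 0.0:                      # x is already an exact root
        return x
    i = rng.choice(len(fx), p=fx**2 / r2)
    g = jac_row(x, i)                  # only the i-th Jacobian row is needed
    return x - fx[i] / (g @ g) * g
```

For example, for $f(x)=(x_1^2+x_2^2-2,\;x_1-x_2)^T$, repeated NRK steps started near $(1,1)$ converge to that root.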

Nonlinear Kaczmarz algorithm and nonlinear uniformly randomized Kaczmarz algorithm

The ideas and iterative formulas of the NK and NURK algorithms are similar to those of the NRK algorithm; the only difference is the choice of rows. The NK algorithm selects the projection row at each iteration in order, while the NURK algorithm selects rows at random according to a uniform distribution. The processes of the two algorithms are given below. When the number of nonlinear equations is very large, the number of iteration steps increases accordingly.
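The three methods therefore differ only in the row-selection rule; a compact sketch (with illustrative names) of the three rules:

```python
import numpy as np

def select_row(rule, k, fx, rng):
    """Row-selection rules: 'NK' cycles through the rows in order,
    'NURK' samples uniformly, and 'NRK' samples row i with
    probability |f_i(x)|^2 / ||f(x)||^2."""
    m = len(fx)
    if rule == 'NK':
        return k % m
    if rule == 'NURK':
        return int(rng.integers(m))
    return int(rng.choice(m, p=fx**2 / (fx @ fx)))
```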

Comparison with SGD algorithm

Since the nonlinear randomized Kaczmarz algorithm proposed in this paper is similar, in terms of iterative formula, to the SGD algorithm recently proposed by Bangti Jin et al., this section briefly compares the two algorithms. Moreover, Needell et al. have also compared these two kinds of algorithms and improved the randomized Kaczmarz algorithm, proposing the randomized Kaczmarz algorithm with partially biased sampling. The basic introduction of these two kinds of algorithms is …
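To make the step-size contrast concrete: writing the sampled residual as $f_i(x)$ and its gradient as $\nabla f_i(x)$, both methods move along $-\nabla f_i(x)$, but NURK uses the exact projection step, while SGD (viewed here, as an illustrative assumption, on the least-squares loss $\frac12\|f(x)\|^2$) scales the sampled gradient $f_i(x)\nabla f_i(x)$ by a prescribed step size:

```python
import numpy as np

def nurk_update(x, fx_i, g_i):
    """NURK: exact Kaczmarz step f_i(x) / ||grad f_i(x)||^2."""
    return x - fx_i / (g_i @ g_i) * g_i

def sgd_update(x, fx_i, g_i, eta):
    """SGD on the sampled term (1/2) f_i(x)^2: eta is fixed or decaying."""
    return x - eta * fx_i * g_i
```

With `eta = 1 / (g_i @ g_i)` the two updates coincide, which is exactly the sense in which NURK is SGD with an exact step size.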

Numerical experiment

In this section, results of numerical experiments with the NRK algorithm, the NK algorithm, the Newton method [22], the Gauss–Newton method [23], and the SGD algorithm [17] are given. We compare the performance of the algorithms in terms of iteration steps (IT) and computing time (CPU). The numerical results in this paper are averages over 50 experiments. The termination criterion of the iteration is $\|r(x)\|\le\varepsilon$ (where $r(x)$ denotes the nonlinear residual and $\varepsilon=10^{-3}$) or that the number of iterations …

Conclusions

The main contributions of this paper are: (1) a class of nonlinear Kaczmarz algorithms with new probability selection rules is proposed; (2) the analysis of linear convergence in expectation of the NRK and NURK algorithms is completed; (3) a comparison with the SGD algorithm is given. The novelty of these algorithms mainly lies in the full use of the idea of the Kaczmarz algorithm. The introduction of the randomized selection strategy accelerates the convergence of the NRK algorithm.

Acknowledgments

The authors are most grateful to the anonymous referees for their constructive comments and helpful suggestions, which greatly improved the quality of this paper.

References (27)

  • Ecker, A., et al., A system of simultaneous non-linear equations in three-thousand variables, J. Comput. Phys. (1986)
  • Vrahatis, M.N., et al., A convergence-improving iterative method for computing periodic orbits near bifurcation points, J. Comput. Phys. (1990)
  • Byrd, R.H., et al., Representations of quasi-Newton matrices and their use in limited memory methods, Math. Program. (1994)
  • Heinz, W., et al., Variational methods in imaging, SIAM Rev. (2010)
  • Schuster, T., Kaltenbacher, B., Hofmann, B., Kazimierski, K.S., Regularization Methods in Banach Spaces, De Gruyter, ...
  • Kaczmarz, S., Angenäherte Auflösung von Systemen linearer Gleichungen, Bull. Int. Acad. Polon. Sci. Lett., Cl. Sci. Math. Nat., Sér. A (1937)
  • Popa, C., Projection algorithms-classical results and developments, https://www.lap-publishing.com/, ...
  • Scherzer, O., et al., Kaczmarz methods for regularizing nonlinear ill-posed equations II: Applications, Inverse Probl. Imaging (2017)
  • Strohmer, T., et al., A randomized Kaczmarz algorithm with exponential convergence, J. Fourier Anal. Appl. (2009)
  • Kannan, R., Relaxation Methods in Nonlinear Problems (1984)
  • Brewster, M.E., et al., Nonlinear successive over-relaxation, Numer. Math. (1984)
  • Brewster, M.E., et al., Global convergence of nonlinear successive overrelaxation via linear theory, Computing (1985)
  • Brewster, M.E., et al., Varying relaxation parameters in nonlinear successive overrelaxation, Computing (1985)

This research is supported by the National Key Research and Development Program of China (2019YFC1408400) and the Fundamental Research Funds for the Central Universities (19CX05003A-2).
