1 Introduction

In this paper we consider variational inequalities of the following type: Find \(x \in K\) such that

$$\begin{aligned} (y-x)^\top A x \ge (y-x)^\top L \end{aligned}$$
(1)

for all \(y \in K\) with \(A\in {\mathbb {R}}^{n\times n}\), \(L\in {\mathbb {R}}^n\), \(B \in {\mathbb {R}}^{m\times n}\), \(g \in {\mathbb {R}}^m\) and \(K:=\left\{ y\in {\mathbb {R}}^n \mid By\le g\right\} \), where A is assumed to be positive definite (but not necessarily symmetric). The inequality sign \(\le \) has to be understood componentwise. Variational inequalities of this type appear in many problems of mathematical programming. For instance, they result from the discretization of contact problems and have a vast number of related applications (Glowinski 1984; Glowinski et al. 1981; Kikuchi and Oden 1988). In many cases, the matrix A is symmetric, for instance, when A results from the discretization of a (linear elastic) contact problem with standard finite elements or boundary elements. However, the application of more advanced discretization methods, such as interior penalty discontinuous Galerkin methods, may lead to a non-symmetric A (Banz and Stephan 2014a, b).

Provided that K is non-empty, it is well known that the variational inequality (1) has a unique solution. However, the computation of this solution is by no means trivial. For particular problems there exist highly optimized, advanced solvers such as monotone multigrid solvers (Kornhuber 1997). For more general problems, or if the implementational effort should be kept small, one may resort to more basic projection-contraction methods such as those presented in He (1997) and Bnouhachem (2006), or to CG-based methods with proportioning and projection (Dostál 1997; Dostál and Schöberl 2005). The convergence of these methods is often guaranteed; however, their applicability depends on an efficient projection back onto the set K (which is available for box constraints). Even then, the number of iterations often depends on user-chosen and problem-dependent parameters. Moreover, the realization of projections for non-box constraints is often as complicated as solving (1) itself.

We note that if A is symmetric, the variational inequality problem (1) is equivalent to the strictly convex, quadratic minimization problem: Find \(x \in K\) such that

$$\begin{aligned} \quad E(x)=\min _{y\in K} E(y) \end{aligned}$$
(2)

with \(E(y):=\frac{1}{2} y^\top Ay-y^\top L\). For such a problem a huge number of algorithms are described in the literature, e.g., Geiger and Kanzow (2002), Gill et al. (1981), Jarre and Stoer (2013), Nocedal and Wright (1999). Many of these utilize the first order optimality or KKT conditions

$$\begin{aligned} Ax + B^\top \lambda = L \end{aligned}$$
(3a)
$$\begin{aligned} Bx \le g \end{aligned}$$
(3b)
$$\begin{aligned} \lambda \ge 0 \end{aligned}$$
(3c)
$$\begin{aligned} \lambda ^\top (Bx-g)=0, \end{aligned}$$
(3d)

where \(\lambda \in {\mathbb {R}}^m\) serves as a Lagrange multiplier associated with the inequality constraints. An efficient algorithm based on the solution of the KKT system (3) is the primal-dual active set algorithm (Kunisch and Rendl 2003; Curtis et al. 2014). In many implementations of such optimization schemes, the general constraints \(By \le g\) and simple box constraints \(y \le g\) are distinguished for the sake of computational efficiency. This is even more important for projection-contraction methods, as the projection onto a convex set with box constraints can be computed efficiently just by taking a simple maximum or minimum, whereas in general the projection itself is a non-trivial minimization problem. Moreover, the primal-dual active set method (Kunisch and Rendl 2003) may converge for box constraints but not necessarily for general inequality constraints (Curtis et al. 2014).

In this paper, we consider a special matrix class for the constraints matrix B, which is characterized by the property that each column of B has at most one non-zero entry. Such linear constraints are of special interest, for instance, in the discretization of contact problems and, in particular, of Signorini contact problems of linear elasticity (cf. e.g. Hlávaček et al. 1988; Kikuchi and Oden 1988), in which each row of the matrix B contains the outer normal vector of a certain boundary point. Based on this matrix class, we introduce a variable transformation which converts the constraints \(Bx\le g\) into box constraints so that efficient and easy-to-implement solution schemes can be applied. For its definition we exploit some ideas from a transformation approach originally proposed in Hlávaček et al. (1988). The variable transformation and its inverse are easily and efficiently computable and, equally important, fit well into iterative solvers which exploit the possibly sparse structure of the matrix A. Similar variable transformation techniques for transforming linear inequality constraints into box constraints have been introduced in the literature. We refer to Krause (2001), Dickopf and Krause (2009), Kothari and Krause (2022), where variable transformations enable the use of projected Gauss-Seidel schemes as smoothers within monotone multigrid methods.

The paper is structured as follows: In Sect. 2 the matrix class for B as well as the variable transformation are introduced and analyzed. In Sect. 3, we discuss an efficient realization of the variable transformation within the projected successive overrelaxation procedure (PSOR), which is well-known in the context of obstacle problems (see, e.g. (Hlávaček et al. 1988, p. 20), (Glowinski et al. 1981, p. 67) and (Glowinski 1984, p. 40)) and of simplified Signorini problems (see, e.g. (Kikuchi and Oden 1988, p. 128)). In particular, the variable transformation appears only implicitly and all variables are stated with respect to their original basis. Moreover, we discuss an acceleration approach based on the solution of low dimensional subproblems, yielding the accelerated PSOR procedure (APSOR) as similarly introduced in Blum et al. (2004); Schröder and Blum (2008). In Sect. 4 a different, more explicit realization of the variable transformation as a matrix-matrix-matrix–vector multiplication is presented exemplarily for the primal-dual active set method as proposed in Kunisch and Rendl (2003). Various numerical studies are presented in Sect. 5, which analyze the acceleration techniques for PSOR schemes and the effect of the variable transformation on both the number of iterations and the CPU-time. Detailed proofs, in particular of the global convergence of the accelerated PSOR scheme with implicit variable transformation, are given in the appendix.

2 Transformation to box constraints

In this section, we introduce the following matrix class for the matrix B: Denoting the index set of non-zero row entries by

$$\begin{aligned} \alpha _i:=\left\{ j \in \{1,\ldots ,n\} \mid B_{ij} \ne 0\right\} \end{aligned}$$
(4)

for each row of B with index \(1 \le i \le m\), we assume the condition for B that

$$\begin{aligned} \alpha _i \cap \alpha _k = \emptyset \end{aligned}$$
(5)

for all \(1\le i,k\le m\) with \(i \ne k\). This condition implies that in each column of B there exists at most one non-zero entry, which further implies \(m \le n\) if B contains no zero rows. Note that (5) implies separable linear constraints in the sense of separable convex constraints as, for instance, described in Dostál et al. (2023).
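As a quick illustration, condition (5) can be verified directly from the columns of B; the following minimal Python/NumPy sketch (independent of the MATLAB code in Online Resource 1) checks exactly this column-wise characterization.

```python
import numpy as np

def satisfies_condition_5(B):
    """Check condition (5): the supports alpha_i of the rows of B are pairwise
    disjoint, i.e. every column of B has at most one non-zero entry."""
    return bool(np.all(np.count_nonzero(B, axis=0) <= 1))
```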

Based on this matrix class we introduce a variable transformation which converts the linear constraints \(By \le g\) to box constraints. For this purpose, we define the quantity \(\gamma =(\gamma _1,\ldots ,\gamma _n)\) as

$$\begin{aligned} \gamma _j:={\left\{ \begin{array}{ll} i,& \exists i \in \{1,\ldots ,m\}: B_{ij} \ne 0\\ 0,& \text{ otherwise } \end{array}\right. } \end{aligned}$$

for the column index \(j\in \{1,\ldots ,n\}\). Note that this quantity is well-defined due to condition (5) and indicates the unique row index of the non-zero entry of the j-th column of B (if such an entry exists). We further define the pivot index \(\rho (i)\in \alpha _i\) for the row index \(1\le i \le m\) of B by

$$\begin{aligned} |B_{i,\rho (i)}|=\max _{j\in \alpha _i}|B_{ij}|. \end{aligned}$$

Without loss of generality we assume \(B_{i,\rho (i)}\ne 0\), as rows with only zero entries can be eliminated if the corresponding entry in g is non-negative. Note, if this entry were negative, the convex set K would be empty. Further note that this assumption on B means

$$\begin{aligned} \{\gamma _i\mid 1\le i\le n,\, \gamma _i\ge 1\}=\{1,\ldots ,m\} \end{aligned}$$
(6)

and that \(\gamma _{\rho (i)}=i\). With these quantities at hand we specify the vectors \(\sigma ,\kappa \in {{\mathbb {R}}}^n\) by

$$\begin{aligned} \sigma _i&:={\left\{ \begin{array}{ll} (1-B_{\gamma _i,i})/B_{{\gamma _i},i} & \text {if } \gamma _i\ge 1,\; i=\rho (\gamma _i)\\ -B_{{\gamma _i},i}/B_{\gamma _i,\rho (\gamma _i)} & \text {if } \gamma _i\ge 1,\; i\ne \rho (\gamma _i)\\ 0 & \text{ otherwise } \end{array}\right. }\\ \kappa _i&:={\left\{ \begin{array}{ll} B_{\gamma _i,i}-1 & \text {if } \gamma _i\ge 1,\; i=\rho (\gamma _i)\\ B_{\gamma _i,i} & \text {if } \gamma _i\ge 1,\; i\ne \rho (\gamma _i)\\ 0 & \text{ otherwise } \end{array}\right. } \end{aligned}$$

for \(1\le i\le n\). Eventually, we define the matrix \(M^{(\xi )}\in {{\mathbb {R}}}^{n\times n}\) for a vector \(\xi \in {{\mathbb {R}}}^n\) by

$$\begin{aligned} M^{(\xi )}_{ij}:={\left\{ \begin{array}{ll} \xi _j & \text {if } \gamma _j\ge 1,\; i=\rho (\gamma _j)\\ 0 & \text{ otherwise } \end{array}\right. } \end{aligned}$$

for \(1\le i,j \le n\). The following lemma states that for \(y\in {\mathbb {R}}^n\)

$$\begin{aligned} \hat{y}:=(I+M^{(\sigma )})^{-1} y \end{aligned}$$
(7)

defines a variable transformation (i.e. \(I+M^{(\sigma )}\) is regular) and that the back- and forward-transformations can be computed explicitly. In this paper, we call (7) the box constraints transformation or bc-transformation for short.

Lemma 1

The matrices \(I+M^{(\sigma )}\) and \(I+M^{(\kappa )} \) are regular. Moreover, it holds

$$\begin{aligned} (I+M^{(\sigma )})^{-1}=I+M^{(\kappa )}. \end{aligned}$$

Proof

Let \(k,i\in \{1,\ldots ,n\}\). First, we note that

$$\begin{aligned} \left( [I+M^{(\kappa )}][I+M^{(\sigma )}]\right) _{ki}=\delta _{ki}+M^{(\kappa )}_{ki}+M^{(\sigma )}_{ki}+(M^{(\kappa )} M^{(\sigma )})_{ki} \end{aligned}$$

with

$$\begin{aligned} \left( M^{(\kappa )} M^{(\sigma )}\right) _{ki}&=\sum _{r=1}^n M^{(\kappa )}_{kr}M^{(\sigma )}_{ri}\nonumber \\&={\left\{ \begin{array}{ll} \kappa _r \sigma _i, & \gamma _i\ge 1,\; \gamma _r\ge 1,\; k=\rho (\gamma _r),\; r=\rho (\gamma _i),\\ 0, & \text{ otherwise }. \end{array}\right. } \end{aligned}$$
(8)

If the first condition in (8) holds, we have \(k=\rho (\gamma _{\rho (\gamma _i)})=\rho (\gamma _i)=r\) and \(\gamma _r = \gamma _{\rho (\gamma _i)} = \gamma _i\). Thus, this condition reduces to \(\gamma _i\ge 1\) and \(k=\rho (\gamma _i)\). Moreover, it holds

$$\begin{aligned} \kappa _{\rho (\gamma _i)}=B_{\gamma _{\rho (\gamma _i)},\rho (\gamma _i)}-1=B_{\gamma _i,\rho (\gamma _i)}-1. \end{aligned}$$

Hence,

$$\begin{aligned} \left( M^{(\kappa )} M^{(\sigma )}\right) _{ki}={\left\{ \begin{array}{ll} (B_{\gamma _i,\rho (\gamma _i)}-1) \sigma _i, & \gamma _i\ge 1,\;k=\rho (\gamma _i),\\ 0, & \text{ otherwise } \end{array}\right. } \end{aligned}$$

and, thus,

$$\begin{aligned}&\left( [I+M^{(\kappa )}][I+M^{(\sigma )}]\right) _{ki}\\&\quad = \delta _{ki} + M^{(\kappa )}_{ki}+M^{(\sigma )}_{ki}+{\left\{ \begin{array}{ll} (B_{\gamma _i,\rho (\gamma _i)}-1)\sigma _i, & \gamma _i\ge 1,\; k=\rho (\gamma _i)\\ 0, & \text{ otherwise } \end{array}\right. }\\&\quad =\delta _{ki}+ {\left\{ \begin{array}{ll} \kappa _i+B_{\gamma _i,\rho (\gamma _i)}\sigma _i, & \gamma _i\ge 1,\; k=\rho (\gamma _i)\\ 0, & \text{ otherwise }. \end{array}\right. } \end{aligned}$$

It remains to show that \(\kappa _i+B_{\gamma _i,\rho (\gamma _i)}\sigma _i=0\) for \(\gamma _i\ge 1\) with \(k=\rho (\gamma _i)\). Assume \(i = \rho (\gamma _i)\); then

$$\begin{aligned} \kappa _i+B_{\gamma _i,\rho (\gamma _i)}\sigma _i = B_{\gamma _i,i}-1 + B_{\gamma _i,\rho (\gamma _i)} \frac{(1-B_{\gamma _i,i})}{B_{{\gamma _i},i}} = 0. \end{aligned}$$

For the case \(i \ne \rho (\gamma _i)\) we obtain

$$\begin{aligned} \kappa _i+B_{\gamma _i,\rho (\gamma _i)}\sigma _i = B_{\gamma _i,i}- B_{\gamma _i,\rho (\gamma _i)} \frac{B_{\gamma _i,i}}{B_{\gamma _i,\rho (\gamma _i)}} =0, \end{aligned}$$

which completes the proof. \(\square \)

To get a better understanding of the bc-transformation we look at the following illustrative example: Let the matrix \(B\in {\mathbb {R}}^{3\times 7}\) be given as

$$\begin{aligned} B:=\left( \begin{matrix} 0 & 0 & -2 & 0 & 9 & 5 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 3\\ 6 & 0 & 0 & -7 & 0 & 0 & 0 \end{matrix}\right) \end{aligned}$$

which obviously fulfills condition (5). We get

$$\begin{aligned} & \alpha _1=\{3,5,6\},\quad \alpha _2=\{7\},\quad \alpha _3=\{1,4\},\\ & \rho (1)=5,\quad \rho (2)=7,\quad \rho (3)=4 \end{aligned}$$

and

$$\begin{aligned} \gamma&=(3,0,1,3,1,1,2)\\ \sigma&=\left( \frac{6}{7},0,\frac{2}{9},-\frac{8}{7},-\frac{8}{9},-\frac{5}{9},-\frac{2}{3}\right) \\ \kappa&=\left( 6,0,-2,-8,8,5,2\right) . \end{aligned}$$

This gives the back- and forward-transformation matrices

$$\begin{aligned} I+M^{(\sigma )}=\left( \begin{matrix} 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 0\\ \frac{6}{7} & 0 & 0 & -\frac{1}{7} & 0 & 0 & 0\\ 0 & 0 & \frac{2}{9} & 0 & \frac{1}{9} & -\frac{5}{9} & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{3}\\ \end{matrix}\right) ,\quad I+M^{(\kappa )}=\left( \begin{matrix} 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 0\\ 6 & 0 & 0 & -7 & 0 & 0 & 0\\ 0 & 0 & -2 & 0 & 9 & 5 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 3\\ \end{matrix}\right) . \end{aligned}$$
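The quantities of this example can be reproduced with a short script. The following Python/NumPy sketch (with the 1-based indexing of \(\gamma \) and \(\rho \) as in the text) computes \(\gamma ,\rho ,\sigma ,\kappa \) and \(M^{(\xi )}\) for a matrix B satisfying (5) and verifies Lemma 1, as well as Lemma 2 below, on the example matrix.

```python
import numpy as np

def bc_quantities(B):
    """gamma, rho, sigma, kappa for a matrix B satisfying condition (5);
    rows of B are assumed to be non-zero (cf. the text)."""
    m, n = B.shape
    gamma = np.zeros(n, dtype=int)      # 1-based row index of the non-zero entry per column, 0 if none
    for j in range(n):
        rows = np.nonzero(B[:, j])[0]
        if rows.size:                   # condition (5): at most one such row
            gamma[j] = rows[0] + 1
    rho = np.zeros(m, dtype=int)        # 1-based pivot column per row
    for i in range(m):
        cols = np.nonzero(B[i, :])[0]
        rho[i] = cols[np.argmax(np.abs(B[i, cols]))] + 1
    sigma, kappa = np.zeros(n), np.zeros(n)
    for j in range(n):
        if gamma[j] >= 1:
            i, piv = gamma[j] - 1, rho[gamma[j] - 1] - 1
            if j == piv:
                sigma[j] = (1.0 - B[i, j]) / B[i, j]
                kappa[j] = B[i, j] - 1.0
            else:
                sigma[j] = -B[i, j] / B[i, piv]
                kappa[j] = B[i, j]
    return gamma, rho, sigma, kappa

def M(xi, gamma, rho):
    """Assemble M^(xi): entry (rho(gamma_j), j) equals xi_j whenever gamma_j >= 1."""
    n = xi.size
    Mx = np.zeros((n, n))
    for j in range(n):
        if gamma[j] >= 1:
            Mx[rho[gamma[j] - 1] - 1, j] = xi[j]
    return Mx

B = np.array([[0, 0, -2, 0, 9, 5, 0],
              [0, 0,  0, 0, 0, 0, 3],
              [6, 0,  0, -7, 0, 0, 0]], dtype=float)
gamma, rho, sigma, kappa = bc_quantities(B)
S = np.eye(7) + M(sigma, gamma, rho)             # I + M^(sigma)
K = np.eye(7) + M(kappa, gamma, rho)             # I + M^(kappa)
assert np.allclose(S @ K, np.eye(7))             # Lemma 1
assert np.allclose(B @ S, np.eye(7)[rho - 1])    # Lemma 2: row j of P picks component rho(j)
```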

The bc-transformation (7) is well adapted to the constraining matrix B as the matrix

$$\begin{aligned} P:=B(I+M^{(\sigma )}) \end{aligned}$$
(9)

becomes a (generalized) permutation matrix.

Lemma 2

Let \(\hat{y}\in {\mathbb {R}}^n\). Then,

$$\begin{aligned} (P\hat{y})_j=\hat{y}_{\rho (j)} \end{aligned}$$

for \(1\le j\le m\).

Proof

For \(\tilde{y}:=M^{(\sigma )} \hat{y}\) we conclude from the definition of \(M^{(\sigma )}\)

$$\begin{aligned} \tilde{y}_i={\left\{ \begin{array}{ll} \sum _{r \in \alpha _{\gamma _i}} \sigma _r \hat{y}_r, & i=\rho (\gamma _i)\\ 0 & \text{ otherwise } \end{array}\right. } \end{aligned}$$

for \(1\le i\le n\) and, therewith,

$$\begin{aligned} (P\hat{y})_j&= \left( B(I+M^{(\sigma )})\hat{y}\right) _j \\&= (B\hat{y})_j+(B \tilde{y})_j \\&= (B\hat{y})_j+B_{j,\rho (j)} \sum _{r \in \alpha _j} \sigma _r \hat{y}_r \\&= (B\hat{y})_j+B_{j,\rho (j)} \sigma _{\rho (j)} \hat{y}_{\rho (j)} + B_{j,\rho (j)} \sum _{r \in \alpha _j,\; r\ne \rho (j)} \sigma _r \hat{y}_r\\&=(B\hat{y})_j+B_{j,\rho (j)}\frac{1-B_{\gamma _{\rho (j)},\rho (j)}}{B_{\gamma _{\rho (j)},\rho (j)}} \hat{y}_{\rho (j)} - \sum _{r \in \alpha _j,\; r\ne \rho (j)} B_{j,r} \hat{y}_r \\&=(B\hat{y})_j+(1-B_{j,\rho (j)}) \hat{y}_{\rho (j)} - \sum _{r \in \alpha _j,\; r\ne \rho (j)} B_{j,r} \hat{y}_r \\&= \hat{y}_{\rho (j)}. \end{aligned}$$

\(\square \)

We can now prove that (1) is equivalent to the variational inequality: Find \(\hat{x} \in \hat{K}\) such that

$$\begin{aligned} (\hat{y}-\hat{x})^\top \hat{A} \hat{x} \ge (\hat{y}-\hat{x})^\top \hat{L} \end{aligned}$$
(10)

for all \(\hat{y} \in \hat{K}:= \{ \hat{y} \in {\mathbb {R}}^n \mid \hat{y} \le \hat{g} \}\), where

$$\begin{aligned} \hat{A}:=(I+M^{(\sigma )})^\top A (I+M^{(\sigma )}), \quad \hat{L}:=(I+M^{(\sigma )})^\top L \end{aligned}$$

and \(\hat{g}\in ({\mathbb {R}}\cup \{\infty \})^n\) with

$$\begin{aligned} \hat{g}_i:={\left\{ \begin{array}{ll} g_{\gamma _i},& \gamma _i\ge 1, i=\rho (\gamma _i),\\ \infty ,& \text{ otherwise, } \end{array}\right. }\quad 1\le i\le n. \end{aligned}$$

Clearly, introducing \(\hat{g} \in ({\mathbb {R}}\cup \{\infty \})^n\) may increase the number of constraints, but a constraint with \(\hat{g}_i = \infty \) is always inactive. In particular, we have \(\hat{x}\in \hat{K}\) if and only if

$$\begin{aligned} \hat{x}_{\rho (\gamma _i)}\le g_{\gamma _i} \end{aligned}$$
(11)

for all \(1\le i\le n\) with \(\gamma _i\ge 1\). We further note that \(\hat{A}\) is positive definite if A is positive definite and that \(\hat{A}\) is symmetric if and only if A is symmetric. In the symmetric case, A and \(\hat{A}\) have the same number of positive, negative and zero eigenvalues, which follows from Sylvester's law of inertia.
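For small problems the transformed data \(\hat{A}\), \(\hat{L}\), \(\hat{g}\) can simply be assembled explicitly; a minimal sketch, reusing bc_quantities and M from the snippet above (efficiency caveats are discussed at the end of this section):

```python
import numpy as np

def transform_problem(A, L, g, B):
    """Explicitly transformed data (A_hat, L_hat, g_hat) of problem (10).
    Bounds of unconstrained components are set to +inf, i.e. they are always inactive."""
    gamma, rho, sigma, _ = bc_quantities(B)       # from the earlier sketch
    n = A.shape[0]
    S = np.eye(n) + M(sigma, gamma, rho)          # I + M^(sigma)
    A_hat, L_hat = S.T @ A @ S, S.T @ L
    g_hat = np.full(n, np.inf)
    for j in range(n):
        if gamma[j] >= 1 and j == rho[gamma[j] - 1] - 1:   # only pivot columns carry a bound
            g_hat[j] = g[gamma[j] - 1]
    return A_hat, L_hat, g_hat
```

A solution \(\hat{x}\) of the transformed box-constrained problem is mapped back via \(x=(I+M^{(\sigma )})\hat{x}\) (Theorem 3 below).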

Theorem 3

If \(x\in K\), then \(\hat{x}:= (I+M^{(\kappa )})x\in \hat{K}\). Furthermore, if x solves (1), then \(\hat{x}\) solves (10). Reversely, if \(\hat{x}\in \hat{K}\), then \(x:=(I+M^{(\sigma )})\hat{x}\in K\). Moreover, if \(\hat{x}\) solves (10), then x solves (1).

Proof

Let \(1\le i\le n\) with \(\gamma _i\ge 1\) and let \(x\in K\) solve (1). Using Lemma 2 we have \(\hat{x}_{\rho (\gamma _i)}=(P\hat{x})_{\gamma _i}=(Bx)_{\gamma _i}\le g_{\gamma _i}\), which is \(\hat{x}\in \hat{K}\) due to (11). Using again Lemma 2 we conclude for \(\hat{y}\in \hat{K}\) and \(y:=(I+M^{(\sigma )})\hat{y}\) that \((By)_{\gamma _i}=(P\hat{y})_{\gamma _i}={\hat{y}}_{\rho (\gamma _i)}\le g_{\gamma _i}\). Thus, (6) gives \(y\in K\) and we verify

$$\begin{aligned} (\hat{y}-\hat{x})^\top \hat{A} \hat{x} = (y-x)^\top A x \ge (y-x)^\top L = (\hat{y}-\hat{x})^\top \hat{L}, \end{aligned}$$

which shows that \(\hat{x}\) solves (10). Now, let \(\hat{x}\in \hat{K}\) solve (10), then Lemma 2 and (11) yield \((Bx)_{\gamma _i}=(P\hat{x})_{\gamma _i}=\hat{x}_{\rho (\gamma _i)}\le g_{\gamma _i}\), which gives \(x\in K\) due to (6). Applying Lemma 2 we conclude for \(y\in K\) and \(\hat{y}:=(I+M^{(\kappa )})y\) that \(\hat{y}_{\rho (\gamma _i)}=(P\hat{y})_{\gamma _i}=(By)_{\gamma _i}\le g_{\gamma _i}\). Thus, (11) gives \(\hat{y}\in \hat{K}\) and

$$\begin{aligned} (y-x)^\top A x = (\hat{y}-\hat{x})^\top \hat{A} \hat{x} \ge (\hat{y}-\hat{x})^\top \hat{L} = (y-x)^\top L, \end{aligned}$$

which shows that x solves (1). \(\square \)

From the relation between x and \(\hat{x}\) as stated in Theorem 3 we easily obtain the following result for the minimization problem: Find \(\hat{x} \in \hat{K}\) such that

$$\begin{aligned} \hat{E}(\hat{x})=\min _{\hat{y}\in \hat{K}} \hat{E}(\hat{y}) \end{aligned}$$
(12)

with \(\hat{E}(\hat{y}):=\frac{1}{2} \hat{y}^\top \hat{A}\hat{y}-\hat{y}^\top \hat{L}\).

Corollary 4

It holds that \(x\in K\) solves (2) if and only if \(\hat{x} = (I+M^{(\sigma )})^{-1} x\in \hat{K}\) solves (12). Moreover, it holds \(\hat{E}(\hat{y})=E((I+M^{(\sigma )}) \hat{y})\) for all \(\hat{y}\in {\mathbb {R}}^n\).

Furthermore, we get the following result for the KKT-system: Find \((\hat{x},\hat{\lambda })\in {\mathbb {R}}^n\times {\mathbb {R}}^m\) such that

$$\begin{aligned} \hat{A} \hat{x} + P^\top \hat{\lambda } = \hat{L} \end{aligned}$$
(13a)
$$\begin{aligned} P\hat{x} \le g \end{aligned}$$
(13b)
$$\begin{aligned} \hat{\lambda } \ge 0 \end{aligned}$$
(13c)
$$\begin{aligned} \hat{\lambda }^\top (P\hat{x}-g)=0. \end{aligned}$$
(13d)

Corollary 5

It holds that \((x,\lambda )\in {\mathbb {R}}^n\times {\mathbb {R}}^m\) solves (3) if and only if \((\hat{x},\hat{\lambda }):=((I+M^{(\sigma )})^{-1}x, \lambda )\) solves (13).

Theorem 3 suggests solving (1) simply by applying an appropriate solution procedure to the transformed problem (10). However, keeping in mind that A and \(M^{(\sigma )}\) are (typically) sparse matrices, it is often not efficient to explicitly compute the matrix product \(\hat{A}=(I+M^{(\sigma )})^\top A (I+M^{(\sigma )})\) with respect to CPU-time and/or memory consumption. For instance, if \(B\in {\mathbb {R}}^{1\times n}\) with \(\alpha _1=\{1,\ldots ,n\}\), \(\hat{A}\) is dense, i.e. the sparsity pattern of B can lead to a significant increase of the number of non-zero entries of \(\hat{A}\). Therefore, the direct application of solution algorithms to (10) might not be appropriate. In the following two sections we demonstrate by example how the bc-transformation (7) can be embedded into iterative solvers such as the (accelerated) projected successive overrelaxation (PSOR and APSOR) and the primal-dual active set algorithm. In the PSOR algorithms the bc-transformation appears only implicitly, with all iterates stated in the original variable x, whereas in the active set algorithm the bc-transformation appears more explicitly as a matrix-matrix-matrix–vector multiplication, with all iterates stated in the transformed variable \(\hat{x}\).
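The fill-in caused by a dense row of B can be observed directly; a small sketch (reusing bc_quantities and M from the snippet above) with a tridiagonal A and \(B\in {\mathbb {R}}^{1\times n}\), \(\alpha _1=\{1,\ldots ,n\}\):

```python
import numpy as np

n = 200
A = np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
B = np.ones((1, n))                                # a single dense constraint row
gamma, rho, sigma, _ = bc_quantities(B)            # from the snippet above
S = np.eye(n) + M(sigma, gamma, rho)
A_hat = S.T @ A @ S
print(np.count_nonzero(A), np.count_nonzero(A_hat))   # 598 versus n^2 = 40000 non-zeros
```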

3 Projected successive overrelaxation with the bc-transformation

In this section, we present the projected successive overrelaxation procedures PSOR and APSOR, which include the bc-transformation (7). The APSOR procedure contains an additional acceleration step, in which a low dimensional minimization problem is solved after each complete PSOR step. Both procedures need one additional auxiliary vector \(z\in {\mathbb {R}}^m\), which enables access to the transformed variables without calculating them explicitly. For the computation of z we use the mapping \(\mathcal{M}^{(\xi )}:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^m\) given by

$$\begin{aligned} \left( \mathcal{M}^{(\xi )}(x)\right) _i:=\sum _{k \in \alpha _i} \xi _k x_k \end{aligned}$$

for some \(\xi \in {\mathbb {R}}^n\). We denote the i-th row vector of A by \(A_{i,:}\) and define \(\epsilon \in {\mathbb {R}}^n\) by

$$\begin{aligned} \epsilon _i:={\left\{ \begin{array}{ll} \omega \left( [1+\sigma _i] A_{ii}\right) ^{-1},& \gamma _i\ge 1,\; i=\rho (\gamma _i)\\ \omega \left( A_{ii}+2\sigma _i A_{\rho (\gamma _i),i}+\sigma _i^2 A_{\rho (\gamma _i),\rho (\gamma _i)}\right) ^{-1},& \gamma _i\ge 1,\; i\ne \rho (\gamma _i)\\ \omega A_{ii}^{-1},& \text {otherwise} \end{array}\right. } \end{aligned}$$
(14)

for \(1\le i\le n\) and with the relaxation parameter \(\omega \in (0,2)\). Furthermore, we define

$$\begin{aligned} \Theta _D(x):=\{\beta \in {\mathbb {R}}^r\mid x+D\beta \in K\} \end{aligned}$$

and

$$\begin{aligned} \alpha _D(x):=\operatorname {argmin}_{\beta \in \Theta _{D}(x)} E\left( x+D\beta \right) \end{aligned}$$

with some \(D\in {\mathbb {R}}^{n\times r}\).

Algorithm 1 (figure): PSOR and APSOR with bc-transformation

In Algorithm 1 the PSOR procedure and the APSOR procedure with the bc-transformation are described simultaneously. In particular, omitting the acceleration step gives the PSOR procedure. For the APSOR procedure we assume that A is symmetric to ensure that the minimization problem in the acceleration step has a unique solution. The column vectors \(D_{:,i}\) of D in the acceleration step can be interpreted as search directions. Their choice is, in principle, arbitrary. Choosing \(D_{:,i}\) as the differences of the last r iterates is a practical selection since active components cancel out to zero in \(D_{:,i}\), so that these components are not changed by the acceleration step. We refer to Blum et al. (2004), Schröder and Blum (2008), where the differences of iterates are also used in a similar acceleration approach for the projected SOR procedure. Note that choosing \(r \ge 2\) linearly independent search directions may make the solution of the minimization problem in the acceleration step relatively challenging.

In order to formally describe a sequence of iterates computed by Algorithm 1 let the mappings \(S^*_i:{\mathbb {R}}^n\times {\mathbb {R}}^m\rightarrow {\mathbb {R}}^n\) and \(T^*_i:{\mathbb {R}}^n\times {\mathbb {R}}^m\rightarrow {\mathbb {R}}^m\) be defined such that \(S^*_i(x,z)\) returns the updated x and \(T^*_i(x,z)\) returns the updated z of the i-th PSOR substep. Define

$$\begin{aligned} (S^{*,i},T^{*,i}):=(S^*_i,T^*_i)\circ \ldots \circ (S^*_1,T^*_1) \end{aligned}$$

for \(i\in \{1,\ldots ,n\}\) and for \(D\in {\mathbb {R}}^{n\times r}\) set

$$\begin{aligned} (S^*_D,T^*_D):=(H_D,G_D)\circ (S^{*,n},T^{*,n}) \end{aligned}$$

with \(H_D:{\mathbb {R}}^n\times {\mathbb {R}}^m\rightarrow {\mathbb {R}}^n\) and \(G_D:{\mathbb {R}}^n\times {\mathbb {R}}^m\rightarrow {\mathbb {R}}^m\) defined as \(H_D(x,z):=x\) and \(G_D(x,z):=z\), respectively, if the acceleration step in Algorithm 1 is omitted (PSOR). If Algorithm 1 includes the acceleration step (APSOR), \(H_D(x,z)\) returns the modified x and \(G_D(x,z)\) the modified z as computed in the acceleration step, i.e.

$$\begin{aligned} H_D(x,z):=x+D\alpha _D(x) \end{aligned}$$

and

$$\begin{aligned} G_D(x,z):=-{\mathcal {M}}^{(\kappa )}(H_D(x,z)). \end{aligned}$$
(15)

Therewith, Algorithm 1 generates the sequence \(\{x^{l},z^{l}\}_{l\in {\mathbb {N}}}\) with \(x^0\in {\mathbb {R}}^n\), \(z^0:=-\mathcal{M}^{(\kappa )} (x^0)\) and

$$\begin{aligned} (x^{l+1},z^{l+1}):=(S^*_{D^l},T^*_{D^l})(x^{l},z^{l}), \end{aligned}$$
(16)

where \(D^l\in {\mathbb {R}}^{n\times r}\) denotes the matrix as chosen in the l-th acceleration step of the APSOR procedure. In the PSOR case we simply set \(D^l:=0\).

The acceleration step in the APSOR procedure aims to further reduce the iteration error \(\Vert x - x^l\Vert \) of the iterate \(x^l\) by solving a low dimensional minimization problem. This may be motivated by the observation that (as A is symmetric) the difference in E is an upper bound for the error induced by A, i.e.

$$\begin{aligned} \Vert x - x^l\Vert _A^2\le 2 \left( E(x^l) - E(x)\right) \end{aligned}$$

with \(\Vert x\Vert _A^2:=x^\top Ax\) (which directly results from (1)).
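For completeness, this bound follows from expanding E around x (A symmetric) and using (1) with the admissible test vector \(y=x^l\in K\):

$$\begin{aligned} E(x^l)-E(x)=\tfrac{1}{2}(x^l-x)^\top A(x^l-x)+(x^l-x)^\top (Ax-L)\ge \tfrac{1}{2}\Vert x-x^l\Vert _A^2. \end{aligned}$$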

We note that it may not be necessary to solve the minimization problem in the acceleration step exactly. It could be sufficient to find an \(\tilde{\alpha } \in \Theta _D(x)\) such that \(E(x+D\tilde{\alpha })\le E(x)\), i.e. a new feasible point which does not increase the energy value. Hence, we may simply solve the unconstrained minimization problem to obtain a descend direction with respect to the energy value and afterwards perform some simple line search backtracking to guarantee feasibility. The numerical results suggest that this proceeding leads to less active constraints for the new iterate giving more flexibility to the next PSOR step. The acceleration step can be formulated in the following way: Let \(\mu \in (0,1)\) and \(q\in {\mathbb {N}}_0\) and define

$$\begin{aligned} \gamma _D(x):=\operatorname {argmin}_{\beta \in {\mathbb {R}}^r} E\left( x+D\beta \right) \end{aligned}$$

and

$$\begin{aligned} \xi _D(x,\gamma ):=\max \left\{ \{0\}\cup \left\{ \mu ^\ell \mid \ell \in \{0,\ldots ,q\},\; B (x+\mu ^\ell D\gamma )\le g\right\} \right\} \end{aligned}$$
(17)

for \(D\in {\mathbb {R}}^{n\times r}\).

Line search acceleration step (figure)

If Algorithm 1 is performed with this line search acceleration step (LSAPSOR), let \(H_D(x,z)\) return the modified x and \(G_D(x,z)\) the modified z as computed in this acceleration step, i.e.

$$\begin{aligned} H_D(x,z):=x+\xi _D(x,\gamma _D(x)) D\gamma _D(x). \end{aligned}$$

We note that \(G_D\) still fulfills (15). With this extended definition of the mappings \(H_D\) and \(G_D\) the sequence \(\{x^l,z^l\}_{l\in {\mathbb {N}}}\), as defined in (16), also includes the LSAPSOR scheme. Indeed, due to the convexity of E we have

$$\begin{aligned}&E(x+\xi _D(x,\gamma _D(x)) D\gamma _D(x))\\&=E((1-\xi _D(x,\gamma _D(x)))x+\xi _D(x,\gamma _D(x)) (x+D\gamma _D(x)))\\&\le (1-\xi _D(x,\gamma _D(x))) E(x)+\xi _D(x,\gamma _D(x)) E(x+D\gamma _D(x))\\&\le (1-\xi _D(x,\gamma _D(x))) E(x)+\xi _D(x,\gamma _D(x)) E(x)\\&= E(x). \end{aligned}$$

We further note that

$$\begin{aligned} \left( \gamma _D(x)\right) _1 =-\frac{(Ax-L)^\top D_{:,1}}{(AD_{:,1})^\top D_{:,1}} \end{aligned}$$

if \(r=1\) and

$$\begin{aligned} \left( \gamma _D(x)\right) _1&=\frac{d_0 - \left( \gamma _D(x)\right) _2 d_1}{d_2}\\ \left( \gamma _D(x)\right) _2&=\frac{D_{:,2}^\top (L - Ax)d_2 - d_0 d_1}{(D_{:,2}^\top A D_{:,2})d_2 - d_1^2} \end{aligned}$$

with \(d_0:= D_{:,1}^\top (L - Ax)\), \(d_1:=D_{:,1}^\top A D_{:,2}\) and \(d_2:=D_{:,1}^\top A D_{:,1}\) if \(r=2\).
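For illustration, these closed-form expressions and the backtracking factor (17) translate directly into code; a minimal Python/NumPy sketch (not taken from the paper's MATLAB implementation):

```python
import numpy as np

def gamma_D(A, L, x, D):
    """Unconstrained minimizer of E(x + D*beta) for r = 1 or r = 2 search directions,
    using the closed-form expressions above."""
    res = L - A @ x                                # negative gradient of E at x
    if D.shape[1] == 1:
        d = D[:, 0]
        return np.array([d @ res / (d @ (A @ d))])
    d1, d2 = D[:, 0], D[:, 1]
    e0, e1, e2 = d1 @ res, d1 @ (A @ d2), d1 @ (A @ d1)    # d_0, d_1, d_2 of the text
    b2 = ((d2 @ res) * e2 - e0 * e1) / ((d2 @ (A @ d2)) * e2 - e1 ** 2)
    return np.array([(e0 - b2 * e1) / e2, b2])

def xi_D(B, g, x, D, beta, mu=0.5, q=30):
    """Backtracking factor (17): the largest mu^l (or 0) such that x + mu^l*D*beta stays in K."""
    step = D @ beta
    for ell in range(q + 1):
        if np.all(B @ (x + mu ** ell * step) <= g):
            return mu ** ell
    return 0.0
```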

Theorem 6

Let A be symmetric and positive definite and let \(x\in K\) be the solution of (1). Furthermore, let \(\{x^l\}_{l\in {\mathbb {N}}}\) be generated by the PSOR, the APSOR or the LSAPSOR scheme as defined in (16). Then, the sequence \(\{E(x^l)\}_{l\in {\mathbb {N}}}\) is monotonically decreasing and

$$\begin{aligned} \lim _{l\rightarrow \infty } x^{l}=x. \end{aligned}$$

Proof

The proof can be found in the appendix. \(\square \)

Remark 1

The PSOR procedure is closely related to the projected symmetric successive overrelaxation (PSSOR). For this procedure an additional backward PSOR step in Algorithm 1 has to be performed. Its analysis mainly requires some (notational) modifications in the proof of Theorem 6 as well as in the proofs of Lemmas 10, 11 and 13 as presented in the appendix. In particular, Theorem 6 remains valid for the PSSOR procedure and its accelerated variants APSSOR and LSAPSSOR.

Remark 2

The application of the bc-transformation to include the constraints given by condition (5) may not be restricted to the PSOR scheme and its accelerated variants. This iterative scheme can be seen as an example, where an implicit realization of the bc-transformation is beneficial. Other types of methods (such as CG-based methods introduced, e.g., in Dostál 1997; Dostál and Schöberl 2005; Nocedal and Wright 1999) developed for solving (1) with box constraints may benefit from an implicit realization of the bc-transformation as well. The use of the PSOR scheme (or its accelerated variants (LS)(A)P(S)SOR) as smoother in monotone multigrid methods (see, e.g., Kornhuber 1997; Krause 2001; Dickopf and Krause 2009; Kothari and Krause 2022) in combination with the bc-transformation may also have potential in the context of solving large-scale contact problems.

4 Primal-dual active set solver with the bc-transformation

Algorithm 2 (figure): Matrix–vector multiplication \(\hat{y} = \hat{A}_{{\mathcal {I}},{\mathcal {I}}} \hat{x}_{{\mathcal {I}}}\)

Algorithm 3 (figure): Matrix–vector multiplication \(z=(I+M^{(\sigma )})x\)

Algorithm 4 (figure): Matrix–vector multiplication \(z=(I+M^{(\sigma )})^\top x\)

A widely used method to solve the KKT-system (13) is the so-called active set method. In this algorithm an index set, the active set \({\mathcal {A}}\subset \{1,\ldots ,n\}\), is iteratively determined, which contains exactly those indices for which the constraint (13b) is active. In each iteration step a corresponding equality constrained minimization problem is solved. If both the primal variable \(\hat{x}\in {\mathbb {R}}^n\) and the dual variable \(\hat{s}\in {\mathbb {R}}^n\) are used to update the active set \({\mathcal {A}}\), the algorithm is called primal-dual active set method (Kunisch and Rendl 2003). The basic procedure of the primal-dual active set algorithm for problems with box constraints is stated as follows: Given an active set \({\mathcal {A}}\) and its complement \({\mathcal {I}}:=\{1,\ldots ,n\}\backslash {\mathcal {A}}\), compute \(\hat{x}\) and \(\hat{s}\) such that

$$\begin{aligned} \begin{pmatrix} \hat{A}_{{\mathcal {A}},{\mathcal {A}}} & \hat{A}_{{\mathcal {A}},{\mathcal {I}}}\\ \hat{A}_{{\mathcal {I}},{\mathcal {A}}} & \hat{A}_{{\mathcal {I}},{\mathcal {I}}}\\ \end{pmatrix} \begin{pmatrix} \hat{x}_{\mathcal {A}}\\ \hat{x}_{\mathcal {I}} \end{pmatrix} + \begin{pmatrix} \hat{s}_{\mathcal {A}}\\ \hat{s}_{\mathcal {I}} \end{pmatrix} = \begin{pmatrix} \hat{L}_{\mathcal {A}}\\ \hat{L}_{\mathcal {I}} \end{pmatrix} \end{aligned}$$
(18)

with \(\hat{s}_{\mathcal {I}}=0\), \(\hat{x}_{\mathcal {A}}=\hat{g}_{\mathcal {A}}\) and update \({\mathcal {A}}=\left\{ i\in \{1,\ldots ,n\} \mid \hat{x}_i>\hat{g}_i \text { or } \hat{s}_i>0 \right\} \). Here, the subvector \(\hat{x}_{{\mathcal {A}}}\in {\mathbb {R}}^{\ell }\) of \(\hat{x}\) with \(\ell :=\#{{\mathcal {A}}}\) is defined as

$$\begin{aligned} \hat{x}_{{\mathcal {A}},i}:=\hat{x}_{({\mathcal {A}})_i} \end{aligned}$$

for \(1\le i\le \ell \) where \(({\mathcal {A}})\in \{1,\ldots ,n\}^{\ell }\) denotes the (ordered) tuple resulting from \({\mathcal {A}}\), i.e. \(({\mathcal {A}}):=(i_1,\ldots ,i_\ell )\) for \(i_1,\ldots ,i_\ell \in {\mathcal {A}}\) and \(i_1<\ldots <i_\ell \). The other subvectors and submatrices in (18) are analogously defined, for instance, the submatrix \(\hat{A}_{{\mathcal {A}},{\mathcal {I}}}\in {\mathbb {R}}^{\ell \times s}\) of \(\hat{A}\) with \(s:=\#{\mathcal {I}}\) is defined as

$$\begin{aligned} \hat{A}_{{\mathcal {A}},{\mathcal {I}},ij}:=\hat{A}_{({\mathcal {A}})_i,({\mathcal {I}})_j} \end{aligned}$$

for \(1\le i\le \ell \) and \(1\le j\le s\).

If the hatted matrix \(\hat{A}\) is readily available, \(\hat{A}_{{\mathcal {I}},{\mathcal {I}}} \hat{x}_{\mathcal {I}}=\hat{L}_{\mathcal {I}} - \hat{A}_{{\mathcal {I}},{\mathcal {A}}}\hat{g}_{\mathcal {A}}\) may be solved efficiently, for instance, by a preconditioned CG method. In general, with only A and the matrix–vector multiplication \((I+M^{(\sigma )})x\) being available, the matrix–vector multiplication \(\hat{A}_{{\mathcal {I}},{\mathcal {I}}} \hat{x}_{\mathcal {I}}\) needed for such iterative solvers can be computed efficiently by Algorithms 2, 3 and 4, in which the pre-computed tuple \(({\mathcal {G}})\) of \({\mathcal {G}}:=\{j\in \{1,\ldots ,n\}\mid \gamma _j>0\}\) is used. Note that the order in the definition of \((\cdot )\) is only needed to ensure that this notation is well-defined; it is not needed for the computations in the algorithms.
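Since Algorithms 3 and 4 are only displayed as figures here, we add a possible matrix-free realization derived directly from the definition of \(M^{(\sigma )}\) (a sketch using the 1-based quantities of Sect. 2; not necessarily identical to the algorithms in every detail):

```python
import numpy as np

def apply_S(x, sigma, gamma, rho):
    """z = (I + M^(sigma)) x without forming the matrix (in the spirit of Algorithm 3):
    column j contributes sigma_j * x_j to the pivot row rho(gamma_j)."""
    z = x.copy()
    for j in np.nonzero(gamma >= 1)[0]:
        z[rho[gamma[j] - 1] - 1] += sigma[j] * x[j]
    return z

def apply_S_T(x, sigma, gamma, rho):
    """z = (I + M^(sigma))^T x without forming the matrix (in the spirit of Algorithm 4)."""
    z = x.copy()
    for j in np.nonzero(gamma >= 1)[0]:
        z[j] += sigma[j] * x[rho[gamma[j] - 1] - 1]
    return z
```

The product \(\hat{A}_{{\mathcal {I}},{\mathcal {I}}} \hat{x}_{\mathcal {I}}\) of Algorithm 2 can then be realized by extending \(\hat{x}_{\mathcal {I}}\) by zeros on \({\mathcal {A}}\), applying apply_S, A and apply_S_T in turn, and restricting the result to \({\mathcal {I}}\).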

Alternatively, we may use the projected CG method to solve the equality constrained minimization subproblem, which would only require the use of Algorithms 3 and 4. We note that the dual variable \(\hat{s}\) is nothing but the residual and is computed by the matrix–vector multiplications

$$\begin{aligned} \hat{s}=(I+M^{(\sigma )})^\top [L-A(I+M^{(\sigma )})\hat{x}]. \end{aligned}$$

From Lemma 2 and Corollary 5 we obtain \(P^\top \hat{\lambda } =P^\top \lambda =\hat{s}\) for the Lagrange multiplier. Thus, \(\hat{\lambda }\) and \(\lambda \) can be computed by \(\hat{\lambda }=\lambda = P P^\top \hat{\lambda } = P \hat{s}\) (in a post-processing step). A convergence analysis of the primal-dual active set method can be found, e.g. in Kunisch and Rendl (2003), Hintermüller et al. (2002).
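As an illustration of the loop described above, a compact (dense) sketch of the primal-dual active set iteration on the transformed problem; the stopping rule (an unchanged active set) and the handling of the \(\infty \)-bounds are our additions here, see Kunisch and Rendl (2003) for the method itself and its convergence:

```python
import numpy as np

def primal_dual_active_set(A_hat, L_hat, g_hat, max_iter=100):
    """Primal-dual active set iteration for the box-constrained KKT system (13)
    with the update rule quoted above (a sketch with dense linear algebra)."""
    n = A_hat.shape[0]
    finite = np.isfinite(g_hat)            # components with g_hat_i = inf can never become active
    active = np.zeros(n, dtype=bool)
    x, s = np.zeros(n), np.zeros(n)
    for _ in range(max_iter):
        inactive = ~active
        x[active] = g_hat[active]
        s[inactive] = 0.0
        # second block row of (18): A_hat_{I,I} x_I = L_hat_I - A_hat_{I,A} g_hat_A
        x[inactive] = np.linalg.solve(A_hat[np.ix_(inactive, inactive)],
                                      L_hat[inactive] - A_hat[np.ix_(inactive, active)] @ x[active])
        # first block row of (18): s_A = L_hat_A - (A_hat x)_A
        s[active] = L_hat[active] - A_hat[active] @ x
        new_active = finite & ((x > g_hat) | (s > 0))
        if np.array_equal(new_active, active):
            break
        active = new_active
    return x, s
```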

5 Numerical results

In this section we discuss the effect of different methods for the acceleration of the P(S)SOR scheme applied to an obstacle problem (i.e. \(B=I\) and the bc-transformation is not needed). Furthermore, the application of the bc-transformation is studied in the context of a three-dimensional Signorini contact problem. In the experiments all iterative solvers are initialized with a zero initial solution. A MATLAB implementation of the bc-transformation and some solvers can be found in Online Resource 1.

5.1 Effect of the acceleration step and relaxation parameter \(\omega \) for an obstacle problem

In order to demonstrate the effect of the acceleration step within the PSOR and PSSOR, see Algorithm 1, and of the choice of the relaxation parameter \(\omega \in (0,2)\) in this algorithm, we consider the one-dimensional obstacle problem:

$$\begin{aligned}&\left. \begin{aligned} -u^{\prime \prime }&\ge 1,\\ (u-0.35)(u^{\prime \prime }+1)&=0,\\ u&\le 0.35,\qquad \\ \end{aligned}\right\} \text {in } (-1,1)\nonumber \\&\qquad \qquad \quad \;\;\, u(\pm 1) =0. \end{aligned}$$
(19)

The discretization of this problem with lowest order conforming finite elements on a uniform mesh yields the variational inequality (1) with \(B:=I \in {\mathbb {R}}^{n \times n}\),

$$\begin{aligned} A_{ij}:={\left\{ \begin{array}{ll} 2h^{-1}, & \text {if } i=j, \\ -h^{-1}, & \text {if } \mid i-j\mid =1, \\ 0, & \text {otherwise,} \end{array}\right. }\quad L_j:=h,\quad g_j:=0.35,\quad 1 \le i,j \le n \end{aligned}$$

where \(h:=2/(n+1)\) is the mesh size of the one-dimensional subdivision of \((-1,1)\) with \(n+2\) nodes. In particular, n coincides with the dimension or with the degrees of freedom (DOF) of the discrete problem. As the problem already has box constraints, the bc-transformation is not needed (indeed \(M^{(\sigma )}=0\)) and the problem can be solved without further adaptation with the PSOR or PSSOR algorithm.
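For reference, a compact sketch of a plain projected SOR sweep for this box-constrained problem (a textbook formulation in Python/NumPy; the stopping rule differs from the one used below and the paper's Algorithm 1 may differ in details):

```python
import numpy as np

def psor_obstacle(n=127, omega=1.9, tol=1e-9, max_iter=100000):
    """Projected SOR for the discretized obstacle problem (19) with B = I:
    relaxed Gauss-Seidel updates followed by projection onto {x <= g}."""
    h = 2.0 / (n + 1)
    A = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h
    L, g = np.full(n, h), np.full(n, 0.35)
    x = np.zeros(n)
    for it in range(1, max_iter + 1):
        x_old = x.copy()
        for i in range(n):                          # forward sweep with in-place updates
            x[i] = min(g[i], x[i] + omega * (L[i] - A[i] @ x) / A[i, i])
        if np.linalg.norm(x - x_old) < tol:
            break
    return x, it
```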

Figure 1 shows the dependency of the number of iterations of the accelerated P(S)SOR schemes on the relaxation parameter \(\omega \). We choose \(n:=127\) and stop the iteration if \(\Vert x_{P(S)SOR} - x_{AS} \Vert _2 < 10 ^{-9}\) is reached, where the discrete solution \(x_{AS}\) is calculated by the primal-dual active set algorithm in combination with a direct solver. Furthermore, we use the differences of the current iterate to the last one or two iterates as search direction(s) \(D_{:,i}\) within the acceleration step, in which we therefore have to solve a one- or two-dimensional minimization problem. We consider two fundamentally different acceleration procedures with these search directions, see the discussion in Sect. 3. While solving the constrained 1d optimization problem is quite simple, solving the constrained problem becomes significantly more difficult if two search directions are used. We call these procedures C1d and C2d, respectively. By L1d and L2d we denote the acceleration methods for which the unconstrained minimization problem is solved and afterwards an inexact line search between the current iterate and the previously computed unconstrained minimizer is performed to satisfy the constraint \(x \le g\), see the LSAPSOR acceleration step. Starting with the full step length \(\xi _D=1\), the scaling parameter \(\xi _D\) is halved in each backtracking step (i.e. \(\mu :=0.5\) in (17)). If the backtracking were exact, C1d and L1d would be the same. This is not true for two- or higher-dimensional minimization problems in the acceleration step. Clearly, L2d essentially requires only two matrix–vector multiplications and is, therefore, much cheaper than C2d both in computational time and in implementational effort. Moreover, the inexactness of the line search can have a positive effect of its own. Due to the halving of \(\xi _D\) we can expect that the new iterate has at least one active constraint fewer than the iterate resulting from an exact line search. Thus, the next P(S)SOR step has more flexibility in terms of feasible directions to reduce the energy value. In Schröder and Blum (2008) the unconstrained minimizer is projected back into the feasible set by minimizing the Euclidean distance. We call this acceleration variant P2d. While the projection is cheaper than the backtracking variant, its convergence cannot be guaranteed as the energy value might increase due to the projection. We observe in the numerical experiments that P2d does converge, but the energy value is not monotonically decreasing.

Not surprisingly, the number of iterations depends strongly on the choice of \(\omega \). In Fig. 1 we observe that the optimal \(\omega \) seems to be (almost) independent of the acceleration method. Moreover, Fig. 1a shows that the acceleration procedure within the PSOR has no significant effect on the number of iterations in the case of an optimal \(\omega \). However, if the acceleration is applied, the number of iterations is much more robust to changes in \(\omega \), which eases the burden of selecting an appropriate \(\omega \) for practical computations. In the case of the PSSOR scheme the acceleration step always reduces the number of iterations significantly, see Fig. 1b. What stands out is the large volatility of the number of iterations if one of the acceleration procedures is applied. This volatility may be a consequence of using the differences of the last iterates for the search directions \(D_{:,i}\).

Fig. 1: Number of iterations (y-axis) for the one-dimensional obstacle problem with \(n=127\) for different values of \(0<\omega <2\) (x-axis)

It is well known that the eigenvalues of A and, therewith, the optimal \(\omega \) depend on the dimension n. We set \(\omega :=2-12.75/n\) for the experiments documented in Tables 1, 2, 3 and 4. When n is doubled, one P(S)SOR step becomes roughly twice as expensive because each row of A contains at most three non-zero entries. However, the number of iterations also increases. In the case of all PSOR schemes and the non-accelerated PSSOR scheme it roughly doubles as well. We do not always observe this effect for the accelerated PSSOR schemes (which are further discussed at the end of this section). The implementation is done in MATLAB and the computations have been carried out on an HP EliteBook 840 G5 laptop with an i7-8550U CPU @ 1.80 GHz. Tables 1 and 2 emphasize once more that for reasonably good choices of \(\omega \) there is no advantage in taking an acceleration step for the PSOR method. As the number of iterations is roughly the same, we can easily read off the CPU time needed for the individual acceleration steps, which is mainly one or two matrix–vector multiplications, and the cost for the backtracking. The C2d case is exceptional as we use MATLAB's own active set quadratic programming solver to solve the non-trivial constrained minimization subproblem. For the PSSOR method, where a single step is twice as expensive as a single PSOR step, the choice of the acceleration method has a drastic effect, see Tables 3 and 4. In particular, the use of a two-dimensional subproblem is by far superior to the one-dimensional case and, of course, to the standard non-accelerated case. Due to the easier implementation and guaranteed convergence we favor the L2d case.

Table 1 PSOR: Number of iterations and CPU time in seconds for the obstacle problem
Table 2 PSOR: Number of iterations and CPU time in seconds for the obstacle problem
Table 3 PSSOR: Number of iterations and CPU time in seconds for the obstacle problem
Table 4 PSSOR: Number of iterations and CPU time in seconds for the obstacle problem

5.2 Effect of the bc-transformation on contact solvers

To analyze the effect of the bc-transformation (7) we consider a so-called Signorini contact problem (Kikuchi and Oden 1988). The reference configuration of the body \(\Omega \) being in contact with a rigid obstacle is a segment of a hollow sphere. More precisely,

$$\begin{aligned} \Omega := \left\{ x(r,\varphi ,\theta )-p \mid r \in (0.8125,1.625),\ \varphi ,\theta \in \left( -\frac{\pi }{4},\frac{\pi }{4}\right) \right\} \end{aligned}$$

with the spherical coordinates \(x(r,\varphi ,\theta ):=(r\cos \varphi \sin \theta , r\sin \varphi \cos \theta , r\cos \theta )\) and midpoint \(p:=(-1.625,0,0)\), see Figs. 2a and b. The boundary of \(\Omega \) is decomposed into the Dirichlet boundary part

$$\begin{aligned} \Gamma _D:= \left\{ x(r,\varphi ,\theta )-p \mid r = 0.8125,\, \varphi ,\theta \in \left[ -\frac{\pi }{4},\frac{\pi }{4}\right] \right\} , \end{aligned}$$

the boundary part \(\Gamma _C\) which potentially comes in contact with the rigid obstacle, namely

$$\begin{aligned} \Gamma _C:= \left\{ x(r,\varphi ,\theta )-p \mid r = 1.625,\, \varphi ,\theta \in \left[ -\frac{\pi }{4},\frac{\pi }{4}\right] \right\} , \end{aligned}$$

and the Neumann boundary part \( \Gamma _N:= \partial \Omega {\setminus } (\Gamma _D\cup \Gamma _C)\). The surface of the rigid obstacle which gets in contact with \(\Omega \) is described by the gap function

$$\begin{aligned} g(x_1,x_2,x_3):={\left\{ \begin{array}{ll} 0.75-\sqrt{1-(0.5 x_2+x_3)^2}, \quad & \mid 0.5x_2+x_3\mid \le r \\ 0.75, \quad & \mid 0.5x_2+x_3\mid > r \end{array}\right. }. \end{aligned}$$

For the given constant volume force \(f:=(0,0,-0.1)^\top \) we seek a function \(u \in [H^1(\Omega )]^3\) such that

$$\begin{aligned} -\operatorname {div} \sigma (u)&=f&\quad&\text {in } \Omega \end{aligned}$$
(20a)
$$\begin{aligned} \sigma (u)&={\mathcal {C}}:\epsilon (u)&\quad&\text {in } \Omega \end{aligned}$$
(20b)
$$\begin{aligned} u&=0&\quad&\text {on } \Gamma _D \end{aligned}$$
(20c)
$$\begin{aligned} \sigma (u)n&=0&\quad&\text {on } \Gamma _N \end{aligned}$$
(20d)
$$\begin{aligned} \sigma _n\le 0,\ u_n\le g,\ \sigma _n(u_n-g)=0,\, \sigma _t&=0&\quad&\text {on } \Gamma _C. \end{aligned}$$
(20e)

As usual, n is the outward unit normal and \(\sigma _n\), \(u_n\), \(\sigma _t\), \(u_t\) are the normal and tangential components of \(\sigma (u)n\) and u, respectively. The equation (20b) with the stiffness tensor \({\mathcal {C}}\) describes Hooke's law with Young's modulus of elasticity \(E:=2\,\textrm{GPa}\) and Poisson's ratio \(\nu :=0.42\), where \(\epsilon (u)=\frac{1}{2} \left( \nabla u + \nabla u^\top \right) \) is the linearized strain tensor. The material parameters are chosen for a soft (e.g. rubber-like) material in order to have a reasonably large contact area. In general, the application of Hooke's linear-elastic material law is only justified to a very limited extent for such materials or for large deformations as they occur here.

To approximately solve (20) we subdivide \(\Omega \) by a uniform mesh consisting of hexahedrons and use lowest order finite elements with vector-valued, continuous and piecewise trilinear functions vanishing on \(\Gamma _D\). This gives the variational inequality (1) with the usual stiffness matrix A and the load vector L, see (Kikuchi and Oden 1988). The non-penetration condition \(u_n \le g\) is enforced for the finite element solution only in the vertices of the mesh lying on \(\Gamma _C\). This gives the constraining matrix B in (1), which consists of the unit normal vectors pointing outward with respect to \(\Gamma _C\) in these vertices. Note that B trivially satisfies the required condition (5). Further note that, as \(\Omega \) is a segment of a hollow sphere, all these normal vectors point in different directions. See Fig. 2c for a visualization of the finite element solution.
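For illustration, the assembly of such a constraint matrix from the nodal normal vectors might look as follows (a sketch; the DOF numbering contact_dofs is hypothetical and depends on the finite element code):

```python
import numpy as np

def contact_constraint_matrix(normals, contact_dofs, n):
    """B for the nodal non-penetration condition u_n <= g: row i carries the outward
    unit normal of contact vertex i in the three displacement DOFs of that vertex."""
    B = np.zeros((len(normals), n))
    for i, (nv, dofs) in enumerate(zip(normals, contact_dofs)):
        B[i, dofs] = nv        # disjoint DOFs per vertex, so condition (5) holds
    return B
```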

Fig. 2: a, b Reference configuration of the body \(\Omega \); c finite element mesh and deformed body

In view of the results of the previous section we no longer consider the PSOR methods and the acceleration step with a simple projection, namely P2d. Figure 3 shows the dependency of the number of PSSOR iterations on the relaxation parameter \(\omega \). We apply the PSSOR method to the explicitly transformed problem, i.e. with the matrix \(\hat{A}\) and data \(\hat{L}\), \(\hat{g}\), and stop the iterations if \(\Vert x_{PSSOR} - x_{AS} \Vert _2 < 10 ^{-9}\) for some exact discrete solution \(x_{AS}\) (resulting from the active set method). The observations are similar to those for the one-dimensional obstacle case of Sect. 5.1. In particular, a very strong reduction in the number of iterations is obtained by both L2d and C2d.

Fig. 3: Number of PSSOR iterations (y-axis) for the Signorini contact problem for \(n=3672\) (1024 mesh elements) for different values of \(0<\omega <2\) (x-axis)

Tables 5, 6 and 7 report the number of iterations and the required time in seconds for differently accelerated PSSOR methods. We choose \(\omega := 1.12- 1.35/\root 3 \of {\text {DOF}}\) to account for the changing eigenvalues of A and stop once the relative difference of two successive iterates is less than \(10^{-10}\) in the Euclidean norm. In theory the PSSOR method which acts on the explicitly transformed problem should yield exactly the same iteration sequence as the PSSOR method with implicit bc-transformation (iPSSOR). However, due to unavoidable rounding errors the sequences of iterates (and therewith the number of iterations) may differ. In the case that no acceleration is applied this effect may be negligible. But, as discussed above, the strong dependency of the minimization subproblem on the preceding sequence of iterates and the strong effect of the acceleration step itself make the sequences of accelerated PSSOR iterates with explicit or implicit bc-transformation differ. Indeed, the effect of differing iterates can be observed in the tables and is particularly pronounced for the accelerated variants. As the implementation of the C1d and C2d methods within the iPSSOR is rather challenging and not very promising, we have not considered these variants in the experiments.

Table 5 PSSOR with explicit transformation: Number of iterations and CPU time in seconds for Signorini contact problem
Table 6 PSSOR with explicit transformation: Number of iterations and CPU time in seconds for Signorini contact problem
Table 7 PSSOR with implicit transformation: Number of iterations and CPU time in seconds for Signorini contact problem

Table 8 compares the (i)PSSOR method with L2d with the primal-dual active set method, which can be seen as a commonly used reference method to solve variational inequalities as given in (1). In the case of those variants relying on the explicit bc-transformation, the CPU time for this transformation, i.e. the computation of \(\hat{A}\), \(\hat{L}\) and \(\hat{g}\), needs to be added to the solution time. The (reduced) system of linear equations within each active set iteration is either solved with MATLAB's extremely fast direct solver (AS-D), MATLAB's diagonal preconditioned CG-method (AS-CG) or with a diagonal preconditioned projected CG method (AS-Proj) as introduced in (Nocedal and Wright 1999, Alg. 16.2). Even when adding the transformation time to the solution time, the methods acting on the explicitly transformed problem are still slightly faster than solvers with the bc-transformation carried out on the fly. Interestingly, the sparsity pattern of A and \(\hat{A}\) is almost identical and \(\hat{A}\) has only a handful of non-zero entries more than A. This and the fact that the transformation only affects those matrix entries which are somehow associated with the contact boundary part may also explain the observation that the computed condition number of \(\hat{A}\) is nearly identical to that of A, see Table 9. We note that the material parameters influence the condition number, which may be significantly larger in practical applications than shown in this table. It is well-known that the condition number of the stiffness matrix strongly influences the number of iterations of the PSOR scheme. This also appears to apply to the accelerated variants.

We expressly note that the (i)PSSOR method with an L2d acceleration step is of similar speed as the active set method. This leads us to the conclusion that the bc-transformation in conjunction with the simple to implement PSSOR scheme is very practical and, in particular, good enough to solve variational inequalities of the form (1) with B satisfying (5). Moreover, it has advantages in terms of memory requirements (as it exploits the sparsity structures of A and B), does not need an additional (direct) solver (in contrast to AS-D) and can serve as a benchmark solver to test more advanced methods that are optimized to specific problems.

Table 8 Comparison of CPU time for different methods
Table 9 Condition numbers of A and \(\hat{A}\)