1 Introduction

In this paper, we are interested in the following optimization problem

$$\begin{aligned} \min _{x\in \mathcal {R}^n} f(x)+g(x), \end{aligned}$$
(1.1)

where f is a strongly convex function with constant \(\beta >0\) and g is a hypoconvex (also called weakly convex or semiconvex) function with constant \(\omega >0\); see Sect. 3 for the precise assumptions. Problem (1.1) is one of the most studied models in modern optimization, with a huge body of literature (see, for instance, [6] and the references therein).

A well known algorithm for solving problem (1.1) is the so-called Douglas–Rachford splitting method (DRSM), which traces back to [10, 17] and has been well studied from various perspectives in the literature, see, e.g., [3, 4, 11, 12, 14]. The iterative process of the Douglas–Rachford splitting method only involves the evaluation of the proximal mappings of f and g, which is simple in many applications. We recall that for a proper lower semicontinuous function \(h:\mathcal {R}^n\rightarrow \mathcal {R}\cup \{+\,\infty \}\) and a parameter \(\nu >0\), the proximal mapping \(\text{ prox }_{\nu h}\) is defined as

$$\begin{aligned} \text{ prox }_{\nu h}(x):=\arg \min _{y\in \mathcal {R}^n}\left\{ h(y)+\frac{1}{2\nu }\Vert y-x\Vert ^2\right\} . \end{aligned}$$
(1.2)

If \(\inf \nolimits _{y\in \mathcal {R}^n}h(y)>-\,\infty \), then for every \(\nu \in (0, +\,\infty )\), the set \(\text{ prox }_{\nu h}(x)\) is nonempty and compact, and \(\text{ prox }_{\nu h}(x)\) is single-valued if h is further assumed to be convex; see, e.g., [20, Theorems 1.25 and 2.26]. More precisely, for an arbitrary starting point \(z_{1}\in \mathcal {R}^{n}\), the DRSM iteratively generates a sequence \(\{z_{k}\}_{k\in N}\) via the following rule

$$\begin{aligned} z_{k+1}=\widetilde{T}_{DR}(z_k):= ((1-\alpha )I+\alpha R_{\lambda f}R_{\lambda g})(z_k), \end{aligned}$$
(1.3)

where \(\alpha \in (0,1)\) is a parameter; I is the identity operator; \(\lambda >0\) is the proximal parameter, and \(R_{\lambda f}:=2\text{ prox }_{\lambda f}-I\) and \(R_{\lambda g}:=2\text{ prox }_{\lambda g}-I\) are the reflection operators of f and g, respectively.
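For illustration, one pass of the recursion (1.3) can be sketched in a few lines of code. The sketch below is ours and assumes that the proximal mappings \(\text{ prox }_{\lambda f}\) and \(\text{ prox }_{\lambda g}\) are supplied by the user (the names drsm_step, prox_f and prox_g are illustrative only).

```python
def drsm_step(z, prox_f, prox_g, alpha):
    """One iteration of the DRSM scheme (1.3):
    z_{k+1} = ((1 - alpha) I + alpha R_{lambda f} R_{lambda g})(z_k),
    where prox_f and prox_g evaluate prox_{lambda f} and prox_{lambda g}."""
    r_g = 2.0 * prox_g(z) - z          # R_{lambda g}(z) = 2 prox_{lambda g}(z) - z
    r_f_r_g = 2.0 * prox_f(r_g) - r_g  # R_{lambda f}(R_{lambda g}(z))
    return (1.0 - alpha) * z + alpha * r_f_r_g


# Toy usage with f(x) = g(x) = x^2/2 in one dimension and lambda = 1, so that
# prox_{lambda f}(x) = prox_{lambda g}(x) = x/2; the iterates contract to 0,
# the minimizer of f + g.
z = 4.0
for _ in range(20):
    z = drsm_step(z, lambda x: x / 2.0, lambda x: x / 2.0, alpha=0.5)
```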

For the case where both f and g are convex, the convergence of DRSM has been extensively studied, see, e.g., [11, 17]. Indeed, its convergence is an immediate consequence of the well known Krasnosel’skiĭ-Mann theorem [16] if we regard the scheme (1.3) as a convex combination of the nonexpansive operator \(R_{\lambda f}R_{\lambda g}\) and the identity operator. However, the research on the convergence of the DRSM for optimization problems involving nonconvex functions is still in its infancy, and only a few results are available for the “strongly + weakly” convex problem (1.1), i.e., the case where the function f is strongly convex with constant \(\beta >0\) and g is weakly convex with constant \(\omega >0\).

The first effort seems to be the work [5], in which the convergence of (1.3) was established under the additional conditions that f is twice differentiable, its gradient \({\nabla } f\) is Lipschitz continuous with constant \(L>0\), \(\beta = \omega \), \(0<\lambda \le 1/\sqrt{L\beta }\) and \(\alpha \in (0,1)\). Essentially, these additional assumptions serve to ensure the contraction property of the operator \(R_{\lambda f}\) and the nonexpansiveness of the operator \(R_{\lambda f}R_{\lambda g}\), because the analysis in [5] still follows the framework of the classical Krasnosel’skiĭ-Mann theorem. Very recently, the authors in [13] proved the convergence of the DRSM scheme (1.3) for the “strongly + weakly” convex problem (1.1) without any differentiability assumption on the strongly convex function f. Different from the technique in [5], which is based upon the nonexpansiveness of the operator \(R_{\lambda f}R_{\lambda g}\), their technique is based on the Fejér monotonicity of the sequence \(\{z_{k}\}_{k\in N}\) generated by (1.3) with respect to the fixed point set of \(\widetilde{T}_{DR}\), and it does not require the nonexpansiveness of the operator \(R_{\lambda f}R_{\lambda g}\). Here, for an operator \(M:\mathcal {R}^{n}\rightarrow \mathcal {R}^{n}\), the fixed point set of M is defined as \(\text{ Fix }(M):=\{z\in \mathcal {R}^{n}: M(z)=z \}\). Meanwhile, without any differentiability assumption on f, they alternatively require the condition \(\beta >\omega \), which is slightly stronger than the condition \(\beta = \omega \) in [5].

In many applications, the objective function \(f+g\) is merely convex, but not strongly convex. It is thus desirable to establish the convergence of DRSM under this mild assumption. In this paper, our principal purpose is to complete this task. To make our analysis more realistic, we allow variable combination parameters and inexact evaluation of the reflection operators. That is, we consider the more general inexact DRSM (GIDRSM) scheme

$$\begin{aligned} z_{k+1}:=\alpha _{k}z_{k}+\beta _{k}(R_{\lambda f}R_{\lambda g}(z_{k})+e_{k}), \end{aligned}$$
(1.4)

where \(\alpha _{k}, \beta _{k}\in [0,1]\) are suitable parameters satisfying \(\alpha _{k}+\beta _{k}\le 1\), and \(e_{k}\) represents an error in the evaluation of \(R_{\lambda f}R_{\lambda g}(z_{k})\). Note that the DRSM (1.3) corresponds to the case \(e_{k}\equiv 0\), \(\alpha _{k}\equiv 1-\alpha \), and \(\beta _{k}\equiv \alpha \). Our aim is to prove the convergence of GIDRSM (1.4) for the case \(\beta =\omega \) under the additional assumption that f is continuously differentiable with \({\nabla }f\) Lipschitz continuous, which is weaker than the assumptions in [5].
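Before proceeding, we note that the recursion (1.4) admits an equally short sketch; the parameter and error sequences below are placeholders only, and the conditions actually imposed on them are stated in Sect. 3 (Assumption 3.2).

```python
def gidrsm(z0, prox_f, prox_g, alphas, betas, errors):
    """Sketch of the GIDRSM recursion (1.4):
    z_{k+1} = alpha_k z_k + beta_k (R_{lambda f} R_{lambda g}(z_k) + e_k).
    alphas, betas and errors are sequences of equal length; prox_f and
    prox_g evaluate prox_{lambda f} and prox_{lambda g}."""
    z = z0
    for a_k, b_k, e_k in zip(alphas, betas, errors):
        r_g = 2.0 * prox_g(z) - z          # reflection of g at z
        r_f_r_g = 2.0 * prox_f(r_g) - r_g  # reflection of f at R_{lambda g}(z)
        z = a_k * z + b_k * (r_f_r_g + e_k)
    return z
```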

The paper is organized in the following way. In Sect. 2, we recall some definitions and known results for further analysis. In Sect. 3, we present our main results, i.e., the convergence of GIDRSM (1.4) under weaker conditions than those required in the existing work for (1.3). Finally, some concluding remarks are given in Sect. 4.

2 Preliminaries

In this section, we recall some definitions and known results that will be used in our analysis later.

Definition 2.1

[2, Definition 4.4] Let D be a nonempty subset of \(\mathcal {R}^n\), let \(M: D\rightarrow \mathcal {R}^n\), and let \(\kappa >0\). Then M is said to be \(\kappa \)-cocoercive if

$$\begin{aligned} \langle M(x)-M(y),x-y\rangle \ge \kappa \Vert M(x)-M(y)\Vert ^{2},~~\forall x,y\in D. \end{aligned}$$

Lemma 2.1

[15, Theorem 3.1] Let D be a nonempty closed convex subset of \(\mathcal {R}^{n}\). Suppose that \(M:\mathcal {R}^{n}\rightarrow D\) is a nonexpansive mapping such that its set of fixed points \(\text{ Fix }(M)\) is nonempty. Let the sequence \(\{z_{k}\}_{k\in N}\) in \(\mathcal {R}^{n}\) be generated by choosing \(z_{1}\in \mathcal {R}^{n}\) and using the recursion

$$\begin{aligned} z_{k+1}:=\alpha _{k}z_{k}+\beta _{k} M(z_{k})+r_{k},~~\forall k\ge 1, \end{aligned}$$
(2.1)

where \(r_{k}\) denotes the residual vector. Here we assume that \(\{\alpha _{k}\}_{k\in N}\) and \(\{\beta _{k}\}_{k\in N}\) are real sequences in [0,1] such that \(\alpha _{k}+\beta _{k}\le 1\) for all \(k\ge 1\) and the following conditions hold: (a) \(\sum _{k=1}^{\infty }\alpha _{k}\beta _{k}=\infty \); (b) \(\sum _{k=1}^{\infty }\Vert r_{k}\Vert <\infty \); and (c) \(\sum _{k=1}^{\infty }(1-\alpha _{k}-\beta _{k})<\infty \). Then the sequence \(\{z_{k}\}_{k\in N}\) generated by (2.1) converges to a fixed point of M.
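As a concrete illustration (our example, not taken from [15]), the choice

$$\begin{aligned} \alpha _{k}\equiv \frac{1}{2},\qquad \beta _{k}\equiv \frac{1}{2},\qquad \Vert r_{k}\Vert \le \frac{1}{k^{2}},~~\forall k\ge 1, \end{aligned}$$

satisfies (a)–(c): indeed, \(\sum _{k=1}^{\infty }\alpha _{k}\beta _{k}=\sum _{k=1}^{\infty }\frac{1}{4}=\infty \), \(\sum _{k=1}^{\infty }\Vert r_{k}\Vert \le \sum _{k=1}^{\infty }\frac{1}{k^{2}}<\infty \), and \(1-\alpha _{k}-\beta _{k}=0\) for all \(k\ge 1\).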

Definition 2.2

[20, Definition 12.58] A function \(f:\mathcal {R}^n\rightarrow \mathcal {R}\cup \{+\,\infty \}\) is strongly convex with constant \(\beta >0\) if for any \(x,y\in \mathcal {R}^n\) and for any \(\theta \in (0,1)\), we have

$$\begin{aligned} f((1-\theta )x+\theta y)\le (1-\theta )f(x) + \theta f(y)-\frac{\beta \theta (1-\theta )}{2}\Vert x-y\Vert ^2. \end{aligned}$$

Moreover, if the above inequality holds with \(\beta =0\), then f is called a convex function.
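For instance, for \(f(x)=\frac{\beta }{2}\Vert x\Vert ^{2}\) the above inequality holds with equality, since a direct computation gives

$$\begin{aligned} (1-\theta )f(x)+\theta f(y)-f((1-\theta )x+\theta y)=\frac{\beta \theta (1-\theta )}{2}\Vert x-y\Vert ^{2}, \end{aligned}$$

so \(\frac{\beta }{2}\Vert \cdot \Vert ^{2}\) is strongly convex with constant \(\beta \) (and with no larger constant).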

Lemma 2.2

[2, Proposition 12.27] Let \(h:\mathcal {R}^n\rightarrow \mathcal {R}\cup \{+\,\infty \}\) be a proper lower semicontinuous convex function and \(\nu >0\), then the proximal operator \(\text{ prox }_{\nu h}\) given in (1.2) is firmly nonexpansive.

Definition 2.3

[21, Definition 3.10] A proper lower semicontinuous function \(g: \mathcal {R}^n\rightarrow \mathcal {R}\cup \{+\,\infty \}\) is called hypoconvex (weakly convex or semiconvex) if for some \(\omega >0\), the function \(g(\cdot )+\frac{\omega }{2}\Vert \cdot \Vert ^{2}\) is convex.

Remark 2.1

It is well-known that the set of hypoconvex functions contains several important classes of (nonsmooth) functions as special cases, for example, \(\varphi \)-convex functions [9] and primal-lower-nice functions [18]. Moreover, any twice continuously differentiable function with a bounded second-order derivative is hypoconvex, see, e.g., [7]. We refer to, e.g., [7, 8, 21], for more properties of hypoconvex functions.

Definition 2.4

[20, Definition 8.3] Consider a function \(f:\mathcal {R}^n\rightarrow \mathcal {R}\cup \{+\,\infty \}\) and \(\bar{x}\in \text{ dom }~f\).

  1. (i)

    The regular subdifferential of f at \(\bar{x}\), written \(\hat{\partial }f(\bar{x})\), is the set of vectors \(x^*\in \mathcal {R}^n\) that satisfy

    $$\begin{aligned} \liminf _{\begin{array}{c} y\rightarrow \bar{x} \\ y\ne \bar{x} \end{array}} \frac{f(y)-f(\bar{x})-\langle x^*, y-\bar{x}\rangle }{\Vert y-\bar{x}\Vert }\ge 0. \end{aligned}$$
  2. (ii)

    The subdifferential of f at \(\bar{x}\), written \(\partial f(\bar{x})\), is defined as follows:

    $$\begin{aligned} \partial f(\bar{x}):=\left\{ x^*\in \mathcal {R}^n: \exists x_k\rightarrow \bar{x}, f(x_k)\rightarrow f(\bar{x}), x^*_k\in \hat{\partial }f(x_k), \text{ with }~x^*_k\rightarrow x^* \right\} . \end{aligned}$$

Remark 2.2

It follows from Definition 2.4 that the following conclusions hold (see, e.g., [20]).

  1. (i)

    If \(h:\mathcal {R}^n\rightarrow \mathcal {R}\cup \{+\,\infty \}\) is a proper function and \(f:\mathcal {R}^n\rightarrow \mathcal {R}\) is continuously differentiable, then \(\partial (f+h)(x)={\nabla } f(x)+\partial h(x)\) for any \(x\in \text{ dom }~h\).

  2. (ii)

    For any proper convex function \(f:\mathcal {R}^n\rightarrow \mathcal {R}\cup \{+\,\infty \}\) and any \(\bar{x}\in \text{ dom }~f\), the convex subdifferential of f at \(\bar{x}\) is defined as \(\bar{\partial }f(\bar{x}):=\{v\in \mathcal {R}^{n}|~f(x)\ge f(\bar{x})+\langle v, x-\bar{x}\rangle ~\text {for all}~x\}\). For a convex function f, these notions coincide: \(\partial f(\bar{x})=\hat{\partial } f(\bar{x})=\bar{\partial } f(\bar{x})\) for any \(\bar{x}\in \text{ dom }~f\).

The next lemma is known as the Baillon-Haddad theorem in the literature.

Lemma 2.3

[1, Corollaire 10] Let \(f:\mathcal {R}^{n}\rightarrow \mathcal {R}\) be differentiable and convex on \(\mathcal {R}^{n}\), and such that \({\nabla } f\) is Lipschitz continuous with constant \(L>0\). Then \({\nabla } f\) is \(1/L\)-cocoercive.

The following interesting lemma is from [22].

Lemma 2.4

Let \(h:=h_{1}-h_{2}\), where \(h_{1}\) is a convex function with \({\nabla } h_{1}\) Lipschitz continuous with constant \(L>0\) and \(h_{2}\) is a convex function with \({\nabla } h_{2}\) Lipschitz continuous with constant \(l>0\). If \(L\ge l\), then \({\nabla } h\) is Lipschitz continuous with constant L.

3 Convergence Analysis

As recalled in the introduction, the first effort seems to be the work [5], in which the convergence of the DRSM (1.3) for the case \(\beta =\omega \) was established under the additional assumptions that f is twice differentiable, its gradient \({\nabla } f\) is Lipschitz continuous with constant \(L>0\) and \(0<\lambda \le 1/\sqrt{L\beta }\). In the absence of differentiability, [13] established the convergence under the stronger assumption that \(\beta >\omega \). In this section, we prove the convergence of the more general iterative process GIDRSM (1.4) for the case \(\beta =\omega \) under the following assumptions.

Assumption 3.1

Assume the following conditions are satisfied:

  1. (A1)

    \(f:\mathcal {R}^n\rightarrow \mathcal {R}\) is strongly convex with constant \(\beta >0\) and continuously differentiable, and \({\nabla } f\) is Lipschitz continuous with constant \(L>0\);

  2. (A2)

    \(g: \mathcal {R}^n\rightarrow \mathcal {R}\cup \{+\,\infty \}\) is proper, lower semicontinuous and hypoconvex with constant \(\beta >0\) (i.e., \(\omega =\beta \));

  3. (A3)

    The set \(X^{*}\) of all optimal solutions of problem (1.1) is nonempty, that is, \(X^{*}\ne \emptyset \).

Remark 3.1

Compared with the assumptions made in [5], we only require the strongly convex function f to be continuously differentiable, not necessarily twice differentiable.

Recall that g is assumed to be hypoconvex with constant \(\beta >0\), i.e.,

$$\begin{aligned} \tilde{g}(x):=g(x)+\frac{\beta }{2}\Vert x\Vert ^{2} \end{aligned}$$
(3.1)

is convex. According to [13, Section 3], for \(0<\lambda <1/\beta \), it follows that \(\text{ prox }_{\lambda g}\) is single-valued on \(\mathcal {R}^{n}\) and

$$\begin{aligned} \text{ prox }_{\lambda g}(x)= (I+\lambda \partial g)^{-1}(x). \end{aligned}$$

Here, \(\partial g\) is the subdifferential of g defined in (ii) of Definition 2.4. Moreover, based on (A3) in Assumption 3.1, it follows from the assertions (i) and (ii) in Remark 2.2 that \(x^*\in X^{*}\) if and only if

$$\begin{aligned} 0\in \bar{\partial } (f+g)(x^*) = {\nabla } f(x^*) + \partial g(x^*). \end{aligned}$$
(3.2)
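As a simple one-dimensional illustration (not needed in the sequel), take \(g(t)=|t|-\frac{\beta }{2}t^{2}\), which is hypoconvex with constant \(\beta \). For \(0<\lambda <1/\beta \), the objective in (1.2) is strictly convex and a direct computation gives

$$\begin{aligned} \text{ prox }_{\lambda g}(t)=\frac{\text{ sign }(t)\max \{|t|-\lambda ,\,0\}}{1-\lambda \beta }, \end{aligned}$$

so \(\text{ prox }_{\lambda g}\) is indeed single-valued; for \(\lambda >1/\beta \), in contrast, the objective in (1.2) is unbounded below and the proximal mapping is no longer well defined.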

Proposition 3.1

Suppose that Assumption 3.1 holds and \(0\!<\!\lambda \!<\!1/\beta \). Then, \(\text{ Fix }(R_{\lambda f}R_{\lambda g})\!\ne \!\emptyset \). Moreover, for any \(z^{*} \in \text{ Fix }(R_{\lambda f}R_{\lambda g})\), the point \(\text{ prox }_{\lambda g}(z^*)\) is a solution of (1.1).

Proof

The proof is similar to that of [13, Proposition 4.1]; however, for the convenience of the reader, we sketch it here. Indeed, let \(x^*\in X^{*}\), which is nonempty by (A3). It follows from (3.2) that

$$\begin{aligned} 2x^*\in (I+\lambda {\nabla }f)(x^*)+(I+\lambda \partial g)(x^*). \end{aligned}$$

By setting \(z^{*}:=2x^{*}-(I+\lambda {\nabla } f)(x^{*})\), we obtain \(z^*\in (I+\lambda \partial g)(x^*)\), i.e., \(x^*=(I+\lambda \partial g)^{-1}(z^*)=\text{ prox }_{\lambda g}(z^*)\). By means of this, we have

$$\begin{aligned} R_{\lambda g}(z^*)=2\text{ prox }_{\lambda g}(z^*)-z^*=2x^{*}-z^{*}=(I+\lambda {\nabla } f)(x^{*}) \end{aligned}$$
(3.3)

Since \(\text{ prox }_{\lambda f}=(I+\lambda {\nabla } f)^{-1}\) is single-valued, it follows from (3.3) that

$$\begin{aligned} { x^*=\text{ prox }_{\lambda f}R_{\lambda g}(z^*).} \end{aligned}$$

Thus,

$$\begin{aligned} z^*=2x^*-R_{\lambda g}(z^*)=2\text{ prox }_{\lambda f}R_{\lambda g}(z^*)-R_{\lambda g}(z^*)=R_{\lambda f}R_{\lambda g}(z^*). \end{aligned}$$

Hence, \(\text{ Fix }(R_{\lambda f}R_{\lambda g})\ne \emptyset \). Moreover, for any \(z^{*}\in \text{ Fix }(R_{\lambda f}R_{\lambda g})\), we have

$$\begin{aligned} z^{*}=R_{\lambda f}R_{\lambda g}(z^*)=2\text{ prox }_{\lambda f}R_{\lambda g}(z^{*})-R_{\lambda g}(z^{*}). \end{aligned}$$
(3.4)

Setting \(y^{*}:=\text{ prox }_{\lambda f}R_{\lambda g}(z^{*})\) in (3.4), we get

$$\begin{aligned} 2y^{*}-z^{*}=R_{\lambda g}(z^{*})=2\text{ prox }_{\lambda g}(z^{*})-z^{*}. \end{aligned}$$
(3.5)

This means \(y^{*}=\text{ prox }_{\lambda g}(z^{*})\), i.e.,

$$\begin{aligned} z^{*}\in (I+\lambda \partial g)(y^{*}). \end{aligned}$$
(3.6)

On the other hand, it follows from the definition of \(y^{*}\) and (3.5) that \(y^{*}=\text{ prox }_{\lambda f}(2y^{*}-z^{*})\). This means

$$\begin{aligned} 2y^{*}-z^{*}= (I+\lambda {\nabla } f)(y^{*}). \end{aligned}$$
(3.7)

Adding (3.6) and (3.7), and invoking (3.2), we know that \(y^{*}=\text{ prox }_{\lambda g}(z^{*})\) is a solution of problem (1.1). The proof is complete. \(\square \)

To prove the convergence of the GIDRSM (1.4), we need to bound the terms \(\Vert R_{\lambda f}(x)-R_{\lambda f}(y)\Vert \) and \(\Vert R_{\lambda g}(x)-R_{\lambda g}(y)\Vert \). The following lemma deals with \(\Vert R_{\lambda f}(x)-R_{\lambda f}(y)\Vert \).

Lemma 3.1

Let \(f:\mathcal {R}^n\rightarrow \mathcal {R}\) be strongly convex with constant \(\beta >0\) and continuously differentiable, and let \({\nabla } f\) be Lipschitz continuous with constant \(L>0\). Then for any \(x,y\in \mathcal {R}^n\) and \(\lambda >0\), we have

$$\begin{aligned} \Vert R_{\lambda f}(x)-R_{\lambda f}(y)\Vert \le \frac{|1-\lambda \beta |}{1+\lambda \beta }\cdot \Vert x-y\Vert . \end{aligned}$$

Proof

First, we claim that \(\beta \le L\). To see this, since f is strongly convex with constant \(\beta >0\), by [20, Exercise 12.59] we know that \({\nabla } f\) is strongly monotone with constant \(\beta \), that is, for any \(x,y\in \mathcal {R}^{n}\),

$$\begin{aligned} \beta \Vert x-y\Vert ^{2}\le & {} \langle {\nabla } f(x)-{\nabla } f(y), x-y\rangle \nonumber \\\le & {} \Vert {\nabla } f(x)-{\nabla } f(y)\Vert \cdot \Vert x-y\Vert \nonumber \\\le & {} L\Vert x-y\Vert ^{2}, \end{aligned}$$
(3.8)

where the second inequality follows from the Cauchy-Schwarz inequality and the third inequality follows from the Lipschitz continuity of \({\nabla } f\). Thus, dividing (3.8) by \(\Vert x-y\Vert ^{2}\) for \(x\ne y\) yields the claim \(\beta \le L\). By setting

$$\begin{aligned} \tilde{f}(x):=f(x)-\frac{\beta }{2}\Vert x\Vert ^{2}, \end{aligned}$$
(3.9)

it follows from [20, Exercise 12.59] that \(\tilde{f}\) is convex because of the strong convexity of f.

Set \(h_{1}(x):=f(x)\) and \(h_{2}(x):=\frac{\beta }{2}\Vert x\Vert ^{2}\); then \(h_{1}\) is convex with \({\nabla } h_{1}\) Lipschitz continuous with constant \(L>0\), and \(h_{2}\) is convex with \({\nabla } h_{2}\) Lipschitz continuous with constant \(\beta >0\). In view of \(L\ge \beta \), it follows from Lemma 2.4 that \({\nabla } \tilde{f}\) is Lipschitz continuous with constant \(L>0\). Recalling the definition of the proximal mapping, we have

$$\begin{aligned} \text{ prox }_{\lambda f}(x)= & {} \arg \min _{y\in \mathcal {R}^{n}}\left\{ f(y)+\frac{1}{2\lambda }\Vert y-x\Vert ^{2} \right\} \nonumber \\= & {} \arg \min _{y\in \mathcal {R}^{n}}\left\{ \tilde{f}(y)+\frac{1}{2\lambda }\Vert y-x\Vert ^{2}+\frac{\beta }{2}\Vert y\Vert ^{2}\right\} \nonumber \\= & {} \text{ prox }_{\frac{\lambda }{1+\lambda \beta }\tilde{f}}\left( \frac{1}{1+\lambda \beta }\cdot x\right) . \end{aligned}$$
(3.10)
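The last equality in (3.10) follows by completing the square: writing \(f(y)=\tilde{f}(y)+\frac{\beta }{2}\Vert y\Vert ^{2}\), we have

$$\begin{aligned} \frac{\beta }{2}\Vert y\Vert ^{2}+\frac{1}{2\lambda }\Vert y-x\Vert ^{2} =\frac{1+\lambda \beta }{2\lambda }\left\| y-\frac{1}{1+\lambda \beta }\cdot x\right\| ^{2} +\frac{\beta }{2(1+\lambda \beta )}\Vert x\Vert ^{2}, \end{aligned}$$

where the last term does not depend on y, and \(\frac{1+\lambda \beta }{2\lambda }=\frac{1}{2\nu }\) with \(\nu =\frac{\lambda }{1+\lambda \beta }\), in accordance with (1.2).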

By the optimality condition of (3.10), we know

$$\begin{aligned} 0={\nabla } \tilde{f}(\text{ prox }_{\lambda f}(x))+\frac{1}{\lambda }(\text{ prox }_{\lambda f}(x)-x)+\beta \text{ prox }_{\lambda f}(x), \end{aligned}$$

which means that

$$\begin{aligned} {\nabla } \tilde{f}(\text{ prox }_{\lambda f}(x))=\frac{1}{\lambda }x-\left( \frac{1}{\lambda }+\beta \right) \text{ prox }_{\lambda f}(x). \end{aligned}$$
(3.11)

Since \(\tilde{f}\) is convex and \({\nabla } \tilde{f}\) is Lipschitz continuous with constant \(L>0\), by Lemma 2.3 we know that \({\nabla } \tilde{f}\) is \(1/L\)-cocoercive, that is,

$$\begin{aligned} \left\langle {\nabla } \tilde{f}(\text{ prox }_{\lambda f}(x))-{\nabla } \tilde{f}(\text{ prox }_{\lambda f}(y)), \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\right\rangle \ge \frac{1}{L}\left\| {\nabla } \tilde{f}(\text{ prox }_{\lambda f}(x))-{\nabla } \tilde{f}(\text{ prox }_{\lambda f}(y))\right\| ^{2}. \end{aligned}$$
(3.12)

Substituting (3.11) into (3.12), we have

$$\begin{aligned}&\left\langle \frac{1}{\lambda }x-\left( \frac{1}{\lambda }+\beta \right) \text{ prox }_{\lambda f}(x)-\frac{1}{\lambda }y+\left( \frac{1}{\lambda }+\beta \right) \text{ prox }_{\lambda f}(y), \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\right\rangle \nonumber \\&\quad \, \ge \frac{1}{L}\left\| \frac{1}{\lambda }x-\left( \frac{1}{\lambda }+\beta \right) \text{ prox }_{\lambda f}(x)-\frac{1}{\lambda }y+\left( \frac{1}{\lambda }+\beta \right) \text{ prox }_{\lambda f}(y)\right\| ^{2}, \end{aligned}$$
(3.13)

Notice that the left-hand side of (3.13) can be rewritten as

$$\begin{aligned}&\frac{1}{\lambda }\cdot \left\langle x-y, \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\right\rangle \nonumber \\&\quad -\,\frac{1+\lambda \beta }{\lambda }\cdot \left\| \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\right\| ^{2}, \end{aligned}$$
(3.14)

while the right-hand side of (3.13) can be rewritten as

$$\begin{aligned}&\frac{1}{L\lambda ^{2}}\cdot \Vert x-y\Vert ^{2}-\frac{2+2\lambda \beta }{L\lambda ^{2}}\cdot \langle x-y, \text{ prox }_{\lambda f}(x)\nonumber \\&\quad -\,\text{ prox }_{\lambda f}(y)\rangle +\frac{(1+\lambda \beta )^{2}}{L\lambda ^{2}}\cdot \Vert \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\Vert ^{2}. \end{aligned}$$
(3.15)

Substituting (3.14) and (3.15) into (3.13) and rearranging terms, it follows that

$$\begin{aligned}&\left\langle x-y, \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\right\rangle \nonumber \\&\quad \ge \, \frac{1}{(2\beta +L)\lambda +2}\cdot \Vert x-y\Vert ^{2}\nonumber \\&\quad +\,\frac{(1+\lambda \beta )^{2}+(1+\lambda \beta )\lambda L}{(2\beta +L)\lambda +2}\cdot \left\| \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\right\| ^{2}. \end{aligned}$$
(3.16)

Note that

$$\begin{aligned}&\Vert R_{\lambda f}(x)-R_{\lambda f}(y)\Vert ^{2} \nonumber \\&\quad =\,\Vert x-y\Vert ^{2}-4\left\langle x-y, \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\right\rangle \nonumber \\&\qquad +\, 4\left\| \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\right\| ^{2}. \end{aligned}$$
(3.17)

Substituting (3.16) into (3.17), we get that

$$\begin{aligned}&\Vert R_{\lambda f}(x)-R_{\lambda f}(y)\Vert ^{2} \nonumber \\&\quad \le \, \left( 1-\frac{4}{(2\beta +L)\lambda +2}\right) \cdot \Vert x-y\Vert ^{2}\nonumber \\&\quad +\,\left( 4-\frac{4(1+\lambda \beta )^{2}+4(1+\lambda \beta )\lambda L}{(2\beta +L)\lambda +2}\right) \cdot \left\| \text{ prox }_{\lambda f}(x) -\text{ prox }_{\lambda f}(y)\right\| ^{2} \nonumber \\&\quad =\,\frac{(2\beta +L)\lambda -2}{(2\beta +L)\lambda +2}\cdot \left\| x-y\right\| ^{2} +\frac{4-4\beta ^{2}\lambda ^{2}-4\beta L\lambda ^{2}}{(2\beta +L)\lambda +2}\cdot \left\| \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\right\| ^{2}.\nonumber \\ \end{aligned}$$
(3.18)

Recall that \(\text{ prox }_{\lambda f}(x)=\text{ prox }_{\frac{\lambda }{1+\lambda \beta }\tilde{f}}\left( \frac{1}{1+\lambda \beta }\cdot x\right) \). According to Lemma 2.2, \(\text{ prox }_{\frac{\lambda }{1+\lambda \beta }\tilde{f}}\) is firmly nonexpansive and hence nonexpansive, and thus we have

$$\begin{aligned} \Vert \text{ prox }_{\lambda f}(x)-\text{ prox }_{\lambda f}(y)\Vert= & {} \left\| \text{ prox }_{\frac{\lambda }{1+\lambda \beta }\tilde{f}}\left( \frac{1}{1+\lambda \beta }\cdot x\right) -\text{ prox }_{\frac{\lambda }{1+\lambda \beta }\tilde{f}}\left( \frac{1}{1+\lambda \beta }\cdot y\right) \right\| \nonumber \\\le & {} \frac{1}{1+\lambda \beta }\cdot \Vert x-y\Vert . \end{aligned}$$
(3.19)

Substituting (3.19) back into (3.18), we get

$$\begin{aligned} \Vert R_{\lambda f}(x)-R_{\lambda f}(y)\Vert ^{2}\le & {} \frac{(2\beta +L)\lambda -2}{(2\beta +L)\lambda +2}\cdot \left\| x-y\right\| ^{2}\\&+\,\frac{4-4\beta ^{2}\lambda ^{2}-4\beta L\lambda ^{2}}{(2\beta +L)\lambda +2}\cdot \frac{1}{(1+\lambda \beta )^{2}}\cdot \Vert x-y\Vert ^{2} \\= & {} \left( \frac{(2\beta +L)\lambda -2}{(2\beta +L)\lambda +2}+\frac{4-4\beta ^{2}\lambda ^{2}-4\beta L\lambda ^{2}}{((2\beta +L)\lambda +2)(1+\lambda \beta )^{2}}\right) \cdot \Vert x-y\Vert ^{2} \\= & {} \frac{((2\beta +L)\lambda -2)(1+\lambda \beta )^{2}+4-4\beta ^{2}\lambda ^{2}-4\beta L\lambda ^{2}}{(2(1+\lambda \beta )+\lambda L)(1+\lambda \beta )^{2}}\cdot \Vert x-y\Vert ^{2} \\= & {} \frac{(2\beta +L)\beta ^{2}\lambda ^{3}-(2\beta ^{2}+2\beta L)\lambda ^{2}+(L-2\beta )\lambda +2}{((2\beta +L)\lambda +2)(1+\lambda \beta )^{2}}\cdot \Vert x-y\Vert ^{2} \\= & {} \frac{(1-\lambda \beta )^{2}}{(1+\lambda \beta )^{2}}\cdot \Vert x-y\Vert ^{2}, \end{aligned}$$

where the last equality follows from the following observation

$$\begin{aligned}&(2\beta +L)\beta ^{2}\lambda ^{3}-(2\beta ^{2}+2\beta L)\lambda ^{2}+(L-2\beta )\lambda +2\\&\quad =((2\beta +L)\lambda +2)(1-\lambda \beta )^{2}. \end{aligned}$$

The proof is complete. \(\square \)
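We remark that the bound in Lemma 3.1 is attained, for example, by \(f(x)=\frac{\beta }{2}\Vert x\Vert ^{2}\) (for which \(L=\beta \)): in this case

$$\begin{aligned} \text{ prox }_{\lambda f}(x)=\frac{1}{1+\lambda \beta }\cdot x,\qquad R_{\lambda f}(x)=\frac{1-\lambda \beta }{1+\lambda \beta }\cdot x, \end{aligned}$$

so that \(\Vert R_{\lambda f}(x)-R_{\lambda f}(y)\Vert =\frac{|1-\lambda \beta |}{1+\lambda \beta }\cdot \Vert x-y\Vert \) for all \(x,y\in \mathcal {R}^{n}\).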

Next, we estimate \(\Vert R_{\lambda g}(x)-R_{\lambda g}(y)\Vert \) for the hypoconvex function g. The following lemma was mentioned in [5]; for completeness, we give a slightly simpler proof here.

Lemma 3.2

Let \(g: \mathcal {R}^n\rightarrow \mathcal {R}\cup \{+\,\infty \}\) be a proper lower semicontinuous hypoconvex function with constant \(\beta >0\). Then for any \(x,y\in \mathcal {R}^n\) and \(0<\lambda <1/\beta \), we have

$$\begin{aligned} \Vert R_{\lambda g}(x)-R_{\lambda g}(y)\Vert \le \frac{1+\lambda \beta }{1-\lambda \beta }\cdot \Vert x-y\Vert . \end{aligned}$$
(3.20)

Proof

Actually, for any \(0<\lambda <1/\beta \), it follows from [13, Theorem 4.2] that

$$\begin{aligned} \Vert R_{\lambda g}(x)-R_{\lambda g}(y)\Vert ^{2} \le \Vert x-y\Vert ^{2}+4\beta \lambda \Vert \text{ prox }_{\lambda g}(x)-\text{ prox }_{\lambda g}(y)\Vert ^{2}. \end{aligned}$$
(3.21)

Recalling the definition of \(\tilde{g}\) in (3.1) and arguing as in (3.10), for any \(0<\lambda <1/\beta \), we get

$$\begin{aligned} \text{ prox }_{\lambda g}(x)=\text{ prox }_{\frac{\lambda }{1-\lambda \beta }\tilde{g}}\left( \frac{1}{1-\lambda \beta }\cdot x\right) . \end{aligned}$$

By the convexity of \(\tilde{g}\), it follows from Lemma 2.2 that \(\text{ prox }_{\frac{\lambda }{1-\lambda \beta }\tilde{g}}\) is firmly nonexpansive and hence nonexpansive. Thus, the mapping \(x\mapsto \text{ prox }_{\frac{\lambda }{1-\lambda \beta }\tilde{g}}\left( \frac{1}{1-\lambda \beta }\cdot x\right) \), which coincides with \(\text{ prox }_{\lambda g}\), is Lipschitz continuous with constant \(\frac{1}{1-\lambda \beta }\), i.e.,

$$\begin{aligned} \Vert \text{ prox }_{\lambda g}(x)-\text{ prox }_{\lambda g}(y)\Vert \le \frac{1}{1-\beta \lambda }\cdot \Vert x-y\Vert . \end{aligned}$$
(3.22)

Substituting (3.22) into (3.21), it follows that (3.20) holds. The proof is complete. \(\square \)
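Analogously, the bound (3.20) is attained by \(g(x)=-\frac{\beta }{2}\Vert x\Vert ^{2}\), which is hypoconvex with constant \(\beta \): for \(0<\lambda <1/\beta \),

$$\begin{aligned} \text{ prox }_{\lambda g}(x)=\frac{1}{1-\lambda \beta }\cdot x,\qquad R_{\lambda g}(x)=\frac{1+\lambda \beta }{1-\lambda \beta }\cdot x, \end{aligned}$$

so \(R_{\lambda g}\) is expansive with the exact factor \(\frac{1+\lambda \beta }{1-\lambda \beta }\).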

By virtue of Lemmas 3.1 and 3.2, we can prove the convergence of the GIDRSM (1.4) immediately. To this end, we make some assumptions on the parameters \(\{\alpha _{k}\}_{k\in N}\), \(\{\beta _{k}\}_{k\in N}\) and \(\{e_{k}\}_{k\in N}\).

Assumption 3.2

Assume that \(\{\alpha _{k}\}_{k\in N}\) and \(\{\beta _{k}\}_{k\in N}\) are real sequences in [0, 1] such that \(\alpha _{k}+\beta _{k}\le 1\) for all \(k\ge 1\) and the following conditions hold: (a) \(\sum _{k=1}^{\infty }\alpha _{k}\beta _{k}=\,\infty \); (b) \(\sum _{k=1}^{\infty }\beta _{k}\Vert e_{k}\Vert <\,\infty \); and (c) \(\sum _{k=1}^{\infty }(1-\alpha _{k}-\beta _{k})<\,\infty \).

Now we are in a position to present the main result of this paper, i.e., the convergence of the GIDRSM (1.4) under suitable conditions.

Theorem 3.1

Let \(\{z_{k}\}_{k\in N}\) be a sequence generated by the GIDRSM (1.4) and suppose that Assumptions 3.1 and 3.2 hold and \(0<\lambda <1/\beta \). Then \(\{z_{k}\}_{k\in N}\) converges to a fixed point of \(R_{\lambda f}R_{\lambda g}\). Moreover, \(\{\text{ prox }_{\lambda g}(z_{k})\}_{k\in N}\) converges to a solution of problem (1.1).

Proof

By Proposition 3.1 we know that \(\text{ Fix }(R_{\lambda f}R_{\lambda g})\ne \emptyset \). For any \(x,y\in \mathcal {R}^{n}\) and \(0<\lambda <1/\beta \), it follows from Lemmas 3.1 and 3.2 that

$$\begin{aligned} \left\| R_{\lambda f}R_{\lambda g}(x)-R_{\lambda f}R_{\lambda g}(y)\right\|\le & {} \frac{1-\lambda \beta }{1+\lambda \beta }\cdot \Vert R_{\lambda g}(x)-R_{\lambda g}(y)\Vert \\\le & {} \frac{1-\lambda \beta }{1+\lambda \beta }\cdot \frac{1+\lambda \beta }{1-\lambda \beta }\cdot \Vert x-y\Vert \\= & {} \Vert x-y\Vert . \end{aligned}$$

This means that \(R_{\lambda f}R_{\lambda g}\) is nonexpansive. Thus, by letting \(M:=R_{\lambda f}R_{\lambda g}\) and \(r_{k}:=\beta _{k}e_{k}\) in Lemma 2.1, it follows from Assumption 3.2 that the sequence \(\{z_{k}\}\) generated by the GIDRSM (1.4) converges to a point in \(\text{ Fix }(R_{\lambda f}R_{\lambda g})\), say \(z^{*}\). Setting \(x:=z_{k}\) and \(y:=z^{*}\) in (3.22), we have \(\text{ prox }_{\lambda g}(z_{k})\rightarrow \text{ prox }_{\lambda g}(z^{*})\). Furthermore, \( \text{ prox }_{\lambda g}(z^{*})\) is a solution of problem (1.1) in view of Proposition 3.1. The proof is complete. \(\square \)
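To illustrate Theorem 3.1 numerically, the following self-contained sketch applies the GIDRSM (1.4) to a toy instance of (1.1); the particular choices of f, g, \(\lambda \), the parameter sequences and the error model are ours and serve only as one example satisfying Assumptions 3.1 and 3.2 with \(\beta =\omega =1\).

```python
import numpy as np

beta = 1.0                 # f strongly convex with constant beta; g hypoconvex with the same constant
lam = 0.5                  # proximal parameter with 0 < lam < 1/beta
a = np.array([0.3, -0.5])  # |a_i| < 1, so the unique minimizer of f + g is x* = 0

def prox_f(z):
    # f(x) = (beta/2)||x - a||^2:  prox_{lam f}(z) = (z + lam*beta*a) / (1 + lam*beta)
    return (z + lam * beta * a) / (1.0 + lam * beta)

def prox_g(z):
    # g(x) = ||x||_1 - (beta/2)||x||^2 (hypoconvex with constant beta): for lam < 1/beta,
    # prox_{lam g} is componentwise soft-thresholding rescaled by 1/(1 - lam*beta)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0) / (1.0 - lam * beta)

rng = np.random.default_rng(0)
z = np.array([5.0, -7.0])          # arbitrary starting point z_1
for k in range(1, 201):
    alpha_k, beta_k = 0.5, 0.5     # sum of alpha_k*beta_k diverges, 1 - alpha_k - beta_k = 0
    e_k = 0.5 ** k * rng.standard_normal(2)   # errors with summable beta_k*||e_k||
    r_g = 2.0 * prox_g(z) - z
    r_f_r_g = 2.0 * prox_f(r_g) - r_g
    z = alpha_k * z + beta_k * (r_f_r_g + e_k)

print(prox_g(z))   # approximately the solution x* = (0, 0) of (1.1)
```

In this example \((f+g)(x)=\Vert x\Vert _{1}-\langle a,x\rangle +\frac{1}{2}\Vert a\Vert ^{2}\) is convex but not strongly convex, and the iterates behave as predicted by Theorem 3.1.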

Remark 3.2

To ensure convergence, the authors in [5] assume \(0<\lambda \le 1/\sqrt{\beta L}\), while we only assume \(0<\lambda <1/\beta \). Since \(\beta \le L\) (see the proof of Lemma 3.1), it holds that \(1/\sqrt{L\beta }\le 1/\beta \). Thus, we obtain the same global convergence result as in [5], not only under a weaker assumption on the strongly convex function f, but also with a larger range of the proximal parameter \(\lambda \).

Remark 3.3

Since g is hypoconvex, there is a fundamental difference between \(R_{\lambda f}\) and \(R_{\lambda g}\): \(R_{\lambda f}\) is a contraction while \(R_{\lambda g}\) is expansive. Thus, we can also consider the convergence of the following scheme, which swaps the order of \(R_{\lambda f}\) and \(R_{\lambda g}\) in (1.4):

$$\begin{aligned} z_{k+1}:=\alpha _{k}z_{k}+\beta _{k} (R_{\lambda g}R_{\lambda f}(z_{k})+e_{k}). \end{aligned}$$
(3.23)

Similarly to \(R_{\lambda f}R_{\lambda g}\), we can show that \(R_{\lambda g}R_{\lambda f}\) is a nonexpansive mapping by means of Lemmas 3.1 and 3.2. Thus, the convergence analysis of the scheme (3.23) is analogous to that of the GIDRSM (1.4). For succinctness, we skip the details.

As an immediate consequence of Theorem 3.1, we obtain the following corollary for the DRSM scheme (1.3).

Corollary 3.1

Let \(\{z_{k}\}_{k\in N}\) be a sequence generated by the DRSM (1.3) and suppose that Assumption 3.1 holds and \(0<\lambda <1/\beta \). Then \(\{z_{k}\}_{k\in N}\) converges to a fixed point of \(R_{\lambda f}R_{\lambda g}\). Moreover, \(\{\text{ prox }_{\lambda g}(z_{k})\}_{k\in N}\) converges to a solution of problem (1.1).

4 Conclusions

In this paper, we analyzed the general inexact Douglas–Rachford splitting method (GIDRSM) for the minimization of the sum of a strongly convex function and a hypoconvex function. We focused on the case where the sum of the two functions is convex (\(\beta =\omega \)). Under assumptions that are relatively mild compared with existing work in the literature, we proved its global convergence.

Note that for the case where the sum is strongly convex (\(\beta >\omega \)), the convergence of DRSM was recently established without any differentiability assumption on the functions [13]. Hence, it is natural to ask whether, in the case \(\beta =\omega \) considered here, convergence still holds in the absence of the differentiability assumption, or to give a counterexample if it does not. We leave this as one of our future research topics.