Applied Mathematics and Computation

Volume 333, 15 September 2018, Pages 76-94

A modified primal-dual method with applications to some sparse recovery problems

https://doi.org/10.1016/j.amc.2018.03.089

Highlights

  • A convex composite optimization problem which includes sparse recovery problems is studied.

  • A modified Chambolle–Pock primal-dual method (MCPPDM) is proposed.

  • MCPPDM needs only a little additional computation cost.

  • MCPPDM is proved to be globally convergent.

  • Numerical experiments show the efficiency of the proposed method.

Abstract

In this paper, we first present a modified Chambolle–Pock primal-dual method (MCPPDM) to solve a convex composite optimization problem which minimizes the sum of two convex functions, one of which is composed with a linear operator. It is well known that the Chambolle–Pock primal-dual method (CPPDM) with the combination parameter equal to 1 is an application of the proximal point algorithm and thus is convergent; however, when the combination parameter is not 1, the method may not be convergent. To allow the combination parameter to be chosen flexibly, we develop a slightly modified version with little additional computation cost. In CPPDM, one variable is updated twice but the other variable is updated only once at each iteration. In the modified version, both variables are updated twice at each iteration. Another main task of this paper is to reformulate some well-known sparse recovery problems as special cases of the convex composite optimization problem and then apply MCPPDM to address these sparse recovery problems. A large number of numerical experiments demonstrate that the efficiency of the proposed method is generally comparable or superior to that of existing well-known methods, such as the linearized alternating direction method of multipliers and the graph projection splitting algorithm, in terms of solution quality and run time.

Introduction

In this paper, we consider the convex composite optimization problem
\[
\text{minimize } f(x) + g(Ax), \tag{1}
\]
where $f: X \to \mathbb{R}\cup\{+\infty\}$ and $g: Y \to \mathbb{R}\cup\{+\infty\}$ are proper, lower semicontinuous, convex functions defined respectively on finite-dimensional Hilbert spaces $X$ and $Y$ with the corresponding inner product $\langle\cdot,\cdot\rangle$ and the norm $\|\cdot\| = \sqrt{\langle\cdot,\cdot\rangle}$, and $A: X \to Y$ is a linear operator with adjoint $A^*$ and induced norm $\|A\|$. The input variable $x$ can be a vector, a matrix, or an element of a composite Hilbert space. By introducing an auxiliary variable $y \in Y$, we can rewrite problem (1) in the form
\[
\text{minimize } f(x) + g(y) \quad \text{subject to } y = Ax. \tag{2}
\]
Parikh and Boyd [1] first called the form (2) graph form, since the variables $x$ and $y$ are constrained to lie in the graph $\{(x,y) \in X\times Y : y = Ax\}$ of the linear operator $A$. Problem (1), or the graph form (2), includes many popular convex optimization problems such as linear and quadratic programming and general cone programming [3], and captures a wide variety of specific applications such as the lasso problem, logistic regression, and support vector machines in statistics and machine learning [4], the basis pursuit problem in signal processing [12], total-variation minimization problems in image processing [29], radiation treatment planning [5], and portfolio optimization [6], among others. We invite the reader to consult [1], [2] for more applications. In this paper, we focus on three classes of particularly attractive sparse recovery problems.
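
As a quick illustration of the template (1), the lasso problem mentioned above admits the following (not unique) instantiation:
\[
f(x) = \lambda\|x\|_1, \qquad g(y) = \tfrac{1}{2}\|y - b\|_2^2, \qquad A = \Psi,
\]
so that $f(x) + g(Ax) = \lambda\|x\|_1 + \tfrac{1}{2}\|\Psi x - b\|_2^2$, where $\Psi$ is the design matrix, $b$ the response vector, and $\lambda > 0$ a regularization parameter.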

Compressive sensing. The fundamental problem in compressive sensing [8], [9], [10], [11] is to recover a high-dimensional sparse or approximately sparse signal $\bar{x} \in \mathbb{R}^n$ from very few nonadaptive linear and noisy measurements $b \in \mathbb{R}^m$, that is,
\[
b = \Psi\bar{x} + e,
\]
where $\Psi \in \mathbb{R}^{m\times n}$ ($m \ll n$) is a sensing matrix and $e \in \mathbb{R}^m$ is a noise vector. The theory of compressive sensing indicates that one can recover the original sparse signal accurately and efficiently by solving the quadratically constrained basis pursuit (BP$_\varepsilon$) problem
\[
\text{minimize } \|x\|_1 \quad \text{subject to } \|\Psi x - b\|_2 \le \varepsilon,
\]
where $\varepsilon$ is an estimated upper bound on the noise level. In the noiseless case ($e = 0$), we can set $\varepsilon = 0$ and obtain the well-known basis pursuit (BP) problem [12]
\[
\text{minimize } \|x\|_1 \quad \text{subject to } b = \Psi x.
\]
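
For later reference, one natural way (among several) to put the BP problem into the graph form (2) is
\[
f(x) = \|x\|_1, \qquad g(y) = \iota_{\{b\}}(y), \qquad A = \Psi,
\]
where $\iota_{\{b\}}$ denotes the indicator function of the singleton $\{b\}$; the constraint $y = \Psi x$ together with $g$ then enforces $\Psi x = b$.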

Another, more theoretically effective, estimator for recovering sparse signals from noisy measurements is the Dantzig selector (DS) introduced in [13], which is the solution to the convex problem
\[
\text{minimize } \|x\|_1 \quad \text{subject to } \|\tilde{D}^{-1}\Psi^{T}(\Psi x - b)\|_\infty \le \gamma,
\]
where $\gamma$ is a scalar related to the noise level, $\|\cdot\|_\infty$ is the $\ell_\infty$ norm, and $\tilde{D}$ is the diagonal matrix whose diagonal entries are the $\ell_2$ norms of the columns of $\Psi$.

Low-rank matrix completion. Some applications, such as the well-known Netflix problem [14], collaborative filtering [16], system identification [15], [18], and global positioning [17], can be expressed as a low-rank matrix completion problem, which consists of recovering an unknown low-rank matrix $M \in \mathbb{R}^{m\times n}$ from a given subset $\Omega$ of observed entries. The problem can be solved via the convex optimization
\[
\text{minimize } \|X\|_* \quad \text{subject to } X_{ij} = M_{ij}, \; (i,j) \in \Omega, \tag{6}
\]
where $\|\cdot\|_*$ is the nuclear norm of a matrix. By introducing the projection operator $P_\Omega: \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$ defined as
\[
[P_\Omega X]_{ij} =
\begin{cases}
X_{ij}, & \text{if } (i,j) \in \Omega,\\
0, & \text{if } (i,j) \notin \Omega,
\end{cases}
\]
and the resulting matrix $M_\Omega = P_\Omega M$, we can reformulate problem (6) as
\[
\text{minimize } \|X\|_* \quad \text{subject to } P_\Omega X = M_\Omega.
\]
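
For concreteness, here is a minimal sketch (assuming NumPy and a boolean mask representing the index set $\Omega$; the names are our own) of the projection $P_\Omega$ and the observed-data matrix $M_\Omega$:

```python
import numpy as np

def proj_omega(X, omega_mask):
    """Projection P_Omega: keep entries at observed positions, zero elsewhere.

    X          : (m, n) array.
    omega_mask : (m, n) boolean array, True where (i, j) is observed.
    """
    return np.where(omega_mask, X, 0.0)

# Tiny illustrative example (data are hypothetical).
M = np.arange(12.0).reshape(3, 4)            # "true" low-rank matrix stand-in
omega_mask = np.random.rand(3, 4) < 0.5      # randomly observed entries
M_omega = proj_omega(M, omega_mask)          # observed data P_Omega(M)
```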

Robust principal component analysis. In contrast to classical PCA, which captures the low-rank structure of a data matrix corrupted by random Gaussian noise, robust principal component analysis (RPCA), introduced by Candès et al. [19], aims to capture the low-rank structure of a data matrix corrupted by sparse gross errors. In [19], the authors have shown that RPCA can be cast as the following convex optimization:
\[
\text{minimize } \|L\|_* + \nu\|S\|_1 \quad \text{subject to } D = L + S,
\]
where $\|S\|_1 = \sum_{i,j}|S_{ij}|$, $D \in \mathbb{R}^{m\times n}$ is a data matrix, and the parameter $\nu > 0$ is used to balance the weights of rank and sparsity.
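
The splitting methods discussed below access such models only through proximity operators; as a point of reference, here is a minimal sketch (our own illustration, not code from [19]) of the two proximity operators that these nuclear-norm and $\ell_1$ models require, namely singular value thresholding and entrywise soft-thresholding:

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise soft-thresholding: prox of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: prox of tau * ||.||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```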

Due to the wide applicability of problem (1) in different scientific research fields, it is crucial to develop fast and efficient algorithms for solving (1). Many different algorithms have been designed for (1) based on properties of $f$ and $g$ or of the composite function $g\circ A$. Specifically, we distinguish three cases for these algorithms. In the first case, where the function $f$ is differentiable with a Lipschitz continuous gradient and the composite function $g\circ A$ has an inexpensive proximity operator, the proximal forward-backward algorithm [20] and its fast version based on Nesterov's acceleration technique [21] are most widely used to solve (1). In the second case, where $f$ is differentiable with a Lipschitz continuous gradient and $g$, rather than $g\circ A$, has an inexpensive proximity operator, a class of primal-dual proximity-gradient algorithms, such as the primal-dual fixed-point algorithm [22] used in image restoration, is well suited to the general problem (1). Finally, in recent years, primal-dual splitting algorithms, which we mainly consider in this paper, have been well developed to solve (1) when $f$ and $g$ may be nonsmooth but have inexpensive proximity operators. We discuss some state-of-the-art primal-dual splitting algorithms below.

The alternating direction method of multipliers (ADMM) [23] is a fundamental and important method for solving a class of linearly constrained separable convex optimization problems that includes (2) as a special case. It is well known that ADMM is the dual application of the Douglas–Rachford operator splitting method [24], [25]. Let the augmented Lagrangian function of the graph form (2) be
\[
L(x,y,\lambda) = f(x) + g(y) + \langle \lambda, Ax - y\rangle + \frac{\beta}{2}\|Ax - y\|^2,
\]
where $\lambda$ is the dual variable corresponding to the constraint $Ax = y$ and $\beta > 0$ is a penalty parameter. We can directly apply ADMM to the graph form (2), which yields the iterative scheme
\[
\begin{cases}
x^{k+1} = \arg\min_{x\in X} L(x, y^k, \lambda^k) = \arg\min_{x\in X}\left\{ f(x) + \frac{\beta}{2}\|Ax - y^k + \lambda^k/\beta\|^2 \right\},\\[2pt]
y^{k+1} = \arg\min_{y\in Y} L(x^{k+1}, y, \lambda^k) = \arg\min_{y\in Y}\left\{ g(y) + \frac{\beta}{2}\|Ax^{k+1} - y + \lambda^k/\beta\|^2 \right\},\\[2pt]
\lambda^{k+1} = \lambda^k + \beta(Ax^{k+1} - y^{k+1}).
\end{cases}
\]
Obviously, the $y$-related subproblem amounts to
\[
y^{k+1} = \mathrm{prox}_{g,\beta I}\!\left(Ax^{k+1} + \lambda^k/\beta\right),
\]
where $\mathrm{prox}_{g,\beta I}$ is the proximity operator of $g$. However, the $x$-related subproblem is a hard optimization problem and, in general, has no closed-form solution. To further improve the efficiency of ADMM, we often linearize the quadratic term $\frac{\beta}{2}\|Ax - y^k + \lambda^k/\beta\|^2$, that is,
\[
\frac{\beta}{2}\|Ax - y^k + \lambda^k/\beta\|^2 \approx \frac{\beta}{2}\|Ax^k - y^k + \lambda^k/\beta\|^2 + \langle g^k, x - x^k\rangle + \frac{\alpha}{2}\|x - x^k\|^2,
\]
where $g^k = \beta A^*(Ax^k - y^k + \lambda^k/\beta)$ is the gradient of the quadratic term $\frac{\beta}{2}\|Ax - y^k + \lambda^k/\beta\|^2$ at $x = x^k$, and $\alpha > 0$ is a proximal parameter. Hence, we can get the approximate $x$-related subproblem
\[
\arg\min_{x\in X}\left\{ f(x) + \langle g^k, x - x^k\rangle + \frac{\alpha}{2}\|x - x^k\|^2 \right\}. \tag{10}
\]
Let $x^{k+1}$ be the solution to problem (10), that is,
\[
x^{k+1} = \mathrm{prox}_{f,\alpha I}\!\left(x^k - \alpha^{-1}A^*\big(\lambda^k + \beta(Ax^k - y^k)\big)\right).
\]
Thus, the iterative scheme of the linearized alternating direction method of multipliers (LADMM) is
\[
\begin{cases}
x^{k+1} = \mathrm{prox}_{f,\alpha I}\!\left(x^k - \alpha^{-1}A^*\big(\lambda^k + \beta(Ax^k - y^k)\big)\right),\\[2pt]
y^{k+1} = \mathrm{prox}_{g,\beta I}\!\left(Ax^{k+1} + \lambda^k/\beta\right),\\[2pt]
\lambda^{k+1} = \lambda^k + \beta(Ax^{k+1} - y^{k+1}).
\end{cases} \tag{11}
\]
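
To make the scheme (11) concrete, the following is a minimal sketch (our own illustration under stated assumptions, not the implementation used in Section 4) of LADMM specialized to the basis pursuit problem, where $f = \|\cdot\|_1$, so $\mathrm{prox}_{f,\alpha I}$ is componentwise soft-thresholding with threshold $1/\alpha$, and $g$ is the indicator of $\{b\}$, so $\mathrm{prox}_{g,\beta I}(\cdot) \equiv b$:

```python
import numpy as np

def soft_threshold(v, tau):
    """prox of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ladmm_basis_pursuit(Psi, b, alpha, beta, n_iter=500):
    """LADMM scheme (11) for min ||x||_1  s.t.  Psi x = b.

    Here f = ||.||_1 and g is the indicator of {b}, so the y-update is simply y = b.
    alpha, beta > 0 are the proximal and penalty parameters; a common sufficient
    choice for convergence is alpha > beta * ||Psi||_2**2 (spectral norm).
    """
    m, n = Psi.shape
    x, y, lam = np.zeros(n), np.zeros(m), np.zeros(m)
    for _ in range(n_iter):
        grad = Psi.T @ (lam + beta * (Psi @ x - y))
        x = soft_threshold(x - grad / alpha, 1.0 / alpha)   # prox_{f, alpha I}
        y = b                                               # prox_{g, beta I}(Psi x + lam/beta) = b
        lam = lam + beta * (Psi @ x - y)                    # dual update
    return x
```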

In recent years, ADMM and LADMM have become popular and efficient methods for handling various applications, including constrained ℓ1 minimization in compressive sensing and low-rank matrix recovery problems. We invite the reader to consult [26], [27], [28].

In addition, Parikh and Boyd [1], [7] also considered the generic constrained convex optimization problem
\[
\text{minimize } \varphi(z) \quad \text{subject to } z \in C, \tag{12}
\]
where $\varphi$ is a proper convex function and $C$ is a nonempty closed convex set. By introducing an auxiliary variable $w$, problem (12) can be rewritten as
\[
\text{minimize } \varphi(z) + \iota_C(w) \quad \text{subject to } z - w = 0, \tag{13}
\]
where $\iota_C$ is the indicator function of the convex set $C$, i.e., $\iota_C(w) = 0$ for $w \in C$ and $\iota_C(w) = +\infty$ for $w \notin C$.

The augmented Lagrangian function of problem (13) is
\[
L(z, w, \lambda) = \varphi(z) + \iota_C(w) + \langle \lambda, z - w\rangle + \frac{\beta}{2}\|z - w\|^2.
\]
Parikh and Boyd then applied ADMM to (13) and obtained the iterative scheme
\[
\begin{cases}
z^{k+1} = \arg\min_{z} L(z, w^k, \lambda^k) = \mathrm{prox}_{\varphi,\beta I}(w^k - \lambda^k/\beta),\\[2pt]
w^{k+1} = \arg\min_{w} L(z^{k+1}, w, \lambda^k) = \Pi_C(z^{k+1} + \lambda^k/\beta),\\[2pt]
\lambda^{k+1} = \lambda^k + \beta(z^{k+1} - w^{k+1}),
\end{cases} \tag{14}
\]
where $\Pi_C$ denotes the projection onto the convex set $C$. By introducing the scaled dual variable $\bar{\lambda} = \lambda/\beta$, Boyd et al. [7] express the above iterative scheme (14) as
\[
\begin{cases}
z^{k+1} = \mathrm{prox}_{\varphi,\beta I}(w^k - \bar{\lambda}^k),\\[2pt]
w^{k+1} = \Pi_C(z^{k+1} + \bar{\lambda}^k),\\[2pt]
\bar{\lambda}^{k+1} = \bar{\lambda}^k + (z^{k+1} - w^{k+1}),
\end{cases} \tag{15}
\]
which is also called the scaled form of the iterative scheme (14).

Applying the iterative scheme (15) to (13), with $z = (x, y)$, $w = (\tilde{x}, \tilde{y})$, $\varphi(z) = f(x) + g(y)$, and the set $C$ being the graph of the linear operator $A$, yields the so-called graph projection splitting algorithm, in which we denote by $\Pi_A$ the projection onto the graph of the linear operator $A$. Parikh and Boyd [1] have applied the graph projection splitting algorithm to solve some cone programming problems, parameter estimation problems in statistics and machine learning, and intensity modulated radiation treatment planning, and have shown its efficiency. However, at each iteration this algorithm needs to evaluate the graph projection, which is equivalent to solving a system of linear equations. In fact, based on the work of [1], we can evaluate the graph projection $\Pi_A$ via
\[
\Pi_A(x, y) = \begin{pmatrix} I & A^* \\ A & -I \end{pmatrix}^{-1} \begin{pmatrix} I & A^* \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \tag{16}
\]
In general, solving the graph projection (16) is very time-consuming. Setting $(\tilde{x}, \tilde{y}) = \Pi_A(x, y)$, we can further evaluate (16) as follows:
\[
\begin{cases}
\tilde{y} = (I + AA^*)^{-1}(Ax + AA^*y),\\
\tilde{x} = x + A^*(y - \tilde{y}),
\end{cases}
\qquad\text{or}\qquad
\begin{cases}
\tilde{x} = (I + A^*A)^{-1}(x + A^*y),\\
\tilde{y} = A\tilde{x}.
\end{cases}
\]
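
As an illustration of how the graph projection can be reused cheaply across iterations, here is a minimal sketch (assuming NumPy/SciPy and a dense $\Psi$; our own code, not that of [1]) of the scaled scheme (15) specialized to the basis pursuit problem, with the Cholesky factorization of $I + \Psi^T\Psi$ from the second evaluation above computed once and cached:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def soft_threshold(v, tau):
    """prox of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

class GraphProjector:
    """Projection onto the graph {(x, y): y = A x}, evaluated via
    x_t = (I + A^T A)^{-1} (x + A^T y),  y_t = A x_t,
    with the Cholesky factor of I + A^T A computed once and reused."""
    def __init__(self, A):
        self.A = A
        n = A.shape[1]
        self.factor = cho_factor(np.eye(n) + A.T @ A)
    def __call__(self, x, y):
        x_t = cho_solve(self.factor, x + self.A.T @ y)
        return x_t, self.A @ x_t

def gpsa_basis_pursuit(Psi, b, beta, n_iter=500):
    """Scaled graph projection splitting (scheme (15)) for
    min ||x||_1  s.t.  Psi x = b, i.e. phi(z) = ||x||_1 + indicator_{b}(y)."""
    proj = GraphProjector(Psi)
    m, n = Psi.shape
    wx, wy = np.zeros(n), np.zeros(m)      # w = (x~, y~)
    lx, ly = np.zeros(n), np.zeros(m)      # scaled dual variable
    for _ in range(n_iter):
        zx = soft_threshold(wx - lx, 1.0 / beta)   # prox of ||.||_1 with parameter beta
        zy = b                                     # prox of the indicator of {b}
        wx, wy = proj(zx + lx, zy + ly)            # graph projection step
        lx, ly = lx + zx - wx, ly + zy - wy        # scaled dual update
    return wx
```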

Recently, O’Connor and Vandenberghe [29] have also applied the Douglas–Rachford operator splitting method to systems of primal-dual optimality conditions for problem (1) and developed a class of primal-dual decomposition algorithms for image deblurring. In these algorithms, one also has to solve a system of linear equations, as in the graph projection splitting algorithm. In addition, Briceño-Arias and Combettes [30] have developed a new primal-dual splitting method, based on the forward-backward-forward algorithm proposed in [31], to solve the generic problem of finding a zero of the sum of a maximally monotone operator and a linear skew-adjoint operator, which includes problem (1) as a special case. Since each iteration of the new splitting method in [30] needs to perform two forward steps on the linear skew-adjoint operator, more operations such as matrix-vector or matrix-matrix products are required when this method is applied to solve (1).

We now focus our attention on the well-known Chambolle–Pock primal-dual method (CPPDM) [33]. Problem (1) has the primal-dual formulation, that is, the following saddle-point problem:
\[
\min_{x\in X}\max_{y\in Y}\ \Phi(x,y) := f(x) + \langle Ax, y\rangle - g^*(y), \tag{19}
\]
where $g^*: Y \to \mathbb{R}\cup\{+\infty\}$ is the convex conjugate of $g$. To solve the saddle-point problem (19), Chambolle and Pock studied a new primal-dual method whose iterative scheme is
\[
\begin{cases}
x^{k+1} = \arg\min_{x\in X}\left\{ \Phi(x, y^k) + \frac{\alpha}{2}\|x - x^k\|^2 \right\},\\[2pt]
\bar{x}^k = x^{k+1} + \theta(x^{k+1} - x^k),\\[2pt]
y^{k+1} = \arg\max_{y\in Y}\left\{ \Phi(\bar{x}^k, y) - \frac{\beta}{2}\|y - y^k\|^2 \right\},
\end{cases} \tag{20}
\]
where the parameter $\theta \in [0,1]$, and $\alpha > 0$ and $\beta > 0$ are two proximal parameters. The main idea of the scheme (20) is to split the coupled term $\langle Ax, y\rangle$ in (19) and then make the resulting subproblems solvable. The greatest advantage of the scheme (20) is that it does not involve solving a system of linear equations, provided the proximity operators of $f$ and $g^*$ have closed-form solutions or can be evaluated efficiently to high precision, which is the case in many applications. In the literature, the scheme (20) with $\theta = 0$ is the classical Arrow–Hurwicz method [34], also called the primal-dual hybrid gradient method (PDHG) [35] in image processing. Note that the combination of $x^{k+1}$ and $x^k$ is very important for the numerical performance and theoretical analysis of the algorithm. In fact, He et al. [40] have shown by a very simple example that PDHG is not necessarily convergent. Chambolle and Pock have proved the convergence of the scheme (20) with $\theta = 1$ when the step size $1/\alpha$ in the primal step and the step size $1/\beta$ in the dual step satisfy the condition
\[
\frac{1}{\alpha\beta} < \frac{1}{\|A\|^2}. \tag{21}
\]
The condition (21) avoids using very small step sizes, which would lead to a slow convergence rate. Further, He et al. [37] have found that the scheme (20) with $\theta = 1$ is a special application of the proximal point algorithm. In addition, very recently, the work of [36] has also pointed out that LADMM is equivalent to the scheme (20) with $\theta = 1$. However, the convergence of the scheme (20) with $\theta \in (0,1)$ may not be guaranteed. To handle this case, some modified Chambolle–Pock primal-dual methods have been developed in recent years. With some restrictions on the parameters $\alpha$ and $\beta$, these modified algorithms can enlarge the range of the parameter $\theta$, such as $\theta \in [-1,1]$ in [37], [39] and $\theta$ arbitrary in [38]; however, these algorithms usually require much more extra computation (more matrix-vector or matrix-matrix products) and thus converge slowly in practice. In this paper, we propose a slightly modified version with little additional computation cost. Our idea is based on the observation that the scheme (20) updates the variable $x$ twice but the variable $y$ only once at each iteration. In the proposed modified version, the variable $y$ is also updated twice at each iteration. Specifically, the iterative scheme of the modified Chambolle–Pock primal-dual method (MCPPDM) is
\[
\begin{cases}
x^{k+1} = \arg\min_{x\in X}\left\{ \Phi(x, y^k) + \frac{\alpha}{2}\|x - x^k\|^2 \right\},\\[2pt]
\bar{x}^k = x^{k+1} + \theta(x^{k+1} - x^k),\\[2pt]
\bar{y}^k = \arg\max_{y\in Y}\left\{ \Phi(\bar{x}^k, y) - \frac{\beta}{2}\|y - y^k\|^2 \right\},\\[2pt]
y^{k+1} = \rho y^k + (1 - \rho)\bar{y}^k.
\end{cases} \tag{22}
\]
Obviously, when $\rho = 0$, the scheme (22) reduces to (20).
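
To make the scheme (22) concrete, the following is a minimal sketch (our own illustration; admissible choices of $\alpha$, $\beta$, $\theta$, $\rho$ are given by the convergence analysis in Section 3) of MCPPDM applied to the basis pursuit problem, where $f = \|\cdot\|_1$ and $g$ is the indicator of $\{b\}$, so that $g^*(y) = \langle b, y\rangle$ and both subproblems in (22) have closed-form solutions:

```python
import numpy as np

def soft_threshold(v, tau):
    """prox of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def mcppdm_basis_pursuit(Psi, b, alpha, beta, theta=1.0, rho=0.0, n_iter=500):
    """Scheme (22) for min ||x||_1  s.t.  Psi x = b.

    With f = ||.||_1 and g the indicator of {b}, g*(y) = <b, y>, so the primal
    step is soft-thresholding and the dual step is an explicit gradient-like
    update. theta = 1, rho = 0 recovers CPPDM (scheme (20)).
    """
    m, n = Psi.shape
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        x_new = soft_threshold(x - Psi.T @ y / alpha, 1.0 / alpha)  # primal step
        x_bar = x_new + theta * (x_new - x)                          # extrapolation
        y_bar = y + (Psi @ x_bar - b) / beta                         # dual step
        y = rho * y + (1.0 - rho) * y_bar                            # relaxation step
        x = x_new
    return x
```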

We now list the main contributions of this paper. First, we show how to select proper parameters in the scheme (22) so that the scheme is convergent. Second, the proposed method, the well-known LADMM, and the graph projection splitting algorithm are applied to the sparse recovery problems mentioned above. We also note that the recent work [32] has studied an algorithmic framework that includes the proposed algorithm as a special case; however, we provide a different convergence analysis and focus on verifying the efficiency of our algorithm applied to sparse recovery problems.

The rest of the paper is organized as follows. In Section 2, some preliminaries are reviewed. In Section 3, the convergence of the proposed algorithm is analyzed. Numerical results on the performance of the proposed method for solving sparse recovery problems are reported in Section 4. Some conclusions are given in Section 5.


Preliminaries

In this section, we recall some key notions, including the proximity operator, the subdifferential, and the conjugate function of a convex function, and characterize solutions of problem (1), or equivalently (19), in terms of coupled fixed-point equations involving proximity operators.

We first recall the notion of the proximity operator of a convex function. Let H denote a finite-dimensional Hilbert space with the inner product $\langle\cdot,\cdot\rangle$ and the norm $\|\cdot\| = \sqrt{\langle\cdot,\cdot\rangle}$. For a self-adjoint positive-definite linear operator S in H,

Convergence analysis

In this section, the convergence of the proposed algorithmic scheme (22) is analyzed and the main convergence results are presented.

For simplicity, we can write Eqs. (27) and (28) as a simple fixed-point formulation. To this end, we first introduce an operator H from $X\times Y$ to $X\times Y$, defined at a given point $u = (x, y) \in X\times Y$ by
\[
H(u) = \left(\mathrm{prox}_{f,\alpha I}(x),\ \mathrm{prox}_{g^*,\beta I}(y)\right). \tag{29}
\]
For the operator H, we have the following Lemma 3.1.

Lemma 3.1

The operator H is defined by (29). Then, $H = \mathrm{prox}_{h,P}$, where the function $h(u) = f(x) + g^*(y)$, and $P$ is the

Numerical results

In this section, we present a series of experiments to verify the efficiency of the proposed method (MCPPDM) (22) for solving the three classes of sparse recovery problems mentioned above, and compare its numerical performance with that of LADMM (11) and of the graph projection splitting algorithm (GPSA for short). Note that we do not compare the proposed method with the algorithms developed in [38], [39], because these algorithms are modified Chambolle–Pock primal-dual methods with more

Conclusions

In this paper, we proposed a modified Chambolle–Pock primal-dual method (MCPPDM) for solving a convex composite optimization problem which covers many well-known applications. The modified method, at little additional computation cost, allows the combination parameter in the original Chambolle–Pock primal-dual method to be chosen flexibly. We then focused on some well-known sparse recovery problems and reformulated them as special cases of the convex composite optimization problem. A series of experiments on

Acknowledgments

The authors would like to thank the anonymous reviewers, the associate editor, and the editor-in-chief for their valuable comments and suggestions, which improved the presentation of this paper. This research was supported by the National Natural Science Foundation of China under Grant nos. 11771347, 91730306, 41390454, and 11271297, and by the Nanhu Scholars Program for Young Scholars of XYNU.

References (45)

  • N. Parikh et al., Block splitting for distributed optimization, Math. Program. Comput. (2014).
  • C. Fougner et al., Parameter selection and pre-conditioning for a graph form solver (2015).
  • S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press,...
  • T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer,...
  • A. Olafsson et al., Efficient schemes for robust IMRT treatment planning, Phys. Med. Biol. (2006).
  • S. Boyd et al., Performance bounds and suboptimal policies for multi-period investment, Found. Trends Optim. (2013).
  • S. Boyd et al., Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. (2011).
  • E. Candès et al., An introduction to compressive sampling, IEEE Signal Process. Mag. (2008).
  • E. Candès et al., Stable signal recovery from incomplete and inaccurate measurements, Commun. Pure Appl. Math. (2006).
  • E. Candès et al., Near optimal signal recovery from random projections: universal encoding strategies, IEEE Trans. Inf. Theory (2006).
  • E. Candès et al., Decoding by linear programming, IEEE Trans. Inf. Theory (2005).
  • S. Chen et al., Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. (1998).
  • E. Candès et al., The Dantzig selector: statistical estimation when p is much larger than n, Ann. Stat. (2007).
  • E. Candès et al., Exact matrix completion via convex optimization, Found. Comput. Math. (2009).
  • M. Fazel, Matrix rank minimization with applications, Ph.D. thesis, Stanford..., 2002.
  • D. Goldberg et al., Using collaborative filtering to weave an information tapestry, Commun. ACM (1992).
  • A. Singer, A remark on global positioning from local distances, Proc. Natl. Acad. Sci. U.S.A. (2008).
  • Z. Liu et al., Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl. (2009).
  • E. Candès et al., Robust principal component analysis?, J. ACM (2011).
  • P. Combettes et al., Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul. (2005).
  • A. Beck et al., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. (2009).
  • P. Chen et al., A primal-dual fixed point algorithm for convex separable minimization with applications to image restoration, Inverse Probl. (2013).