A modified primal-dual method with applications to some sparse recovery problems
Introduction
In this paper, we consider the convex composite optimization problemwhere and are proper, lower semicontinuous, convex functions defined respectively on finite-dimensional Hilbert spaces and with the corresponding inner product ⟨·,·⟩ and the norm is a linear operator with the adjoint A* and the induced norm ‖A‖. The input variable x can be a vector, matrix, or an element from a composite Hilbert space. By introducing an auxiliary variable we can rewrite the problem (1) as the formParikh and Boyd in [1] first called the problem form (2) graph form since the variables x and y are constrained to lie in the graph of the linear operator A. The problem (1) or graph form (2) includes many popular convex optimization problems such as linear and quadratic programming, general cone programming [3], and captures a wide variety of specific applications such as the lasso problem, logistic regression, support vector machine in statistics and machine learning [4], basis pursuit problem in signal processing [12], total-variation minimization problems in image processing [29], radiation treatment planning [5] and portfolio optimization [6], and so on. We invite the reader to consult [1], [2] for more applications. In this paper, we will focus on three classes of more attractive sparse recovery problems.
Compressive sensing. The fundamental problem in compressive sensing [8], [9], [10], [11] is to recover a high-dimensional sparse or approximately signal from very few nonadaptive linear and noisy measurements that is,where is a sensing matrix and is a noise vector. The theory of compressive sensing has indicated that one can recover accurately and efficiently the original sparse signal via solving the quadratically constrained Basis Pursuit (BPε) problemwhere ε is an estimated upper bound of the noise level. In the noiseless case (), we can set and then obtain the well-known Basis Pursuit (BP) problem [12]
Another more theoretically effective estimator for recovering sparse signals from noisy measurements is the Dantzig selector (DS) introduced in [13], which is the solution to the convex problemwhere γ is a scalar related to the noise level, || · ||∞ is the ℓ∞ norm and is the diagonal matrix whose diagonal entries are the ℓ2 norms of the columns of Ψ.
Low-rank matrix completion. Some applications such as the well-known Netflix problem [14], collaborative filtering [16], system identification [15], [18], global positioning [17], can be expressed as a low-rank matrix completion problem consisting of recovering an unknown low-rank matrix from a give subset Ω of observed entries. The problem can be solved via the convex optimizationwhere ‖ · ‖* is the nuclear norm of a matrix. By introducing the projection operator defined asand the resulting matrix we can reformulate the problem (6) as
Robust principal component analysis. Compared to classical PCA capturing the low-rank structure of the data matrix corrupted by random gaussian noises, robust principal component analysis (RPCA) introduced by Candès et al. in [19] aims to capture low-rank structure of the data matrix including sparse gross errors. In [19], the authors have claimed that RPCA can be cast as the following convex optimizationwhere is a data matrix and the parameter ν > 0 is used to balance the weights of rank and sparsity.
Due to wide applicability of the problem (1) in different scientific research fields, it is very key to develop fast and efficient algorithms for solving (1). Many different algorithms have been designed for (1) based on properties of f and g or the composite function g○A. Specifically, we distinguish three cases for these algorithms. In the first case where the function f is differentiable with a Lipschitz continuous gradient and the composite function g○A has inexpensive proximity operator, the proximal forward-backward algorithm [20] and its fast version based on Nesterov’s acceleration technique [21] are most widely used to solve (1). In the second case where f is differentiable with a Lipschitz continuous gradient and g rather than g○A has inexpensive proximity operator, a class of primal-dual proximity-gradient algorithms such as the primal-dual fixed-point algorithm [22] used to handle image restoration are very proper for the general problem (1). At last, in recent years, primal-dual splitting algorithms we mainly consider in this paper have been well developed to solve (1) where f and g may be non-smooth but have inexpensive proximal operators. We discuss some state-of-the-art primal-dual splitting algorithms.
The alternating direction method of multipliers (ADMM) [23] is a fundamental and important method for solving a class of linearly constrained separable convex optimization problems including (2) as a special case. It is well known that ADMM is the dual application of the Douglas–Rachford operator splitting method [24], [25]. Let the augmented Lagrangian function of the graph form (2) bewhere λ is the dual variable corresponding to the constraint β > 0 is a penalty parameter. We can directly apply ADMM to the graph form (2), which yields the iterative schemeObviously, the y-related subproblem amounts towhere proxg, βI is the proximity operator of g. However, the x-related subproblem is a hard optimization problem and in general, has no closed form solution. To further improve the efficiency of ADMM, we often linearize the quadratic term that is,where is the gradient of the quadratic term at α > 0 is a proximal parameter. Hence, we can get the approximate x-related subproblemLet be the solution to the problem (10), that is,Thus, the iterative scheme of the linearized alternating direction method of multipliers (LADMM) is
In recent years, ADMM and LADMM have become popular and efficient methods for handling various applications including the constrained ℓ1 minimization in compressive sensing, and low-rank matrix recovery problems. We invite the reader to consult [26], [27], [28].
In addition, Parikh and Boyd in [1], [7] also considered the generic constrained convex optimization problemwhere φ is a proper convex function and is closed nonempty convex. By introducing an auxiliary variable w, This problem (12) can be rewritten aswhere is the indicator function of the convex set i.e., for for .
The augmented Lagrangian function of the problem (13) isParikh and Boyd then applied ADMM to (13) and obtained the iterative schemewhere denotes the projection onto the convex set . By introducing the scaled dual variable Boyd et al. in [7] express the above iterative scheme (14) aswhich is also called the scaled form of the iterative scheme (14).
Applying the iterative scheme (15) to (13), with and the set being the graph of the linear operator A, yields the so-called graph projection splitting algorithm in which we denote by ΠA the projection onto the graph of the linear operator A. Parikh and Boyd in [1] have applied the graph projection splitting algorithm to solve some cone programming problems, parameter estimation problems in statistics and machine learning, and intensity modulated radiation treatment planning, and have shown its efficiency. However, this algorithm at each iteration needs to evaluate the graph projection which is equivalent to solving a system of linear equations. In fact, based on the work of [1], we can evaluate the graph projection ΠA viaIn general, solving the graph projection (16) is very time-consuming. If setting we can further evaluate (16) as follows:or
Recently, O’Connor and Vandenberghe in [29] have also applied the Douglas–Rachford operator splitting method to systems of primal-dual optimality conditions for the problem (1) and developed a class of primal-dual decomposition algorithms to handle image de-blurring. In these algorithms, one also has to solve a system of linear equations as in the graph projection splitting algorithm. In addition, Briceño-Arias and Combettes [30] have also developed a new primal-dual splitting method based on the forward-backward-forward algorithm proposed in [31], to solve the generic problem of finding a zero of the sum of a maximally monotone operator and a linear skew-adjoint operator which includes the problem (1) as a special case. Since each iteration of the new splitting method in [30] needs to perform two forward steps on the linear skew-adjoint operator, more operators such as matrix-vector products or matrix-matrix products are generated when the new splitting method is applied to solve (1).
We now focus our attention on the well-known Chambolle–Pock primal-dual method (CPPDM) [33]. For the problem (1), we can have its primal-dual formulation, that is, the following saddle-point problemwhere is the convex conjugate of g. To solve the saddle-point problem (19), Chambolle and Pock studied a new primal-dual method whose iterative scheme iswhere the parameter θ ∈ [0, 1], α > 0 and β > 0 are two proximal parameters. The main idea of the scheme (20) is to split the coupled term ⟨Ax, y⟩ in (19) and then make the resulting subproblems solvable. The greatest advantage of the scheme (20) is that it does not involve solving a system of linear equations if the proximity operators of f and g* can have closed-form solutions or can be solved efficiently with high precision, which occurs in many applications. In the literature, the scheme (20) with is the classical Arrow–Hurwicz method [34] and is also called the primal-dual hybrid gradient method (PDHG) [35] in image processing. Note that the combination of and xk is very important for numerical performance and theoretical analysis of the algorithm. In fact, He et al. [40] have shown by a very simple example that PDHG is not necessarily convergent. Chambolle and Pock have proved the convergence of the scheme (20) with when the step size in the primal step and the step size in the dual step satisfy the conditionThe condition (21) avoids using very small step sizes which leads to slow convergence rate. Further, He et al. [37] have found that the scheme (20) with is a special application of the proximal point algorithm. In addition, very recently, the work of [36] has also pointed out that LADMM is equivalent to the scheme (20) with . However, the convergence of the scheme (20) with θ ∈ (0, 1) may be not guaranteed. To handle the case, some modified Chambolle–Pock primal-dual methods have been developed in recent years. With some restrictions on the parameters α and β, these modified algorithms can enlarge the rang of the parameter θ such as in [37], [39] and θ being arbitrary in [38], however, these algorithms usually require much more extra computation cost (more matrix-vector or matrix-matrix products) and thus converges slowly in practice. In this paper, we propose a slightly modified version with little additional computation cost. Our idea is based on the observation that the scheme (20) at each iteration updates the variable x twice but updates the variable y only once. In the proposed modified version, the variable y is also updated twice at each iteration. Specifically, the iterative scheme of the modified Chambolle–Pock primal-dual method (MCPPDM) isObviously, when the scheme (22) reduces to (20).
We list some contributions of this paper. First, we show how to select proper parameters involved in the scheme (22) so that the scheme (22) could be convergent. Second, the proposed method, the well-known LADMM as well as the graph projection splitting algorithm are applied to sparse recovery problems mentioned above. We also note that the recent work [32] has studies an algorithmic framework which includes the proposed algorithms as a special case, however, we provide a different convergence analysis and focus on verifying the efficiency of our algorithm applied to sparse recovery problems.
The rest of the paper is organized as follows: In Section 2, some preliminaries are reviewed. In Section 3, the convergence of the proposed algorithm are analyzed. Numerical results on the performance of the proposed method for solving sparse recovery problems are shown in Section 4. Some conclusions are given in Section 5.
Section snippets
Preliminaries
In this section, we recall some key notions including the proximity operator, subdifferential and conjugate function of a convex function, and characterize solutions of the problem (1) or (19) with respect to coupled fixed-point equations via proximity operators.
We first recall the notion of the proximity operator of a convex function. Let denote a finite-dimensional Hilbert space with the inner product ⟨·,·⟩ and the norm . For a self-adjoint positive-definite linear operator S in
Convergence analysis
In this section, the convergence of the proposed algorithmic scheme (22) is analysed and main convergence results are presented.
For simplicity, we can write the Eqs. (27) and (28) as a simple fixed-point formulation. To the end, We first introduce an operator H from defined at a given point byFor the operator H, we have the following Lemma 3.1. Lemma 3.1 The operator H is defined by (29). Then, where the functionand P is the
Numerical results
In this section, we present a series of experiments to verify the efficiency of the proposed method (MCPPDM) (22) for solving three classes of sparse recovery problems mentioned above and compare the numerical performance of it with that of LADMM (11) and that of the graph projection splitting algorithm (GPSA for short). Note that we do not compare the proposed method with these algorithms developed in [38], [39] because these algorithms modified Chambolle–Pock primal-dual methods with more
Conclusions
In this paper, we proposed a modified Chambolle–Pock primal-dual method (MCPPDM) for solving a convex composite optimization problem which covers many well-known applications. The modified method with little additional computation cost allows to choose flexibly the combination parameter in the original Chambolle–Pock primal-dual method. We then focused on some well-known sparse recovery problems and reformulated them as the convex composite optimization problem. A series of experiments on
Acknowledgments
The authors would like to thank the anonymous reviewers and the associate editor and editor-in chief for their valuable comments and suggestions, which improve the presentation of this paper. This research was supported by the National Natural Science Foundation of China under the Grant nos. 11771347, 91730306, 41390454, and 11271297, and supported by the Nanhu Scholars Program for Young Scholars of XYNU.
References (45)
- et al.
Block splitting for distributed optimization
Math. Program. Comput.
(2014) - et al.
Parameter selection and pre-conditioning for a graph form solver
(2015) - S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press,...
- T. Hastie, R. Tibshirani, T. Friedman, The Elements of Statistical Learning, Springer,...
- et al.
Efficient schemes for robust IMRT treatment planning
Phys. Med. Biol.
(2006) - et al.
Performance bounds and suboptimal policies for multi-period investment
Found. Trends Optim.
(2013) - et al.
Distributed optimization and statistical learning via the alternating direction method of multipliers
Found. Trends Mach. Learn.
(2011) - et al.
An introduction to compressive sampling
IEEE Signal Process. Mag.
(2008) - et al.
Stable signal recovery from incomplete and inaccurate measurements
Commun. Pure Appl. Math.
(2006) - et al.
Near optimal signal recovery from random projections: universal encoding strategies
IEEE Trans. Inf. Theory
(2006)