Abstract
We study the minimization problem of a non-convex, sparsity-promoting penalty function, the transformed \(l_1\) (TL1), and its application in compressed sensing (CS). The TL1 penalty interpolates the \(l_0\) and \(l_1\) norms through a positive parameter \(a \in (0,+\infty )\), similarly to \(l_p\) with \(p \in (0,1]\), and is known to satisfy the unbiasedness, sparsity and Lipschitz continuity properties. We first consider the constrained minimization problem and discuss exact recovery of the \(l_0\)-norm minimal solution based on the null space property (NSP). We then prove stable recovery of the \(l_0\)-norm minimal solution when the sensing matrix A satisfies a restricted isometry property (RIP). We formulate a normalized problem to overcome the lack of a scaling property of the TL1 penalty function. For a general sensing matrix A, we show that the support set of a local minimizer corresponds to linearly independent columns of A. Next, we present difference of convex algorithms for TL1 (DCATL1) for computing TL1-regularized constrained and unconstrained problems in CS. The DCATL1 algorithm involves outer and inner loops of iterations, a one-time matrix inversion, and repeated shrinkage operations and matrix-vector multiplications. The inner loop solves an \(l_1\) minimization problem, for which we employ the Alternating Direction Method of Multipliers (ADMM). For the unconstrained problem, we prove convergence of DCATL1 to a stationary point satisfying the first-order optimality condition. In numerical experiments, we identify the optimal value \(a=1\), and compare DCATL1 with other CS algorithms on two classes of sensing matrices: Gaussian random matrices and over-sampled discrete cosine transform (DCT) matrices. Among existing algorithms, the iteratively reweighted least squares method based on the \(l_{1/2}\) norm is the best at sparse recovery for Gaussian matrices, and the DCA algorithm based on the \(l_1\) minus \(l_2\) penalty is the best for over-sampled DCT matrices.
We find that, for both classes of sensing matrices, the performance of the DCATL1 algorithm (initialized with \(l_1\) minimization) always ranks near the top (if not at the top), and it is the most robust choice, insensitive to the conditioning of the sensing matrix A. DCATL1 is also competitive with DCA applied to other non-convex penalty functions commonly used in statistics with two hyperparameters.
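The interpolation property of the TL1 penalty described above can be checked directly. The following is a minimal sketch (not the paper's code) of the scalar penalty \(\rho_a(t) = (a+1)|t|/(a+|t|)\) and its sum \(P_a(x)\) over a vector, showing that small \(a\) approaches the \(l_0\) count and large \(a\) approaches the \(l_1\) norm:

```python
# Sketch of the TL1 penalty rho_a(t) = (a + 1)|t| / (a + |t|)
# and its limiting behavior as the parameter a varies.

def tl1(t, a):
    """Transformed l1 penalty of a scalar t with parameter a > 0."""
    return (a + 1.0) * abs(t) / (a + abs(t))

def tl1_vec(x, a):
    """P_a(x): sum of rho_a over the entries of a vector x."""
    return sum(tl1(t, a) for t in x)

x = [0.0, 0.5, -2.0]

# As a -> 0+, rho_a(t) -> 1 for t != 0 and rho_a(0) = 0, i.e. the l0 count.
small_a = tl1_vec(x, 1e-8)   # close to the number of nonzeros, 2

# As a -> infinity, rho_a(t) -> |t|, i.e. the l1 norm.
large_a = tl1_vec(x, 1e8)    # close to |0.5| + |-2.0| = 2.5
```

Both limits hold entrywise, so the same behavior carries over to \(P_a\) on vectors.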
References
Ahn, M., Pang, J.-S., Xin, J.: Difference-of-convex learning: directional stationarity, optimality, and sparsity. SIAM J. Optim. 27(3), 1637–1665 (2017)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Candès, E., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005)
Candès, E., Rudelson, M., Tao, T., Vershynin, R.: Error correction via linear programming, In: 46th Annual IEEE Symposium on Foundations of Computer Science, pp. 668–681 (2005)
Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete Fourier information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
Candès, E., Romberg, J., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)
Candès, E., Fernandez-Granda, C.: Super-resolution from noisy data. J. Fourier Anal. Appl. 19(6), 1229–1254 (2013)
Cao, W., Sun, J., Xu, Z.: Fast image deconvolution using closed-form thresholding formulas of \(L_q\) (\(q=1/2, 2/3\)) regularization. J. Vis. Commun. Image Represent. 24(1), 31–41 (2013)
Chartrand, R.: Nonconvex compressed sensing and error correction. ICASSP 3, III 889 (2007)
Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: ICASSP pp. 3869–3872 (2008)
Cohen, A., Dahmen, W., DeVore, R.: Compressed sensing and the best k-term approximation. J. Am. Math. Soc. 22, 211–231 (2009)
Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
Donoho, D., Elad, M.: Optimally sparse representation in general (nonorthogonal) dictionaries via \(\ell _1\) minimization. Proc. Natl. Acad. Sci. USA 100, 2197–2202 (2003)
Esser, E., Lou, Y., Xin, J.: A method for finding structured sparse solutions to non-negative least squares problems with applications. SIAM J. Imaging Sci. 6, 2010–2046 (2013)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Fannjiang, A., Liao, W.: Coherence pattern–guided compressive sensing with unresolved grids. SIAM J. Imaging Sci. 5(1), 179–202 (2012)
Goldstein, T., Osher, S.: The split Bregman method for \(\ell _1\)-regularized problems. SIAM J. Imaging Sci. 2(1), 323–343 (2009)
Lai, M., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(\ell _q\) minimization. SIAM J. Numer. Anal. 51(2), 927–957 (2013)
Le Thi, H.A., Thi, B.T.N., Le, H.M.: Sparse signal recovery by difference of convex functions algorithms. In: Selamat, A., Nguyen, N.T., Haron, H. (eds.) Intelligent Information and Database Systems, pp. 387–397. Springer, Berlin (2013)
Le Thi, H.A., Huynh, V.N., Dinh, T.: DC programming and DCA for general DC programs. In: Do, T.V., Le Thi, H.A., Nguyen, N.T. (eds.) Advanced Computational Methods for Knowledge Engineering, pp. 15–35. Springer, Berlin (2014)
Le Thi, H.A., Pham Dinh, T., Le, H.M., Vo, X.T.: DC approximation approaches for sparse optimization. Eur. J. Oper. Res. 244(1), 26–46 (2015)
Lou, Y., Yin, P., Xin, J.: Point source super-resolution via non-convex L1 based methods. J. Sci. Comput. 68(3), 1082–1100 (2016)
Lou, Y., Yin, P., He, Q., Xin, J.: Computing sparse representation in a highly coherent dictionary based on difference of L1 and L2. J. Sci. Comput. 64, 178–196 (2015)
Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM J. Optim. 23(4), 2448–2478 (2013)
Lv, J., Fan, Y.: A unified approach to model selection and sparse recovery using regularized least squares. Ann. Stat. 37(6A), 3498–3528 (2009)
Mallat, S., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)
Mazumder, R., Friedman, J., Hastie, T.: SparseNet: coordinate descent with nonconvex penalties. J. Am. Stat. Assoc. 106(495), 1125–1138 (2011)
Natarajan, B.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)
Needell, D., Vershynin, R.: Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. IEEE J. Sel. Top. Signal Process. 4(2), 310–316 (2010)
Nguyen, T.B.T., Le Thi, H.A., Le, H.M., Vo, X.T.: DC approximation approach for \(\ell _0\)-minimization in compressed sensing. In: Do, T.V., Le Thi, H.A., Nguyen, N.T. (eds.) Advanced Computational Methods for Knowledge Engineering, pp. 37–48. Springer, Berlin (2015)
Nikolova, M.: Local strong homogeneity of a regularized estimator. SIAM J. Appl. Math. 61(2), 633–658 (2000)
Ong, C.S., Le Thi, H.A.: Learning sparse classifiers with difference of convex functions algorithms. Optim. Methods Softw. 28(4), 830–854 (2013)
Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to d.c. programming: theory, algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
Pham Dinh, T., Le Thi, H.A.: A DC optimization algorithm for solving the trust-region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
Soubies, E., Blanc-Féraud, L., Aubert, G.: A continuous exact \(\ell _0\) penalty (CEL0) for least squares regularized problem. SIAM J. Imaging Sci. 8(3), 1607–1639 (2015)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)
Tran, H., Webster, C.: Unified sufficient conditions for uniform recovery of sparse signals via nonconvex minimizations. arXiv:1701.07348 (2017)
Tropp, J., Gilbert, A.: Signal recovery from partial information via orthogonal matching pursuit. IEEE Trans. Inf. Theory 53(12), 4655–4666 (2007)
Xu, F., Wang, S.: A hybrid simulated annealing thresholding algorithm for compressed sensing. Signal Process. 93, 1577–1585 (2013)
Xu, Z., Chang, X., Xu, F., Zhang, H.: \(L_{1/2}\) regularization: a thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst. 23(7), 1013–1027 (2012)
Yang, J., Zhang, Y.: Alternating direction algorithms for \(l_1\) problems in compressive sensing. SIAM J. Sci. Comput. 33(1), 250–278 (2011)
Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37(1), A536–A563 (2015)
Yin, P., Xin, J.: Iterative \(\ell _1\) minimization for non-convex compressed sensing. J. Comput. Math. 35(4), 437–449 (2017)
Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(l_1\)-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1(1), 143–168 (2008)
Zeng, J., Lin, S., Wang, Y., Xu, Z.: \(L_{1/2}\) regularization: convergence of iterative half thresholding algorithm. IEEE Trans. Signal Process. 62(9), 2317–2329 (2014)
Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhang, S., Xin, J.: Minimization of transformed \(L_1\) penalty: closed form representation and iterative thresholding algorithms. Commun. Math. Sci. 15(2), 511–537 (2017)
Zhang, S., Yin, P., Xin, J.: Transformed Schatten-1 iterative thresholding algorithms for low rank matrix completion. Commun. Math. Sci. 15(3), 839–862 (2017)
Acknowledgements
The authors would like to thank Professor Wenjiang Fu for referring us to [25], Professor Jong-Shi Pang for his helpful suggestions, and the anonymous referees for their constructive comments.
Additional information
Shuai Zhang and Jack Xin were partially supported by NSF Grants DMS-0928427, DMS-1222507, and DMS-1522383.
Appendices
Appendix A: Proof of exact TL1 sparse recovery (Theorem 2.1)
Proof
The proof follows the lines of arguments in [4, 9], while using special properties of the penalty function \(\rho _a\). For simplicity, we denote \(\beta _{C}\) by \(\beta \) and \(\beta ^0_{C}\) by \(\beta ^0\).
Let \(e = \beta - \beta ^0\); we want to prove that \(e = 0\). It is clear that \(e_{T^c} = \beta _{T^c}\), since T is the support set of \(\beta ^0\). By the triangle inequality for \(\rho _a\), we have:
Then
It follows that:
Now arrange the components of e on \(T^c\) in order of decreasing magnitude of |e| and partition \(T^c\) into L parts: \(T^c = T_1 \cup T_2 \cup \cdots \cup T_L\), where each \(T_j\) has R elements (except possibly \(T_L\), which may have fewer). Also denote \(T = T_0\) and \(T_{01} = T \cup T_1\). Since \(Ae = A(\beta - \beta ^0) = 0\), it follows that
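The ordering-and-chunking step above, standard in RIP-based arguments, can be sketched as plain code (an illustrative aid, not from the paper): sort the indices outside the support T by decreasing \(|e_i|\), then split them into consecutive blocks of size R.

```python
# Illustrative sketch of the block partition T^c = T_1 U ... U T_L:
# complement indices sorted by |e_i| descending, chunked into blocks
# of R indices each (the last block may have fewer).

def partition_complement(e, T, R):
    """Return [T_1, ..., T_L] for the complement of T, ordered by |e| descending."""
    support = set(T)
    Tc = [i for i in range(len(e)) if i not in support]
    Tc.sort(key=lambda i: abs(e[i]), reverse=True)
    return [Tc[j:j + R] for j in range(0, len(Tc), R)]

e = [0.0, 3.0, -1.0, 0.5, -2.0, 0.1]
blocks = partition_complement(e, T=[0], R=2)
# blocks -> [[1, 4], [2, 3], [5]]
```

The decreasing-magnitude ordering is what makes each entry of block \(T_j\) bounded by the average magnitude over the preceding block \(T_{j-1}\), the key fact used in (1.5) below.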
Next, we derive two inequalities relating the \(l_2\) norm and the function \(P_a\), in order to use inequality (1.1). Since
we have:
Now we estimate the \(l_2\) norm of \(e_{T_j}\) from above in terms of \(P_a\). It follows from \(\beta \) being the minimizer of the problem (2.12) and the definition of \(x_C\) (2.9) that
For each \(i \in T^c\), \(\rho _a(\beta _i) \le P_a(\beta _{T^c}) \le 1\). Also, since
we have
It is known that the function \(\rho _a(t)\) is increasing for \(t \ge 0\), and
where \(j = 2,3,\ldots ,L\). Thus we have
Finally, plug (1.3) and (1.5) into inequality (1.2) to get:
By the RIP condition (2.13), the factor \(\sqrt{1-\delta _{R+|T|}} \dfrac{a}{a+1} \sqrt{\dfrac{R}{|T|}} - \sqrt{1+\delta _{R}}\) is strictly positive; hence \(P_a(e_T) = 0\) and \(e_T = 0\). By inequality (1.1), \(e_{T^c} = 0\) as well. We have thus proved that \(\beta _{C} = \beta ^0_{C}\), and the equivalence of (2.12) and (2.11) holds. If another vector \(\beta \) is an optimal solution of (2.12), the same argument shows that it also equals \(\beta ^0_{C}\); hence \(\beta _{C}\) is unique. \(\square \)
Appendix B: Proof of stable TL1 sparse recovery (Theorem 2.2)
Proof
Set \( n = A \beta - y_C \). We use the following three notations:
- (i) \(\beta _{C}^n\): optimal solution of the constrained problem (2.14);
- (ii) \(\beta _{C}\): optimal solution of the constrained problem (2.12);
- (iii) \(\beta _{C}^0\): optimal solution of the \(l_0\) problem (2.11).
Let T be the support set of \(\beta _{C}^0\), i.e., \(T = \mathrm{supp}(\beta _{C}^0)\), and let \(e = \beta _{C}^n - \beta _{C}^0\). Following the proof of Theorem 2.1, we obtain:
and
Further, due to the inequality \(P_a(\beta _{T^c}^n) = P_a(e_{T^c}) \le P_a(e_T)\) from (1.1) and inequalities in (1.2), we get
where \(C_{\delta } = \sqrt{1-\delta _{R+|T|}} \dfrac{a}{a+1} \sqrt{\dfrac{R}{|T|}} - \sqrt{1+\delta _{R}}\).
By the initial assumption on the size of observation noise, we have
so we have \(P_a(e_T) \le \dfrac{\tau R^{1/2}}{C_{\delta }}\).
On the other hand, \(P_a(\beta _{C}) \le 1 \) and \(\beta _{C}\) lies in the feasible set of problem (2.14). Thus we have the inequality \(P_a(\beta _{C}^n) \le P_a(\beta _{C}) \le 1\). By (1.4), \(\beta _{C,i}^n \le 1\) for each i. So we have
It follows that
for a positive constant D depending only on \(\delta _R\) and \(\delta _{R+|T|}\). The second inequality uses the definition of RIP, while the first inequality in the last row comes from (2.1) and (1.1). \(\square \)
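As a numerical sanity check of the positivity requirement on \(C_{\delta }\) used in both proofs, the constant can be evaluated directly; the numbers below are illustrative assumptions, not values from the paper.

```python
import math

def C_delta(a, R, T_size, delta_R, delta_RT):
    """C_delta = sqrt(1 - delta_{R+|T|}) * a/(a+1) * sqrt(R/|T|) - sqrt(1 + delta_R)."""
    return (math.sqrt(1.0 - delta_RT) * (a / (a + 1.0)) * math.sqrt(R / T_size)
            - math.sqrt(1.0 + delta_R))

# Illustrative (assumed) regime: a = 1, R = 16 * |T|, modest RIP constants.
val = C_delta(a=1.0, R=160, T_size=10, delta_R=0.1, delta_RT=0.2)
# val > 0, so the proofs' factor is strictly positive in this regime.
```

Making R a sufficiently large multiple of |T| (relative to the RIP constants) is what keeps \(C_{\delta }\) bounded away from zero, as required by condition (2.13).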
Cite this article
Zhang, S., Xin, J. Minimization of transformed \(L_1\) penalty: theory, difference of convex function algorithm, and robust application in compressed sensing. Math. Program. 169, 307–336 (2018). https://doi.org/10.1007/s10107-018-1236-x
Keywords
- Transformed \(l_1\) penalty
- Sparse signal recovery theory
- Difference of convex function algorithm
- Convergence analysis
- Coherent random matrices
- Compressed sensing
- Robust recovery