
A family of second-order methods for convex \(\ell _1\)-regularized optimization

  • Full Length Paper
  • Series A
Mathematical Programming

Abstract

This paper is concerned with the minimization of an objective that is the sum of a convex function f and an \(\ell _1\) regularization term. Our interest is in active-set methods that incorporate second-order information about the function f to accelerate convergence. We describe a semismooth Newton framework that can be used to generate a variety of second-order methods, including block active set methods, orthant-based methods and a second-order iterative soft-thresholding method. The paper proposes a new active set method that performs multiple changes in the active manifold estimate at every iteration, and employs a mechanism for correcting these estimates, when needed. This corrective mechanism is also evaluated in an orthant-based method. Numerical tests comparing the performance of three active set methods are presented.



Acknowledgments

The authors thank Michael Hintermüller and Daniel Robinson for their very useful comments and suggestions. They are also grateful to an anonymous referee who pointed out an error in a previous proof of the results of Sect. 4.1.

Author information

Corresponding author

Correspondence to Gillian M. Chin.

Additional information

Richard H. Byrd was supported by National Science Foundation Grant DMS-1216554 and Department of Energy Grant DE-SC0001774. Gillian M. Chin was supported by an NSERC fellowship and a grant from Google Inc. Jorge Nocedal was supported by National Science Foundation Grant DMS-1216567, and by Department of Energy Grant DE-FG02-87ER25047. Figen Oztoprak was supported by Department of Energy Grant DE-SC0001774 and by Scientific and Technological Research Council of Turkey Grant 113M500.

Appendix: Derivation of the block active-set (BAS) algorithm for problem (3.8)

By introducing auxiliary variables \(u, v\) such that

$$\begin{aligned} x = u-v \quad \text{ and } \quad u,v \ge 0, \end{aligned}$$
(8.1)

we can reformulate problem (3.8) as a bound constrained problem of the form (3.1):

$$\begin{aligned} \min _{u,v \in \mathbb {R}^n}&\textstyle \frac{1}{2}(u-v)^TQ(u-v) + q^T(u-v) + \mu (u+v)^Te \\&\quad \text{ s.t. } \quad u,v \ge 0, \nonumber \end{aligned}$$
(8.2)

where \(e \in \mathbb {R}^n\) denotes the vector of all ones. In this smooth reformulation, the variable u represents the positive components of x, and v the negative components. The KKT conditions for (8.2) are

$$\begin{aligned}&Q(u-v) + q + \mu e - y = 0, \end{aligned}$$
(8.3)
$$\begin{aligned}&-(Q(u-v) + q) + \mu e - z = 0, \end{aligned}$$
(8.4)
$$\begin{aligned}&y^T u = 0, \quad z^T v = 0, \end{aligned}$$
(8.5)
$$\begin{aligned}&u,v \ge 0, \quad y, z \ge 0, \end{aligned}$$
(8.6)

where y and z are the Lagrange multipliers for the nonnegativity constraints on u and v, respectively.
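To make these optimality conditions concrete, here is a minimal numpy sketch that evaluates the residuals of (8.3)–(8.6) at a given primal–dual point; the function name kkt_residuals and the array-based interface are illustrative choices of ours, not part of the paper.

```python
import numpy as np

def kkt_residuals(Q, q, mu, u, v, y, z):
    """Residuals of the KKT system (8.3)-(8.6) for the split problem (8.2).

    All returned quantities vanish at a primal-dual solution.
    """
    w = Q @ (u - v) + q                       # gradient of the quadratic part at x = u - v
    stat_u = w + mu - y                       # stationarity in u, eq. (8.3)
    stat_v = -w + mu - z                      # stationarity in v, eq. (8.4)
    compl = (y @ u, z @ v)                    # complementarity products, eq. (8.5)
    bound_violation = max(0.0, -min(u.min(), v.min(), y.min(), z.min()))  # eq. (8.6)
    return stat_u, stat_v, compl, bound_violation
```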

Given a primal–dual iterate \((u^k,v^k, y^k,z^k)\), the BAS algorithm (3.5)–(3.7) begins with the specification of the index sets, which for the case of problem (8.2) take the form:

$$\begin{aligned} A^u_k:&\ \text{the set of indices of } u^k \text{ that will be set to zero}, \\ {\mathcal {T}}^u_k:&\ \text{the set of indices of } u^k \text{ that will be allowed to move}; \quad {\mathcal {T}}^u_k = \{1,\ldots ,n\} {\setminus } A^u_k, \\ A^v_k:&\ \text{the set of indices of } v^k \text{ that will be set to zero}, \\ {\mathcal {T}}^v_k:&\ \text{the set of indices of } v^k \text{ that will be allowed to move}; \quad {\mathcal {T}}^v_k = \{1,\ldots ,n\} {\setminus } A^v_k . \end{aligned}$$

The method is specified in Algorithm 8.1, where we have assumed the initial variables \(u^0, v^0\) satisfy the complementarity conditions

$$\begin{aligned} 0 \le u_i^0 \perp v_i^0 \ge 0, \quad \forall i. \end{aligned}$$
(8.7)
[Algorithm 8.1 (block active-set method for the reformulated problem (8.2)) is displayed as a figure in the published article and is not reproduced here; its steps are the equations (8.8)–(8.15) referenced in the discussion below.]

Algorithm 8.1 preserves complementarity in u and v. To see this, note that equations (8.10)–(8.13), together with the initialization, yield primal–dual complementarity at every iteration k, i.e.,

$$\begin{aligned} u^k_i y^k_i = 0 \quad \text{ and } \quad v^k_i z^k_i = 0, \quad \forall i . \end{aligned}$$
(8.16)

By adding the equations (8.14) and (8.15), we see that \(y^{k+1} + z^{k+1} = 2 \mu e\). This implies that for each index i, we will have either \(y^{k+1}_i >0\) or \(z^{k+1}_i >0\). Thus, by (8.8), (8.9) and (8.16), either \(i \in A^u_k\) or \(i \in A^v_k\). As a result, for each i, either \(i \notin {\mathcal {T}}^u_k\) or \(i \notin {\mathcal {T}}^v_k\), since \({\mathcal {T}}^u_k = [A^u_k]^c\), and hence

$$\begin{aligned} {\mathcal {T}}^u_k \cap {\mathcal {T}}^v_k = \emptyset \quad \forall k \ge 0. \end{aligned}$$
(8.17)

Therefore, at every iteration k either \(u_i^k\) or \(v_i^k\) is zero, for all i.

As a result, it suffices to keep track of the positive and negative components of a single vector x, thus reducing the number of variables by half. Given (8.1), we can similarly define a single multiplier vector,

$$\begin{aligned} w^k := Q(u^k-v^k) + q = Qx^k + q , \end{aligned}$$
(8.18)

so that by (8.14) and (8.15) we have

$$\begin{aligned} y^{k+1}_i = w^{k+1}_i + \mu , \quad z^{k+1}_i = -w^{k+1}_i + \mu . \end{aligned}$$
(8.19)
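As a small illustration of this reduction (the helper recover_split_quantities and its interface are ours, not the authors'), the split variables and multipliers can be recovered from x alone via (8.1), (8.18) and (8.19):

```python
import numpy as np

def recover_split_quantities(Q, q, mu, x):
    """Recover (u, v, y, z) of the split formulation (8.2) from the single vector x."""
    u = np.maximum(x, 0.0)     # positive part of x, cf. (8.1)
    v = np.maximum(-x, 0.0)    # negative part of x, cf. (8.1)
    w = Q @ x + q              # multiplier vector, eq. (8.18)
    y = w + mu                 # eq. (8.19)
    z = -w + mu                # eq. (8.19)
    return u, v, y, z
```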

Next, we define index sets corresponding to the single variable x. To do so, we first make some observations about the transitions of the indices of the vectors \(u, v\) in Algorithm 8.1. Let us define

$$\begin{aligned}&P^{k-1} = {\mathcal {T}}^u_{k-1}: \text{ the set of indices of } x^k \text{ we predict are positive}, \nonumber \\&N^{k-1} = {\mathcal {T}}^v_{k-1}: \text{ the set of indices of } x^k \text{ we predict are negative}, \nonumber \\&A^{k-1} = A^u_{k-1} \cap A^v_{k-1}: \text{ the set of indices of } x^k \text{ that are set to zero}. \end{aligned}$$
(8.20)

We analyze the following cases.

  1.

    Let us consider an index \(i \in A^{k-1}\). By (8.10) and (8.11), it follows that \(u^k_i = v^k_i = 0\). Therefore, one of the following conditions holds:

    $$\begin{aligned} \text{If }&y^k_i > 0 \wedge z^k_i > 0: \text{ then, by (8.8), (8.9), } i \in A^{k}. \\ \text{If }&y^k_i = w^k_i + \mu \le 0: \text{ then, by (8.19), } z^k_i > 0\text{; from (8.8) and (8.9), we have } i \in P^{k}. \\ \text{If }&z^k_i = -w^k_i + \mu \le 0: \text{ then, by similar arguments, } i \in N^{k}. \end{aligned}$$
  2.

    Let us consider an index \(i \in P^{k-1}\). From (8.17), we have \(i \in A^v_{k-1}\). By (8.11), (8.12) and (8.19), it follows that \(v^{k}_i=0\), \(y^{k}_i=0\) and \(z^{k}_i > 0\) respectively. Therefore, from (8.9), \(i \in A^v_k\), and by (8.8), it follows that:

    $$\begin{aligned} i \in {\left\{ \begin{array}{ll} A^{k} &{} \text{ if } u^k_i < 0 \\ P^{k} &{} \text{ if } u^k_i \ge 0 .\end{array}\right. } \end{aligned}$$
  3.

    Let us consider \(i \in N^{k-1}\). From (8.17), we have \(i \in A^u_{k-1}\). By (8.10), (8.13) and (8.19), it follows that \(u^{k}_i=0\), \(z^{k}_i=0\) and \(y^{k}_i > 0\) respectively. Therefore, \(i \in A^u_k\), and we have that

    $$\begin{aligned} i \in {\left\{ \begin{array}{ll} A^{k} &{} \text{ if } v^k_i < 0 \\ N^{k} &{} \text{ if } v^k_i \ge 0 . \end{array}\right. } \end{aligned}$$

This analysis demonstrates that, for every index \(i \in P^{k-1} \cup N^{k-1}\), the multipliers satisfy \(y^{k}_i, z^{k}_i \ge 0\), i.e., they are always feasible, and that an index cannot pass from being inactive in v to being inactive in u, or vice versa. Consequently, in the BAS method an index cannot move between the sets P and N in consecutive iterations.

Based on this fact, and the observations made in the three cases above, we can redefine the index sets in (8.20) in terms of the x variable,

$$\begin{aligned} P^k&=\{i\in P^{k-1}: x^k_i \ge 0\} \cup \{i\in A^{k-1}:w^k_i\le -\mu \}, \\ N^k&=\{i\in N^{k-1}: x^k_i \le 0\} \cup \{i\in A^{k-1}:w^k_i\ge \mu \}, \nonumber \\ A^k&=\{i\in A^{k-1}: w^k_i \in (-\mu , \mu ) \} \cup \{i\in P^{k-1}: x^k_i < 0\} \cup \{i\in N^{k-1}: x^k_i > 0\} . \nonumber \end{aligned}$$
(8.21)
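For concreteness, the update (8.21) transcribes almost literally into code; in the following minimal sketch the sets are represented as boolean masks over \(\{1,\ldots ,n\}\), a choice of ours rather than the paper's.

```python
def update_index_sets(x, w, mu, P, N, A):
    """Set update (8.21), with P, N, A stored as boolean numpy masks."""
    P_new = (P & (x >= 0)) | (A & (w <= -mu))
    N_new = (N & (x <= 0)) | (A & (w >= mu))
    A_new = (A & (w > -mu) & (w < mu)) | (P & (x < 0)) | (N & (x > 0))
    return P_new, N_new, A_new
```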

To express (8.10)–(8.15) in terms of the x variable alone, we first note that (8.10) and (8.11) can be replaced by

$$\begin{aligned} x^{k+1}_i = 0, \quad \forall i \in A^k. \end{aligned}$$

To compute the remaining components of \(x^{k+1}\), that is, those with \(i \in P^k \cup N^k\), we substitute (8.12) and (8.13) into equations (8.14) and (8.15) to obtain

$$\begin{aligned}&[Qx^{k+1} + q + \mu e]_i = 0, \quad \forall i \in P^k ,\\&[Qx^{k+1} + q - \mu e]_i = 0, \quad \forall i \in N^k . \end{aligned}$$

In summary, the description of the BAS algorithm applied to problem (3.8) is given in Algorithm 3.1.
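To illustrate how the pieces fit together, the sketch below performs one such iteration in the x variable. It assumes a dense Q whose free-set block is nonsingular and solved directly with numpy, and it reuses the illustrative update_index_sets helper from above; it is a sketch of the derivation, not the authors' implementation of Algorithm 3.1.

```python
import numpy as np

def bas_step(Q, q, mu, x, P, N, A):
    """One block active-set step in the x variable (illustrative sketch).

    Updates the index sets via (8.21), fixes the components in A to zero, and
    solves the stationarity conditions [Qx + q + mu]_P = 0 and [Qx + q - mu]_N = 0
    on the free set.
    """
    w = Q @ x + q                                   # multiplier vector, eq. (8.18)
    P, N, A = update_index_sets(x, w, mu, P, N, A)  # set update, eq. (8.21)

    x_new = np.zeros_like(x)                        # enforces x_new[i] = 0 for i in A
    F = P | N                                       # free (inactive) variables
    if F.any():
        s = np.where(P[F], mu, -mu)                 # +mu on P, -mu on N
        # With x_new zero on A, the free-set conditions reduce to Q_FF x_F = -(q_F + s).
        x_new[F] = np.linalg.solve(Q[np.ix_(F, F)], -(q[F] + s))
    return x_new, P, N, A
```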

Cite this article

Byrd, R.H., Chin, G.M., Nocedal, J. et al. A family of second-order methods for convex \(\ell _1\)-regularized optimization. Math. Program. 159, 435–467 (2016). https://doi.org/10.1007/s10107-015-0965-3
