
An inexact successive quadratic approximation method for a class of difference-of-convex optimization problems


Abstract

In this paper, we propose a new method for a class of difference-of-convex (DC) optimization problems, whose objective is the sum of a smooth function and a possibly non-prox-friendly DC function. The method sequentially solves subproblems constructed from a quadratic approximation of the smooth function and a linear majorization of the concave part of the DC function. We allow the subproblem to be solved inexactly, and propose a new inexact rule to characterize the inexactness of the approximate solution. For several classical algorithms applied to the subproblem, we derive practical termination criteria so as to obtain solutions satisfying the inexact rule. We also present some convergence results for our method, including the global subsequential convergence and a non-asymptotic complexity analysis. Finally, numerical experiments are conducted to illustrate the efficiency of our method.
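
To fix ideas, the following is a minimal Python sketch of one outer iteration of such a scheme, written for an objective of the form \(F = f + h - g\) with f smooth and h, g convex. It is only an illustration under stated assumptions: the oracle names (grad_f, subgrad_g, prox_model_solver, line_search) are placeholders, and the precise subproblem, inexact rule and line-search criteria of the proposed method are the ones specified in the paper (see also the appendices below).

```python
def isqa_dc_step(x, grad_f, subgrad_g, prox_model_solver, line_search, B):
    """One illustrative outer iteration (all names and interfaces are assumptions).

    x                 : current iterate x^k
    grad_f(x)         : gradient of the smooth part f
    subgrad_g(x)      : any subgradient xi of the convex part g at x
    prox_model_solver : approximately minimizes the subproblem
                        G(z) = <grad_f(x) - xi, z - x> + 0.5*(z - x)^T B (z - x) + h(z)
                        and returns an approximate minimizer u
    line_search       : backtracking routine returning a step size alpha
    B                 : positive definite model matrix B_k
    """
    xi = subgrad_g(x)                              # linear majorization of the concave part -g
    u = prox_model_solver(x, grad_f(x) - xi, B)    # inexact solution of the subproblem
    d = u - x                                      # search direction d_k = u^k - x^k
    alpha = line_search(x, d)                      # nonmonotone line search, e.g. (LS_1)-(LS_3)
    return x + alpha * d
```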


Notes

  1. As shown later in (A.4), we have \(\triangle _{k,1}\le 0\) under some condition given in Lemma 3.2.

  2. As shown in (A.2), we have \(\triangle _{k}(\alpha ) \le \alpha \triangle _{k,1} \le 0\) under some condition given in Lemma 3.2.

  3. This is applicable since our choice of \({\varvec{B}_{k}}\) in (5.2) is the sum of a diagonal matrix and a rank-one matrix.

  4. We do not compare our method with the nonmonotone proximal gradient method with majorization (NPG\(_{\text{major}}\)) in [25, Algorithm 2], because the performance of NPG\(_{\text{major}}\) is very similar to that of NPG; see [23, Sect. 5].

  5. The code can be downloaded from https://github.com/stephenbeckr/zeroSR1/tree/master/paperExperiments/Lasso.

  6. In [4], the restart frequency is 200. We use 2000 instead, which improves the performance of pDCA\(_e\) in our numerical experiments.

  7. Indeed, for any \({\varvec{x}}\in \mathrm{I\!R}^n\), a subgradient in \(\partial g({\varvec{x}})\) is \(\varvec{\xi }= \tau {\varvec{x}} + \lambda \mu \,\varvec{u} + \varvec{A}^\top (\varvec{v} + \varvec{b})\), where \(\varvec{u}\) and \(\varvec{v}\) are given by

    $$ \varvec{u}_i = {\left\{ \begin{array}{ll} {\text{sign}}({\varvec{x}}_i), &{} \ \ {\text{if}}\ i\in C_u, \\ 0, &{} \ \ {\text{otherwise}}, \end{array}\right. } \qquad \varvec{v}_i = {\left\{ \begin{array}{ll} (\varvec{A}{\varvec{x}} - \varvec{b})_i, &{} \ \ {\text{if}}\ i\in C_v, \\ 0, &{} \ \ {\text{otherwise}}. \end{array}\right. } $$

    Here, \(C_u\) is an arbitrary index set corresponding to the k largest elements of \({\varvec{x}}\) in magnitude, and \(C_v\) is an arbitrary index set corresponding to the r largest elements of \(\varvec{A}{\varvec{x}} - \varvec{b}\) in magnitude. A small numerical sketch of this subgradient computation is given after these notes.

  8. It has been shown in [24] that the sequence generated by pDCA\(_e\) converges locally linearly to a stationary point of (6.5), and that pDCA\(_e\) outperforms NPG\(_{\text{major}}\) proposed in [25, Algorithm 2]. Consequently, we do not compare our method with NPG\(_{\text{major}}\) here.
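
For concreteness, the following NumPy sketch implements the subgradient formula from note 7 above verbatim; the argument names (tau, lam, mu, k, r) and the routine itself are illustrative and not taken from the authors' code.

```python
import numpy as np

def subgradient_g(x, A, b, tau, lam, mu, k, r):
    """Illustrative computation of the subgradient described in note 7:
    xi = tau*x + lam*mu*u + A^T (v + b)."""
    u = np.zeros_like(x)
    C_u = np.argsort(-np.abs(x))[:k]          # indices of the k largest |x_i|
    u[C_u] = np.sign(x[C_u])
    res = A @ x - b
    v = np.zeros_like(res)
    C_v = np.argsort(-np.abs(res))[:r]        # indices of the r largest |(Ax - b)_i|
    v[C_v] = res[C_v]
    return tau * x + lam * mu * u + A.T @ (v + b)
```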

References

  1. Ahn, M., Pang, J.S., Xin, J.: Difference-of-convex learning: directional stationarity, optimality, and sparsity. SIAM J. Optim. 27, 1637–1665 (2017)

  2. Beck, A.: First-Order Methods in Optimization. SIAM (2017)

  3. Becker, S., Fadili, J., Ochs, P.: On quasi-Newton forward-backward splitting: proximal calculus and convergence. SIAM J. Optim. 29, 2445–2482 (2019)

  4. Becker, S., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3, 165–218 (2011)

  5. Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search-based methods for nonsmooth optimization. SIAM J. Optim. 26, 891–921 (2016)

  6. Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for \(L\)-1 regularized optimization. Math. Program. 157, 375–396 (2016)

  7. Bonettini, S., Porta, F., Ruggiero, V.: A variable metric forward-backward method with extrapolation. SIAM J. Sci. Comput. 38, A2558–A2584 (2016)

  8. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  9. Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162, 107–132 (2014)

  10. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  11. Ghanbari, H., Scheinberg, K.: Proximal quasi-Newton methods for regularized convex optimization with linear and accelerated sublinear convergence rates. Comput. Optim. Appl. 69, 597–627 (2018)

  12. Gotoh, J.Y., Takeda, A., Tono, K.: DC formulations and algorithms for sparse optimization problems. Math. Program. 169, 141–176 (2018)

  13. Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: International Conference on Machine Learning, pp. 37–45 (2013)

  14. Kanzow, C., Lechner, T.: Globalized inexact proximal Newton-type methods for nonconvex composite functions. https://www.mathematik.uni-wuerzburg.de/fileadmin/10040700/paper/ProxNewton.pdf (2020)

  15. Karimi, S., Vavasis, S.: IMRO: a proximal quasi-Newton method for solving \(\ell _1\)-regularized least squares problems. SIAM J. Optim. 27, 583–615 (2017)

  16. Lee, C.P., Wright, S.J.: Inexact successive quadratic approximation for regularized optimization. Comput. Optim. Appl. 72, 641–674 (2019)

  17. Li, G., Liu, T., Pong, T.K.: Peaceman-Rachford splitting for a class of nonconvex optimization problems. Comput. Optim. Appl. 68, 407–436 (2017)

  18. Li, J., Andersen, M.S., Vandenberghe, L.: Inexact proximal Newton methods for self-concordant functions. Math. Methods Oper. Res. 85, 19–41 (2017)

  19. Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Advances in Neural Information Processing Systems, pp. 379–387 (2015)

  20. Lin, H., Mairal, J., Harchaoui, Z.: An inexact variable metric proximal point algorithm for generic quasi-Newton acceleration. SIAM J. Optim. 29, 1408–1443 (2019)

  21. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24, 1420–1443 (2014)

  22. Li, X., Sun, D., Toh, K.C.: A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems. SIAM J. Optim. 28, 433–458 (2018)

  23. Liu, T., Pong, T.K.: Further properties of the forward-backward envelope with applications to difference-of-convex programming. Comput. Optim. Appl. 67, 489–520 (2017)

  24. Liu, T., Pong, T.K., Takeda, A.: A refined convergence analysis of pDCA\(_e\) with applications to simultaneous sparse recovery and outlier detection. Comput. Optim. Appl. 73, 69–100 (2019)

  25. Liu, T., Pong, T.K., Takeda, A.: A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems. Math. Program. 176, 339–367 (2019)

  26. Luo, Z.Q., Tseng, P.: Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 2, 43–54 (1992)

  27. Lou, Y., Yan, M.: Fast L\(_1\)-L\(_2\) minimization via a proximal operator. J. Sci. Comput. 74, 767–785 (2018)

  28. Ma, T.H., Lou, Y., Huang, T.Z.: Truncated \(\ell _{1-2}\) models for sparse recovery and rank minimization. SIAM J. Imaging Sci. 10, 1346–1380 (2017)

  29. Nakayama, S., Narushima, Y., Yabe, H.: Inexact proximal memoryless quasi-Newton methods based on the Broyden family for minimizing composite functions. Comput. Optim. Appl. 79, 127–154 (2021)

  30. O’Donoghue, B., Candès, E.J.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15, 715–732 (2015)

  31. Peng, W., Zhang, H., Zhang, X., Cheng, L.: Global complexity analysis of inexact successive quadratic approximation methods for regularized optimization under mild assumptions. J. Glob. Optim. 78, 69–89 (2020)

  32. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)

  33. Salzo, S.: The variable metric forward-backward splitting algorithm under mild differentiability assumptions. SIAM J. Optim. 27, 2153–2181 (2017)

  34. Schmidt, M., Roux, N. L., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems, pp. 1458–1466 (2011)

  35. Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160, 495–529 (2016)

  36. Stella, L., Themelis, A., Patrinos, P.: Forward-backward quasi-Newton methods for nonsmooth optimization problems. Comput. Optim. Appl. 67, 443–487 (2017)

  37. Tao, P.D., An, L.T.H.: Convex analysis approach to DC programming: theory, algorithms and applications. Acta Mathematica Vietnamica 22, 289–355 (1997)

  38. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)

  39. Wen, B., Chen, X., Pong, T.K.: A proximal difference-of-convex algorithm with extrapolation. Comput. Optim. Appl. 69, 297–324 (2018)

  40. Wang, Y., Luo, Z., Zhang, X.: New improved penalty methods for sparse reconstruction based on difference of two norms. Preprint, available at ResearchGate. https://doi.org/10.13140/RG.2.1.3256.3369

  41. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)

  42. Yang, L.: Proximal gradient method with extrapolation and line search for a class of nonconvex and nonsmooth problems. https://arxiv.org/abs/1711.06831

  43. Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37, A536–A563 (2015)

  44. Yue, M.C., Zhou, Z., So, A.M.C.: A family of inexact SQA methods for non-smooth convex minimization with provable convergence guarantees based on the Luo-Tseng error bound property. Math. Program. 174, 327–358 (2019)

  45. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)

Acknowledgements

The authors would like to thank Lei Yang for his constructive suggestions during the writing of this paper. The research of the first author is supported in part by JSPS KAKENHI Grant No. 19H04069. The research of the second author is supported in part by JSPS KAKENHI Grant Nos. 17H01699 and 19H04069.

Author information

Correspondence to Tianxiang Liu.

Appendices

A Proof of Lemma 3.2

Proof

First, for any \(\alpha \in (0,\,1]\), we see from the L-smoothness of f and the convexity of g with \(\varvec{\xi }^{k+1}\in \partial g({\varvec{x}}^k)\) that

$$\begin{aligned} F({\varvec{x}}^k + \alpha \,\varvec{d}_k) - F({\varvec{x}}^k)&= f({\varvec{x}}^k + \alpha \,\varvec{d}_k) - f({\varvec{x}}^k) + h({\varvec{x}}^k + \alpha \,\varvec{d}_k) - h({\varvec{x}}^k) - \left( g({\varvec{x}}^k + \alpha \,\varvec{d}_k) - g({\varvec{x}}^k)\right) \nonumber \\&\le \alpha \, \nabla f({\varvec{x}}^k)^{\top }\varvec{d}_k + \frac{\alpha ^2L}{2}\left\| \varvec{d}_k\right\| ^2 + h({\varvec{x}}^k + \alpha \,\varvec{d}_k) - h({\varvec{x}}^k) - \alpha \, {\varvec{\xi }^{k+1}}^{\top }\varvec{d}_k \nonumber \\&= \frac{\alpha ^2L}{2}\left\| \varvec{d}_k\right\| ^2 + \alpha \left\langle \nabla f({\varvec{x}}^k) - \varvec{\xi }^{k +1},\, \varvec{d}_k\right\rangle + h({\varvec{x}}^k + \alpha \,\varvec{d}_k) - h({\varvec{x}}^k)\nonumber \\&= \frac{\alpha ^2L}{2}\left\| \varvec{d}_k\right\| ^2 + \triangle _k(\alpha ). \end{aligned}$$
(A.1)

On the other hand, the convexity of h and \(\alpha \in (0,\, 1]\) yield

$$\begin{aligned} \begin{aligned} \triangle _k(\alpha )&= \alpha \left\langle \nabla f({\varvec{x}}^k) - \varvec{\xi }^{k +1},\, \varvec{d}_k\right\rangle + h({\varvec{x}}^k + \alpha \,\varvec{d}_k) - h({\varvec{x}}^k)\\&= \alpha \left\langle \nabla f({\varvec{x}}^k) - \varvec{\xi }^{k +1},\, \varvec{d}_k\right\rangle + h\left( \alpha ({\varvec{x}}^k + \varvec{d}_k) + (1 - \alpha ){\varvec{x}}^k\right) - h({\varvec{x}}^k) \\&\le \alpha \left\langle \nabla f({\varvec{x}}^k) - \varvec{\xi }^{k +1},\, \varvec{d}_k\right\rangle + \alpha \, h({\varvec{x}}^k + \varvec{d}_k) + (1 - \alpha )h({\varvec{x}}^k) - h({\varvec{x}}^k)\\&= \alpha \left\langle \nabla f({\varvec{x}}^k) - \varvec{\xi }^{k +1},\, \varvec{d}_k\right\rangle + \alpha \, h(\varvec{u}^k) - \alpha \,h({\varvec{x}}^k) = \alpha \,\triangle _{k,1}. \end{aligned} \end{aligned}$$
(A.2)

To show that the three termination criteria \((\text {LS}_1)\), \((\text {LS}_2)\) and \((\text {LS}_3)\) are well defined, we see from (A.1) and (A.2) that it suffices to bound \(\triangle _{k,1}\) from above by a suitable multiple of \(\Vert \varvec{d}_k\Vert ^2\). To proceed, we first see from the convexity of h and \({\varvec{B}_{k}}\succ {\varvec{0}}\) that \(G_k\) is strongly convex with modulus \(\lambda _{\min }({\varvec{B}_{k}})\). On the other hand, we know from (3.3) and \(\varvec{d}_k = \varvec{u}^k - {\varvec{x}}^k\) that there exists some \(\varvec{w}_k \in \partial G_k(\varvec{u}^k)\) such that \(\Vert \varvec{w}_k\Vert \le \epsilon _k\,\Vert \varvec{d}_k\Vert \). This together with the strong convexity of \(G_k\) with modulus \(\lambda _{\min }({\varvec{B}_{k}})\) implies that

$$\begin{aligned} G_k(\varvec{u}^k) - G_k({\varvec{x}}^k) \le -\left\langle \varvec{w}_k,\, {\varvec{x}}^k - \varvec{u}^k\right\rangle - \frac{{\lambda _{\min }({\varvec{B}_{k}})}}{2}\Vert {\varvec{x}}^k - \varvec{u}^k\Vert ^2 \le \left( \epsilon _k - \frac{{\lambda _{\min }({\varvec{B}_{k}})}}{2}\right) \Vert \varvec{d}_k\Vert ^2. \end{aligned}$$
(A.3)

Moreover, we know from (A.3) and \(\varvec{d}_k = \varvec{u}^k - {\varvec{x}}^k\) that

$$\begin{aligned} \begin{aligned}&\left( \epsilon _k -\frac{{\lambda _{\min }({\varvec{B}_{k}})}}{2}\right) \Vert \varvec{d}_k\Vert ^2 \ge G_k(\varvec{u}^k) - G_k({\varvec{x}}^k)\\&= \left\langle \nabla f({\varvec{x}}^k) - \varvec{\xi }^{k +1},\, \varvec{d}_k\right\rangle + \frac{1}{2}(\varvec{u}^k - {\varvec{x}}^k)^\top {\varvec{B}_{k}}(\varvec{u}^k - {\varvec{x}}^k) + h(\varvec{u}^k) - h({\varvec{x}}^k)\\&= \triangle _{k,1} + \frac{1}{2}\varvec{d}_k^\top {\varvec{B}_{k}}\varvec{d}_k \ge \triangle _{k,1} + \frac{\lambda _{\min }({\varvec{B}_{k}})}{2}\Vert \varvec{d}_k\Vert ^2, \end{aligned} \end{aligned}$$

which implies that

$$\begin{aligned} \triangle _{k,1} \le \left( {\epsilon _k - \lambda _{\min }({\varvec{B}_{k}})}\right) \Vert \varvec{d}_k\Vert ^2. \end{aligned}$$
(A.4)

Now we are ready to prove the well-definedness of the three termination criteria. We first consider the termination criteria \((\text {LS}_1)\) and \((\text {LS}_2)\) when \({\lambda _{\min }({\varvec{B}_{k}}) - \epsilon _k > 0}\). For any \(\alpha \in (0,\, 1]\), we have

$$\begin{aligned} \begin{aligned}&F({\varvec{x}}^k + \alpha \,\varvec{d}_k) - \max _{[k - M]_+\le j \le k}F({\varvec{x}}^j) - \sigma \alpha \,\triangle _{k,1} \le F({\varvec{x}}^k + \alpha \,\varvec{d}_k) - F({\varvec{x}}^k) - \sigma \alpha \,\triangle _{k,1}\\&\le \frac{\alpha ^2L}{2}\left\| \varvec{d}_k\right\| ^2 + \alpha \,\triangle _{k,1} - \sigma \alpha \,\triangle _{k,1} \le \frac{L}{2}\alpha \left( \alpha - 2(1 - \sigma )\left( {\lambda _{\min }({\varvec{B}_{k}}) - \epsilon _k}\right) /L\right) \Vert \varvec{d}_k\Vert ^2, \end{aligned} \end{aligned}$$
(A.5)

where the second inequality follows from (A.1) and (A.2), and the last inequality follows from (A.4). Notice that \(\sigma \in (0,\, 1)\) and \({\lambda _{\min }({\varvec{B}_{k}}) - \epsilon _k > 0}\). We see from (A.5) that its left-hand side is nonpositive when \(\alpha \in (0,\, 2(1 - \sigma )\left( {\lambda _{\min }({\varvec{B}_{k}}) - \epsilon _k}\right) /L]\). This proves the well-definedness of \((\text {LS}_1)\). Similarly, we consider termination criterion \((\text {LS}_2)\). By (A.1), (A.2) and (A.4), we have

$$\begin{aligned} \begin{aligned}&F({\varvec{x}}^k + \alpha \,\varvec{d}_k) - \max _{[k - M]_+\le j \le k}F({\varvec{x}}^j) - \sigma \triangle _{k}(\alpha ) \le F({\varvec{x}}^k + \alpha \,\varvec{d}_k) - F({\varvec{x}}^k) - \sigma \triangle _{k}(\alpha )\\&\le \frac{\alpha ^2L}{2}\left\| \varvec{d}_k\right\| ^2 + \triangle _{k}(\alpha ) - \sigma \triangle _{k}(\alpha ) \le \frac{\alpha ^2L}{2}\left\| \varvec{d}_k\right\| ^2 + \alpha (1 - \sigma )\triangle _{k,1} \\&\le \frac{L}{2}\alpha \left( \alpha - 2(1 - \sigma )\left( {\lambda _{\min }({\varvec{B}_{k}}) - \epsilon _k}\right) /L\right) \Vert \varvec{d}_k\Vert ^2. \end{aligned} \end{aligned}$$
(A.6)

This proves the well-definedness of \((\text {LS}_2)\). Finally, we see from (A.1), (A.2) and (A.4) that

$$\begin{aligned} \begin{aligned}&F({\varvec{x}}^k + \alpha \,\varvec{d}_k) - \max _{[k - M]_+\le j \le k}F({\varvec{x}}^j) + \sigma \alpha \Vert \varvec{d}_k\Vert ^2 \le F({\varvec{x}}^k + \alpha \,\varvec{d}_k) - F({\varvec{x}}^k) + \sigma \alpha \Vert \varvec{d}_k\Vert ^2\\&\le \frac{\alpha ^2L}{2}\left\| \varvec{d}_k\right\| ^2 + \triangle _{k}(\alpha ) + \sigma \alpha \Vert \varvec{d}_k\Vert ^2 \le \frac{\alpha ^2L}{2}\left\| \varvec{d}_k\right\| ^2 + \alpha \left( {\epsilon _k - \lambda _{\min }({\varvec{B}_{k}})}\right) \Vert \varvec{d}_k\Vert ^2 + \sigma \alpha \Vert \varvec{d}_k\Vert ^2 \\&= \frac{L}{2}\alpha \left( \alpha - 2({\lambda _{\min }({\varvec{B}_{k}}) - \epsilon _k} - \sigma )/L\right) \Vert \varvec{d}_k\Vert ^2. \end{aligned} \end{aligned}$$
(A.7)

This proves the well-definedness of \((\text {LS}_3)\) when \({\lambda _{\min }({\varvec{B}_{k}}) - \epsilon _k} > \sigma \). Furthermore, we have (3.4) by noticing (A.5), (A.6), (A.7) and the line-search rule \(\alpha _k\in \left\{ \beta ^i: i = 0, 1, \ldots \right\} \). This completes the proof. \(\square \)
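
The argument above shows that the backtracking under each criterion terminates after finitely many trial step sizes. As an illustration, here is a minimal Python sketch of the nonmonotone backtracking line search under \((\text {LS}_1)\); the default values of sigma, beta and M are placeholders rather than the paper's choices, and the objective F, the quantity \(\triangle _{k,1}\) and the stored objective values are assumed to be supplied by the outer iteration.

```python
def backtracking_LS1(F, x, d, delta_k1, F_history, sigma=1e-4, beta=0.5, M=4):
    """Illustrative nonmonotone backtracking for criterion (LS_1).

    F         : objective F = f + h - g (callable)
    x, d      : current iterate x^k and direction d_k = u^k - x^k
    delta_k1  : the quantity Delta_{k,1} (nonpositive under the conditions of Lemma 3.2)
    F_history : list containing F(x^j) for the stored recent iterates
    Returns the accepted step size alpha_k = beta^i.
    """
    F_ref = max(F_history[-(M + 1):])          # max_{[k-M]_+ <= j <= k} F(x^j)
    alpha = 1.0
    while F(x + alpha * d) > F_ref + sigma * alpha * delta_k1:
        # finite termination is guaranteed by Lemma 3.2 when lambda_min(B_k) - eps_k > 0
        alpha *= beta
    return alpha
```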

B Proof of Theorem 3.3

Proof

For simplicity of notation, we define

$$\begin{aligned} t(k) := \mathop {\mathrm{arg\,max}}_{[k - M]_+\le j \le k} F({\varvec{x}}^j). \end{aligned}$$
(B.1)

We know from (3.4) that in case of \((\text {LS}_1)\) or \((\text {LS}_2)\) with \(\delta > 0\) we have \(\alpha _k \ge c_1 > 0\) with \(c_1: = \min \left\{ 1,\, 2\beta (1 - \sigma )\delta /L\right\} \), and in case of \((\text {LS}_3)\) with \(\delta > \sigma \), we have \(\alpha _k \ge c_2 > 0\) with \(c_2: = \min \left\{ 1,\, 2\beta (\delta - \sigma )/L\right\} \). Furthermore, for the three line-search criteria we have

$$\begin{aligned} F({\varvec{x}}^{k+1}) = F({\varvec{x}}^k + \alpha _k\varvec{d}_k) \le F({\varvec{x}}^{t(k)}) + {\left\{ \begin{array}{ll} \sigma \alpha _k\triangle _{k,1} \overset{\text{(a)}}{\le } - \sigma c_1 \delta \Vert \varvec{d}_k\Vert ^2 &{} \text{for } (\text {LS}_1),\\ \sigma \triangle _{k}(\alpha _k) \overset{\text{(b)}}{\le }\sigma \alpha _k\triangle _{k,1} \le - \sigma c_1 \delta \Vert \varvec{d}_k\Vert ^2 &{} \text{for } (\text {LS}_2),\\ - \sigma \alpha _k\Vert \varvec{d}_k\Vert ^2 \overset{\text{(c)}}{\le } - \sigma c_2\Vert \varvec{d}_k\Vert ^2 &{} \text{for } (\text {LS}_3), \end{array}\right. } \end{aligned}$$

where (a) follows from (A.4), \({\delta =\inf _k\left( \lambda _{\min }({\varvec{B}_{k}}) - \epsilon _k\right) }\) and \(\alpha _k \ge c_1\) in case of \((\text {LS}_1)\) with \(\delta > 0\), (b) follows from (A.2) and (c) follows from \(\alpha _k\ge c_2\) in case of \((\text {LS}_3)\) with \(\delta > \sigma \). Consequently, for each line search there exists some \(c > 0\) such that

$$\begin{aligned} F({\varvec{x}}^{k+1}) \le F({\varvec{x}}^{t(k)}) - c\Vert \varvec{d}_k\Vert ^2. \end{aligned}$$
(B.2)

Next, we prove the three statements based on (B.2). First, we know from (B.2) that

$$\begin{aligned} F({\varvec{x}}^k) \le F({\varvec{x}}^0) < \infty , \end{aligned}$$

which together with the level-boundedness of F gives the boundedness of \(\{{\varvec{x}}^k\}\). This proves (i).

Next, we prove (ii). We see from (B.2) that the sequence \(\{F({\varvec{x}}^{t(k)})\}\) is non-increasing:

$$\begin{aligned} \begin{aligned}&F\big ({\varvec{x}}^{t(k+1)}\big ) = \max _{[k+1-M]_+\le j \le k+1}F({\varvec{x}}^j) = \max \Big \{\max _{[k+1-M]_+\le j\le k}F({\varvec{x}}^j),\, F({\varvec{x}}^{k+1}) \Big \}\\ \le&\max \Big \{ F\big ({\varvec{x}}^{t(k)}\big ),\, F\big ({\varvec{x}}^{t(k)}\big ) - c\left\| \varvec{d}_k\right\| ^2\Big \} \le F\big ({\varvec{x}}^{t(k)}\big ). \end{aligned} \end{aligned}$$

This together with the level-boundedness of F implies that there exists some \(\bar{F}\) such that

$$\begin{aligned} \lim _{k\rightarrow \infty }F\big ({\varvec{x}}^{t(k)}\big ) = \bar{F}. \end{aligned}$$
(B.3)

Next, we prove that the following relationships hold for all \(j\ge 1\) by induction:

$$\begin{aligned}&\lim _{k\rightarrow \infty } \varvec{d}_{t(k)-j} = {\varvec{0}}, \end{aligned}$$
(B.4a)
$$\begin{aligned}&\lim _{k\rightarrow \infty } F\big ({\varvec{x}}^{t(k) - j}\big ) = \bar{F}. \end{aligned}$$
(B.4b)

We first prove that (B.4a) and (B.4b) hold for \(j = 1\). Replacing k in (B.2) by \(t(k) - 1\), we obtain

$$\begin{aligned} F\big ({\varvec{x}}^{t(k)}\big ) \le F\big ({\varvec{x}}^{t\left( t(k) - 1\right) }\big ) - c\left\| \varvec{d}_{t(k)-1}\right\| ^2. \end{aligned}$$
(B.5)

Rearranging (B.5) and letting \(k\rightarrow \infty \), using (B.3) and the fact that \(t(k)\rightarrow \infty \) as \(k\rightarrow \infty \) (when \(k\ge M\) we have \(t(k)\in [k - M,\, k]\)), we see that \(\lim _{k\rightarrow \infty }\varvec{d}_{t(k)-1} = {\varvec{0}}\). This proves that (B.4a) holds for \(j=1\). Furthermore, we see from (B.3) that

$$\begin{aligned} \bar{F} = \lim _{k\rightarrow \infty }F\big ({\varvec{x}}^{t(k)}\big ) = \lim _{k\rightarrow \infty }F\big ({\varvec{x}}^{t(k)-1} + \alpha _{t(k)-1}\varvec{d}_{t(k)-1}\big ) = \lim _{k\rightarrow \infty }F\big ({\varvec{x}}^{t(k)-1} \big ), \end{aligned}$$

where the last equality follows from \(\alpha _k\le 1\), \(\lim _{k\rightarrow \infty }\varvec{d}_{t(k)-1} = {\varvec{0}}\) and the uniform continuity of F on the closure of the sequence \(\{{\varvec{x}}^k\}\) (this holds because h is continuous on its domain, \({\varvec{x}}^k\in {\text{dom}}\,h\) and the sequence \(\{{\varvec{x}}^k\}\) is bounded). This proves that (B.4b) holds for \(j = 1\).

Now we assume that (B.4a) and (B.4b) hold for some \(J\ge 1\), i.e., \(\lim _{k\rightarrow \infty }\varvec{d}_{t(k) - J} = {\varvec{0}}\) and \(\lim _{k\rightarrow \infty }F\big ({\varvec{x}}^{t(k)- J}\big ) = \bar{F}\). Replacing k in (B.2) by \(t(k) - J -1\), we obtain

$$\begin{aligned} F\big ({\varvec{x}}^{t(k)-J}\big ) \le F\big ({\varvec{x}}^{t\left( t(k)- J - 1\right) }\big ) - c\left\| \varvec{d}_{t(k)-J - 1}\right\| ^2. \end{aligned}$$
(B.6)

Rearranging (B.6) and letting \(k\rightarrow \infty \), using the assumption \(\lim _{k\rightarrow \infty }F\big ({\varvec{x}}^{t(k)- J}\big ) = \bar{F}\) and (B.3) with \(t(k) - J -1\rightarrow \infty \) as \(k\rightarrow \infty \) (when \(k\ge M\) we have \(t(k)\in [k - M,\, k]\)), we see that \(\lim _{k\rightarrow \infty }\varvec{d}_{t(k) - J-1} = {\varvec{0}}\). This proves that (B.4a) holds for \(j=J+1\). Similarly, we have

$$\begin{aligned} \bar{F} = \lim _{k\rightarrow \infty }F\big ({\varvec{x}}^{t(k)-J}\big ) = \lim _{k\rightarrow \infty }F\big ({\varvec{x}}^{t(k)-J -1} + \alpha _{t(k)-J -1}\varvec{d}_{t(k)-J -1}\big ) = \lim _{k\rightarrow \infty }F\big ({\varvec{x}}^{t(k)-J -1} \big ), \end{aligned}$$

which proves that (B.4b) holds for \(j = J+1\). This completes the induction.

Now we are ready to prove (ii). Note from (B.1) that when \(k\ge M\), we have \(k-M\le t(k) \le k\). Thus, for any k, we have \(k - M - 1 = t(k) - j_k\) for some \(j_k\in [1,\, M+1]\). Therefore, it follows from (B.4a) that

$$\begin{aligned} {\varvec{0}} = \lim _{k\rightarrow \infty }\varvec{d}_{t(k)-j_k} = \lim _{k\rightarrow \infty }\varvec{d}_{k-M-1} = \lim _{k\rightarrow \infty }\varvec{d}_k, \end{aligned}$$

which together with \({\varvec{x}}^{k+1} - {\varvec{x}}^k = \alpha _k\varvec{d}_k\) and \(\alpha _k\le 1\) proves (ii).

Finally, we prove (iii). Since \(\{{\varvec{x}}^k\}\) is bounded, there exists a convergent subsequence, say \(\{{\varvec{x}}^{k_j}\}\), which satisfies \(\lim _{j\rightarrow \infty }{\varvec{x}}^{k_j} = {\varvec{x}}^*\). On the other hand, since the set \(\partial G_k(\varvec{u}^k)\) is closed, we see from (3.3) that there exists some \(\varvec{w}_k\) satisfying \(\Vert \varvec{w}_k\Vert \le \epsilon _k\Vert \varvec{u}^k - {\varvec{x}}^k\Vert \) and

$$\begin{aligned} \varvec{w}_k \in \partial G_k(\varvec{u}^k) = \nabla f({\varvec{x}}^k) - \varvec{\xi }^{k+1} + {\varvec{B}_{k}}(\varvec{u}^k - {\varvec{x}}^k) + \partial h(\varvec{u}^k). \end{aligned}$$

This combined with \(\varvec{d}_k = \varvec{u}^k - {\varvec{x}}^k\) further implies that

$$\begin{aligned} \varvec{w}_{k_j} - \varvec{B}_{k_j}\varvec{d}_{k_j} \in \nabla f({\varvec{x}}^{k_j}) - \varvec{\xi }^{k_j+1} + \partial h({\varvec{x}}^{k_j} + \varvec{d}_{k_j}). \end{aligned}$$
(B.7)

Due to \(\varvec{\xi }^{k+1}\in \partial g({\varvec{x}}^k)\), the boundedness of \(\{{\varvec{x}}^k\}\) and the convexity and continuity of g, we see that \(\{\varvec{\xi }^k\}\) is bounded. Thus, by passing to a further subsequence if necessary, without loss of generality, we assume that \(\varvec{\xi }^*:= \lim _{j\rightarrow \infty }\varvec{\xi }^{k_j+1}\) exists and thus \(\varvec{\xi }^*\in \partial g({\varvec{x}}^*)\) due to \(\varvec{\xi }^{k_j+1}\in \partial g({\varvec{x}}^{k_j})\) and the closedness of \(\partial g\). On the other hand, we see from the boundedness of \(\{{\varvec{B}_{k}}\}\) and the assumption \(\delta > 0\) that \(\{\epsilon _k\}\) is bounded, which further gives \(\Vert \varvec{w}_k\Vert \le \epsilon _k\Vert \varvec{u}^k - {\varvec{x}}^k\Vert = \epsilon _k\Vert \varvec{d}_k\Vert \rightarrow 0\). Now passing to the limit in (B.7) and using \(\Vert \varvec{w}_k\Vert \rightarrow 0\), \(\left\| \varvec{d}_k\right\| \rightarrow 0\), the boundedness of \(\{{\varvec{B}_{k}}\}\), the L-smoothness of f and the closedness of \(\partial h\), we see that

$$\begin{aligned} {\varvec{0}} \in \nabla f({\varvec{x}}^*) + \partial h({\varvec{x}}^*) - \partial g({\varvec{x}}^*). \end{aligned}$$

This proves (iii) and completes the proof. \(\square \)

C Proof of Lemma 3.6

Proof

Since \({\varvec{x}}_{\varvec{I}}^k\) is a global minimizer of the optimization problem in (3.6), we have

$$\begin{aligned} {\varvec{0}}\in \nabla f({\varvec{x}}^k) - \varvec{\xi }^{k +1} + {\varvec{x}}_{\varvec{I}}^k - {\varvec{x}}^k + \partial h({\varvec{x}}_{\varvec{I}}^k). \end{aligned}$$
(C.1)

If \({\varvec{x}}_{\varvec{I}}^k = {\varvec{x}}^k\), we see from (C.1) that \({\varvec{0}} \in \nabla f({\varvec{x}}^k) - \varvec{\xi }^{k +1} + \partial h({\varvec{x}}^k)\), which together with \(\varvec{\xi }^{k+1}\in \partial g({\varvec{x}}^k)\) proves that \({\varvec{x}}^k\) is a stationary point of (1.1). On the other hand, if \({\varvec{x}}^k\) is a stationary point of (1.1) and \(\partial g({\varvec{x}}^k)\) is a singleton, these together with \(\varvec{\xi }^{k+1}\in \partial g({\varvec{x}}^k)\) give

$$\begin{aligned} {\varvec{0}} \in \nabla f({\varvec{x}}^k) - \partial g({\varvec{x}}^k) + \partial h({\varvec{x}}^k) = \nabla f({\varvec{x}}^k) - \varvec{\xi }^{k +1} + \partial h({\varvec{x}}^k). \end{aligned}$$
(C.2)

Now, using the monotonicity of the operator \(\partial h\) together with (C.1) and (C.2), we further have

$$\begin{aligned} \langle {\varvec{x}}^k - {\varvec{x}}_{\varvec{I}}^k ,\, {\varvec{x}}_{\varvec{I}}^k - {\varvec{x}}^k\rangle \ge 0, \end{aligned}$$

which, since the left-hand side equals \(-\Vert {\varvec{x}}_{\varvec{I}}^k - {\varvec{x}}^k\Vert ^2\), implies that \({\varvec{x}}_{\varvec{I}}^k = {\varvec{x}}^k\). This completes the proof. \(\square \)

D Proof of Theorem 4.2

Proof

First, we consider (FISTA). Since \({\varvec{B}_{k}}\succ {\varvec{0}}\), we know from the convexity of h that \(G_k(\cdot )\) is strongly convex with modulus \(\lambda _{\min }({\varvec{B}_{k}})\). We then further have

$$\begin{aligned} \frac{\lambda _{\min }(\varvec{B}_{k})}{2}\Vert {\varvec{z}}^{\ell } - {\varvec{z}}^*\Vert ^2 \le G_k({\varvec{z}}^{\ell }) - G_k({\varvec{z}}^*) \le \frac{2L_{\phi }}{(\ell + 1)^2}\Vert {\varvec{z}}^{0} - {\varvec{z}}^*\Vert ^2, \end{aligned}$$
(D.1)

where the last inequality follows from [8, Theorem 4.4]. Furthermore, inequality (D.1) together with the definition of \(c_1\) in (4.5) implies that

$$\begin{aligned} \Vert {\varvec{z}}^{\ell } - {\varvec{z}}^*\Vert \le \frac{2\sqrt{L_{\phi }}}{(\ell + 1)\sqrt{\lambda _{\min }({\varvec{B}_{k}})}}\Vert {\varvec{z}}^{0} - {\varvec{z}}^*\Vert = \frac{c_1}{\ell + 1} \Vert {\varvec{z}}^{0} - {\varvec{z}}^*\Vert . \end{aligned}$$
(D.2)

Moreover, for \(\ell \ge 2\), we have

$$\begin{aligned} \begin{aligned}&\Vert {\varvec{z}}^{\ell } - {\varvec{y}}^{\ell }\Vert = \big \Vert {\varvec{z}}^{\ell } - {\varvec{z}}^{\ell - 1} - \frac{\theta _{\ell - 1} - 1}{\theta _{\ell }}({\varvec{z}}^{\ell - 1} - {\varvec{z}}^{\ell -2})\big \Vert \le \Vert {\varvec{z}}^{\ell } - {\varvec{z}}^{\ell - 1}\Vert + \Vert {\varvec{z}}^{\ell - 1} - {\varvec{z}}^{\ell - 2}\Vert \\&\le \left\| {\varvec{z}}^{\ell } - {\varvec{z}}^*\right\| + 2\left\| {\varvec{z}}^{\ell - 1} - {\varvec{z}}^*\right\| + \left\| {\varvec{z}}^{\ell - 2} - {\varvec{z}}^*\right\| \le \frac{4c_1}{\ell - 1} \Vert {\varvec{z}}^{0} - {\varvec{z}}^*\Vert , \end{aligned} \end{aligned}$$
(D.3)

where the first equality follows from the \({\varvec{y}}\)-update in (FISTA) and the last inequality follows from (D.2). Notice that \({\varvec{z}}^0 = {\varvec{x}}^k\ne {\varvec{z}}^*\). Using (D.2) and (D.3), we further have for \(\ell \ge \max \{2,\, c_1\}\) that

$$\begin{aligned} \frac{\Vert {\varvec{z}}^{\ell } - {\varvec{y}}^{\ell }\Vert }{\Vert {\varvec{z}}^{\ell } - {\varvec{z}}^0\Vert } \le \frac{4c_1}{\ell - 1}\frac{\Vert {\varvec{z}}^0 - {\varvec{z}}^*\Vert }{\Vert {\varvec{z}}^{0} - {\varvec{z}}^*\Vert - \Vert {\varvec{z}}^{\ell } - {\varvec{z}}^*\Vert } \le \frac{4c_1(\ell + 1)}{(\ell - 1)(\ell + 1 - c_1)}. \end{aligned}$$
(D.4)

Therefore, the termination criterion (4.2) is satisfied whenever the right-hand side of (D.4) is bounded above by \(\frac{\epsilon _k}{2L_{\phi }}\), and solving this inequality for \(\ell \) yields (4.6).
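
As an illustration of how this stopping test is checked inside the inner solver, the following sketch runs FISTA on the subproblem and stops once \(\Vert {\varvec{z}}^{\ell } - {\varvec{y}}^{\ell }\Vert /\Vert {\varvec{z}}^{\ell } - {\varvec{z}}^0\Vert \le \frac{\epsilon _k}{2L_{\phi }}\), which by the argument above guarantees (4.2). The gradient and proximal oracles are placeholders, and the \(\theta _{\ell }\)-recursion used here is the standard FISTA update from [8].

```python
import numpy as np

def fista_for_subproblem(grad_phi, prox_h, L_phi, z0, eps_k, max_iter=10000):
    """Illustrative FISTA for the subproblem G_k with the ratio stopping test.

    grad_phi : gradient of the smooth part of G_k (placeholder oracle)
    prox_h   : proximal operator of h with step size 1/L_phi (placeholder oracle)
    L_phi    : Lipschitz constant of grad_phi
    z0       : starting point z^0 = x^k
    eps_k    : inexactness parameter epsilon_k
    """
    z_prev, z = z0.copy(), z0.copy()
    theta_prev = 1.0
    for ell in range(1, max_iter + 1):
        theta = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * theta_prev ** 2))   # standard FISTA recursion
        y = z + ((theta_prev - 1.0) / theta) * (z - z_prev)          # extrapolation step
        z_prev, z = z, prox_h(y - grad_phi(y) / L_phi)               # forward-backward step
        theta_prev = theta
        denom = np.linalg.norm(z - z0)
        if ell >= 2 and denom > 0 and np.linalg.norm(z - y) / denom <= eps_k / (2 * L_phi):
            break                                                    # ratio test implying (4.2)
    return z
```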

Now we consider (V-FISTA). Similarly, the strong convexity of \(G_k(\cdot )\) implies that

$$\begin{aligned} \begin{aligned}&\lambda _{\min }({\varvec{B}_{k}})\left\| {\varvec{z}}^{\ell } - {\varvec{z}}^*\right\| ^2/2 \le G_k({\varvec{z}}^{\ell }) - G_k({\varvec{z}}^*) \\&\le \bigg (1 - \frac{1}{\sqrt{\kappa }} \bigg )^{\ell }\bigg ( G_k({\varvec{z}}^0) - G_k({\varvec{z}}^*) + \frac{\lambda _{\min }({\varvec{B}_{k}})}{2}\left\| {\varvec{z}}^0 - {\varvec{z}}^*\right\| ^2\bigg ) = c_2^2\Big (\frac{1}{\tau }\Big )^{2\ell }\lambda _{\min }({\varvec{B}_{k}})/2, \end{aligned} \end{aligned}$$
(D.5)

where the second inequality follows from [2, Theorem 10.42] and the last equality follows from the definition of \(\tau \) and \(c_2\) in (4.5). We then see from (D.5) that

$$\begin{aligned} \left\| {\varvec{z}}^{\ell } - {\varvec{z}}^*\right\| \le \frac{c_2}{\tau ^{\ell }}. \end{aligned}$$
(D.6)

This together with the \({\varvec{y}}\)-update in (V-FISTA) implies that for \(\ell \ge 2\),

$$\begin{aligned} \begin{aligned}&\left\| {\varvec{z}}^{\ell } - {\varvec{y}}^{\ell }\right\| = \Big \Vert {\varvec{z}}^{\ell } - {\varvec{z}}^{\ell - 1} - \frac{\sqrt{\kappa } - 1}{\sqrt{\kappa } + 1}({\varvec{z}}^{\ell - 1} - {\varvec{z}}^{\ell - 2})\Big \Vert \le \left\| {\varvec{z}}^{\ell } - {\varvec{z}}^{\ell - 1}\right\| + \left\| {\varvec{z}}^{\ell - 1} - {\varvec{z}}^{\ell -2}\right\| \\&\le \left\| {\varvec{z}}^{\ell } - {\varvec{z}}^*\right\| + 2\left\| {\varvec{z}}^{\ell - 1} - {\varvec{z}}^*\right\| + \left\| {\varvec{z}}^{\ell - 2} - {\varvec{z}}^*\right\| \le c_2\,\left( \frac{1}{\tau ^{\ell }} + \frac{2}{\tau ^{\ell - 1}} + \frac{1}{\tau ^{\ell -2}}\right) \le \frac{4c_2}{\tau ^{\ell -2}}. \end{aligned} \end{aligned}$$
(D.7)

Since \({\varvec{z}}^0 \ne {\varvec{z}}^*\), we use (D.6) and have that for \(\ell \ge 1+ {\log}_{\tau }\frac{c_2}{\Vert {\varvec{z}}^0 - {\varvec{z}}^*\Vert }\),

$$\begin{aligned} \begin{aligned} \left\| {\varvec{z}}^{\ell } - {\varvec{z}}^0\right\|&\ge \left\| {\varvec{z}}^0 - {\varvec{z}}^*\right\| - \left\| {\varvec{z}}^{\ell } - {\varvec{z}}^*\right\| \ge \left\| {\varvec{z}}^0 - {\varvec{z}}^*\right\| - \frac{c_2}{\tau ^{\ell }} > 0. \end{aligned} \end{aligned}$$

Using this and (D.7), we have for any \(\ell \ge \max \{2,\, 1+ {\log}_{\tau }\frac{c_2}{\Vert {\varvec{z}}^0 - {\varvec{z}}^*\Vert }\}\) that

$$\begin{aligned} \frac{\left\| {\varvec{z}}^{\ell } - {\varvec{y}}^{\ell }\right\| }{\left\| {\varvec{z}}^{\ell } - {\varvec{z}}^0\right\| } \le \frac{4c_2/\tau ^{\ell - 2}}{\left\| {\varvec{z}}^0 - {\varvec{z}}^*\right\| - c_2/\tau ^{\ell }} = \frac{4c_2\tau ^2}{\tau ^{\ell }\Vert {\varvec{z}}^0 - {\varvec{z}}^*\Vert - c_2}. \end{aligned}$$
(D.8)

Therefore, the termination criterion (4.2) is satisfied whenever the right-hand side of (D.8) is bounded above by \(\frac{\epsilon _k}{2L_{\phi }}\), and solving this inequality for \(\ell \) yields (4.7). This completes the proof. \(\square \)
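
The final step in each case (solving the inequality for \(\ell \)) simply inverts the bounds (D.4) and (D.8). The sketch below reproduces that calculation numerically by searching for the smallest admissible \(\ell \); it is purely illustrative, in particular it takes \(\Vert {\varvec{z}}^0 - {\varvec{z}}^*\Vert \) as a given input and assumes \(\tau > 1\), which follows from (D.5).

```python
import math

def smallest_ell_fista(c1, eps_k, L_phi):
    """Smallest integer ell >= max{2, c1} with
    4*c1*(ell+1) / ((ell-1)*(ell+1-c1)) <= eps_k/(2*L_phi),
    i.e. the first ell at which the right-hand side of (D.4) meets the tolerance."""
    ell = max(2, math.ceil(c1))
    while 4 * c1 * (ell + 1) / ((ell - 1) * (ell + 1 - c1)) > eps_k / (2 * L_phi):
        ell += 1
    return ell

def smallest_ell_vfista(c2, tau, dist0, eps_k, L_phi):
    """Smallest integer ell >= max{2, 1 + log_tau(c2/dist0)} with
    tau**ell * dist0 > c2 and 4*c2*tau**2 / (tau**ell*dist0 - c2) <= eps_k/(2*L_phi),
    where dist0 stands for ||z^0 - z*||; this inverts (D.8)."""
    ell = max(2, math.ceil(1 + math.log(c2 / dist0, tau)))
    while (tau ** ell * dist0 - c2 <= 0
           or 4 * c2 * tau ** 2 / (tau ** ell * dist0 - c2) > eps_k / (2 * L_phi)):
        ell += 1
    return ell
```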

About this article

Cite this article

Liu, T., Takeda, A. An inexact successive quadratic approximation method for a class of difference-of-convex optimization problems. Comput Optim Appl 82, 141–173 (2022). https://doi.org/10.1007/s10589-022-00357-z

