A Greedy Newton-Type Method for Multiple Sparse Constraint Problem

Abstract

With the development of science and technology, we can obtain many groups of data for the same object, and these datasets are often related to one another or possess structure within themselves. To characterize the structure of the data across different datasets, in this paper we propose the multiple sparse constraint problem (MSCP) to model problems with a multiblock sparse structure. We introduce three types of stationary points and establish the relationships among these stationary points and the global/local minimizers. We then design a gradient projection Newton algorithm, which is proven to enjoy global and quadratic convergence. Finally, numerical experiments on different examples illustrate the efficiency of the proposed method.

References

  1. Agarwal, A., Negahban, S., Wainwright, M.: Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Stat. 40, 2452–2482 (2012)

  2. Bahmani, S., Raj, B., Boufounos, P.: Greedy sparsity-constrained optimization. J. Mach. Learn. Res. 14, 807–841 (2013)

  3. Beck, A., Eldar, Y.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23, 1480–1509 (2013)

  4. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  5. Berg, E., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31, 890–912 (2008)

  6. Jalali, A., Johnson, C., Ravikumar, P.: On learning discrete graphical models using Greedy methods. Adv. Neural Inf. Process. Syst. 24, 1935–1943 (2011). (Granada, Spain)

  7. Jiao, Y., Jin, B., Lu, X.: Group sparse recovery via the \(\ell ^0(\ell ^2)\) penalty: theory and algorithm. IEEE Trans. Signal Process. 65, 998–1012 (2017)

  8. Moré, J., Sorensen, D.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4, 553–572 (1983)

  9. Pan, L., Chen, X.: Group sparse optimization for images recovery using capped folded concave functions. SIAM J. Imaging Sci. 14, 1–25 (2021)

  10. Pan, L., Xiu, N., Zhou, S.: On solutions of sparsity constrained optimization. J. Oper. Res. Soc. China 3, 421–439 (2017)

  11. Pan, L., Zhou, S., Xiu, N., Qi, H.: A convergent iterative hard thresholding for sparsity and nonnegativity constrained optimization. Pac. J. Optim. 33, 325–353 (2017)

  12. Rockafellar, R., Wets, R.: Variational Analysis. Springer, Berlin (2009)

  13. Shalev-Shwartz, S., Srebro, N., Zhang, T.: Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM J. Optim. 20, 2807–2832 (2010)

  14. She, Y., Wang, Z., Shen, J.: Gaining outlier resistance with progressive quantiles: fast algorithms and theoretical studies. J. Am. Stat. Assoc. 117, 1282–1295 (2021)

  15. Sun, J., Kong, L., Zhou, S.: Gradient projection Newton algorithm for sparse collaborative learning using synthetic and real datasets of applications. J. Comput. Appl. Math. 422, 114872 (2023)

  16. Thompson, P., Martin, N., Wright, M.: Imaging genomics. Curr. Opin. Neurol. 23, 368–373 (2010)

  17. Visscher, P., Brown, M., Mccarthy, M., Yang, J.: Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012)

  18. Wang, R., Xiu, N., Zhang, C.: Greedy projected gradient-Newton method for sparse logistic regression. IEEE Trans. Neural Netw. Learn. Syst. 31, 527–538 (2020)

  19. Wang, S., Yehya, N., Schadt, E., Wang, H., Drake, T., Lusis, A.: Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet. 2, 148–159 (2006)

  20. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B. 68, 49–67 (2006)

  21. Zhang, H., Wang, F., Xu, H., Liu, Y., Liu, J., Zhao, H., Gelernter, J.: Differentially co-expressed genes in postmortem prefrontal cortex of individuals with alcohol use disorders: influence on alcohol metabolism-related pathways. Hum. Genet. 133, 1383–1394 (2014)

  22. Zhou, S.: Gradient projection Newton pursuit for sparsity constrained optimization. Appl. Comput. Harmon. Anal. 61, 75–100 (2022)

  23. Zhou, S., Luo, Z., Xiu, N.: Computing one-bit compressive sensing via double-sparsity constrained optimization. IEEE Trans. Signal Process. 70, 1593–1608 (2022)

  24. Zhou, S., Xiu, N., Qi, H.: Global and quadratic convergence of Newton hard-thresholding pursuit. J. Mach. Learn. Res. 22, 1–45 (2021)

  25. Zille, P., Calhoun, V., Wang, Y.: Enforcing co-expression within a brain-imaging genomics regression framework. IEEE Trans. Med. Imaging 37, 2561–2571 (2018)

Acknowledgements

The authors would like to thank the Associate Editor and anonymous referees for their helpful suggestions. This work was funded by the National Natural Science Foundation of China (12071022), Beijing Natural Science Foundation (Z190002) and Natural Science Foundation of Shandong Province (ZR2018MA019).

Author information

Corresponding author

Correspondence to Lingchen Kong.

Additional information

Communicated by Sebastian U. Stich.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

1.1 A. Proof of Theorem 3.1

Necessity. Based on [12, Theorem 6.12], a local minimizer \(\mathbf{x}^{*}\) of the problem (1) must satisfy \(- \nabla f(\mathbf{x}^{*}) \in {\mathcal {N}}_{\Sigma }(\mathbf{x}^{*}) = {\mathcal {N}}_{\Sigma _1}(\mathbf{x}^{*}_1) \times {\mathcal {N}}_{\Sigma _2}(\mathbf{x}^{*}_2)\times \cdots \times {\mathcal {N}}_{\Sigma _m}(\mathbf{x}^{*}_m),\) where \({\mathcal {N}}_{\Sigma }(\mathbf{x}^{*})\) is the normal cone of \(\Sigma \) at \(\mathbf{x}^{*}\) and the equality follows from [12, Theorem 6.41]. Then the explicit expression (see [10]) of the normal cone \({\mathcal {N}}_{\Sigma _i}(\mathbf{x}^{*}_i)\) enables us to derive (12) immediately.

Sufficiency. Let \(\mathbf{x}^{*}\) satisfy (12). The convexity of f leads to

$$\begin{aligned} \begin{array}{cl} f(\mathbf{x})\ge &{}f(\mathbf{x}^{*})+\langle \nabla _{1} f(\mathbf{x}^{*}), \mathbf{x}_{1}-\mathbf{x}_{1}^{*}\rangle +\langle \nabla _{2} f(\mathbf{x}^{*}), \mathbf{x}_{2}-\mathbf{x}_{2}^{*}\rangle \\ &{}+\cdots +\langle \nabla _{m} f(\mathbf{x}^{*}), \mathbf{x}_{m}-\mathbf{x}_{m}^{*}\rangle . \end{array} \end{aligned}$$
(25)

If there is a \(\delta >0\) such that for any \(\mathbf{x}\in \Sigma \cap N(\mathbf{x}^*,\delta )\), we have

$$\begin{aligned} \langle \nabla _{1}f(\mathbf{x}^{*}),\mathbf{x}_{1}-\mathbf{x}_{1}^{*}\rangle =\langle \nabla _{2}f(\mathbf{x}^{*}),\mathbf{x}_{2}-\mathbf{x}_{2}^{*}\rangle =\cdots =\langle \nabla _{m}f(\mathbf{x}^{*}),\mathbf{x}_{m}-\mathbf{x}_{m}^{*}\rangle =0, \end{aligned}$$
(26)

then the conclusion follows immediately. Therefore, it remains to show (26). In fact, by (12), we note that \(\nabla _{i}f(\mathbf{x}^{*})=0\) if \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}<s_{i}\), which indicates that it suffices to consider the worst case \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}=s_{i},i=1,2,\ldots ,m\). In this case, we define

$$\begin{aligned} \delta := \min _{i=1,2,\ldots ,m} \min \limits _{j \in \Gamma (\mathbf{x}_{i}^{*})} \mid (\mathbf{x}_{i}^{*})_j\mid . \end{aligned}$$

Then for any \(\mathbf{x}\in N(\mathbf{x}^{*},\delta )\cap \Sigma \), we have

$$\begin{aligned} \mid (\mathbf{x}_{i})_j\mid= & {} \mid (\mathbf{x}_{i})_j^*- (\mathbf{x}_{i})_j^*+(\mathbf{x}_{i})_j\mid \\\ge & {} \mid (\mathbf{x}_{i})_j^*\mid -\mid (\mathbf{x}_{i})_j^*-(\mathbf{x}_{i})_j\mid \\\ge & {} \mid (\mathbf{x}_{i})_j^*\mid -\Vert \mathbf{x}_{i}^*-\mathbf{x}_{i}\Vert \\> & {} \mid (\mathbf{x}_{i})_j^*\mid -\delta \\\ge & {} 0. \end{aligned}$$

This indicates that \(\Gamma (\mathbf{x}_{i}^{*})\subseteq \Gamma (\mathbf{x}_{i})\), which by \(\Vert \mathbf{x}_{i}\Vert _{0}\le s_{i}=\Vert \mathbf{x}_{i}^{*}\Vert _0=\mid \Gamma (\mathbf{x}_{i}^{*})\mid \) yields

$$\begin{aligned} \Gamma (\mathbf{x}_{i}^{*})= \Gamma (\mathbf{x}_{i}), i=1,2,\ldots ,m,~\forall ~\mathbf{x}\in N(\mathbf{x}^{*},\delta )\cap \Sigma . \end{aligned}$$

Using the above fact and (12), we derive that

$$\begin{aligned} \langle \nabla _{i}f(\mathbf{x}^{*}),\mathbf{x}_{i}-\mathbf{x}_{i}^{*}\rangle =\langle (\nabla _{i}f(\mathbf{x}^{*}))_{\Gamma (\mathbf{x}_{i}^{*})},(\mathbf{x}_{i}-\mathbf{x}_{i}^{*})_{\Gamma (\mathbf{x}_{i}^{*})}\rangle =0. \end{aligned}$$
(27)

The proof is completed.
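
For concreteness, the first-order condition used above can also be checked numerically. The following is a minimal sketch (not the authors' code) that assumes condition (12) has the form derived in the proof: the block gradient \(\nabla _i f(\mathbf{x}^{*})\) vanishes entirely when \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}<s_{i}\), and vanishes on the support \(\Gamma (\mathbf{x}_{i}^{*})\) when \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}=s_{i}\).

```python
import numpy as np

def satisfies_condition_12(blocks, grads, s, tol=1e-10):
    """Numerical check of the first-order condition used in Theorem 3.1.

    blocks : list of block vectors x_i
    grads  : list of block gradients nabla_i f(x)
    s      : list of sparsity levels s_i
    Assumed form of (12): nabla_i f(x) = 0 if ||x_i||_0 < s_i, and
    (nabla_i f(x))_j = 0 for every j in Gamma(x_i) if ||x_i||_0 = s_i.
    """
    for x_i, g_i, s_i in zip(blocks, grads, s):
        support = np.flatnonzero(np.abs(x_i) > tol)
        if len(support) < s_i:
            if np.linalg.norm(g_i) > tol:          # full block gradient must vanish
                return False
        elif np.linalg.norm(g_i[support]) > tol:   # gradient must vanish on the support
            return False
    return True
```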

1.2 B. Proof of Lemma 3.1 and Theorem 3.2

Suppose that \(\mathbf{x}^{*}\) is an \(\alpha \)-stationary point. If \(j\in \Gamma (\mathbf{x}_{i}^{*})\), then according to \(\mathbf{x}_{i}^{*} \in \Pi _{\Sigma _i}(\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f(\mathbf{x}^{*}))\), we have \((\mathbf{x}_{i}^{*})_{j}=(\mathbf{x}_{i}^{*})_{j}-(\alpha \nabla _{i} f(\mathbf{x}^{*}))_{j}\), so that \((\nabla _{i} f(\mathbf{x}^{*}))_{j}=0\). If \(j\in {\overline{\Gamma }}(\mathbf{x}_{i}^{*})\), then \(\mid (\mathbf{x}_{i}^{*})_{j}-(\alpha \nabla _{i} f(\mathbf{x}^{*}))_{j}\mid \le (\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\), which combined with the fact that \((\mathbf{x}_{i}^{*})_{j}=0\) implies that \(\alpha \mid (\nabla _{i} f(\mathbf{x}^{*}))_{j}\mid \le (\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\).

Suppose that \(\mathbf{x}^{*}\) satisfies (13). If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}< s_{i}\), then \((\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}=0\). It follows that \(\nabla _{i} f(\mathbf{x}^{*})=0\); in this case, \(\Pi _{\Sigma _i}\left( \mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) \right) =\Pi _{\Sigma _i}\left( \mathbf{x}_{i}^{*}\right) =\{\mathbf{x}_{i}^{*}\}\). If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}= s_{i}\), then \((\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\ne 0\), and we have

$$\begin{aligned} {\left\{ \begin{array}{ll} (\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) )_{j}=(\mathbf{x}_{i}^{*})_{j},&{}~~~~j\in \Gamma (\mathbf{x}_{i}^{*}),\\ \mid (\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) )_{j}\mid \le (\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}, &{}~~~~j\in {\overline{\Gamma }}(\mathbf{x}_{i}^{*}).\\ \end{array}\right. } \end{aligned}$$

Therefore, the \(s_{i}\) entries of \(\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) \) indexed by \(\Gamma (\mathbf{x}_{i}^{*})\) coincide with the nonzero entries of \(\mathbf{x}_{i}^{*}\), and all other entries are no larger than them in absolute value, so that \(\mathbf{x}_{i}^{*} \in \Pi _{\Sigma _i}(\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) )\).
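
The projection \(\Pi _{\Sigma _i}\) and the componentwise test in (13) are straightforward to realize numerically. Below is a minimal sketch, assuming each \(s_i\ge 1\) and that ties among the \(s_i\) largest magnitudes are broken arbitrarily; it is an illustration of the characterization proved above, not the authors' implementation.

```python
import numpy as np

def project_block(z, s):
    """One selection from Pi_{Sigma_i}(z): keep the s largest-magnitude
    entries of z and zero out the rest (ties broken arbitrarily)."""
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-s:]
    out[keep] = z[keep]
    return out

def is_alpha_stationary(blocks, grads, s, alpha, tol=1e-10):
    """Componentwise test mirroring (13) as used in the proof: the gradient
    vanishes on the support of x_i, and alpha*|(grad_i)_j| is bounded by the
    s_i-th largest magnitude of x_i off the support."""
    for x_i, g_i, s_i in zip(blocks, grads, s):
        kth = np.sort(np.abs(x_i))[::-1][s_i - 1]      # (x_i)^downarrow_{s_i}
        on_support = np.abs(x_i) > tol
        if np.any(np.abs(g_i[on_support]) > tol):
            return False
        if np.any(alpha * np.abs(g_i[~on_support]) > kth + tol):
            return False
    return True
```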

1.3 C. Proof of Lemma 3.2 and Theorem 3.3

Suppose that \(\mathbf{x}^{*}\) is a B-stationary point, and recall that \(\nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : d_{i}\in T_{ \Sigma _i}^{B}(\mathbf{x}_{i}^{*})\}\).

If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}= s_{i}\), we have

$$\begin{aligned} \begin{aligned} \nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})&=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : d_{i}\in T_{ \Sigma _i}^{B}(\mathbf{x}_{i}^{*})\}\\&=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : \Vert d_{i}\Vert _{0}\le s_{i},~\Vert \mathbf{x}_{i}^{*}+\nu d_{i}\Vert _{0}\le s_{i},~\forall ~ \nu \in \mathbb {R}\}\\&=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : \Gamma (d_{i})\subseteq \Gamma (\mathbf{x}_{i}^{*})\}. \end{aligned} \end{aligned}$$

The above formula is equivalent to

$$\begin{aligned} \nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})_{j}= {\left\{ \begin{array}{ll} -(\nabla _{i}f(\mathbf{x}^{*}))_{j},~~&{}j\in \Gamma (\mathbf{x}_{i}^{*}),\\ 0,~~&{}j\in {\overline{\Gamma }}(\mathbf{x}_{i}^{*}).\\ \end{array}\right. } \end{aligned}$$

Then we have

$$\begin{aligned} \nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=0\Longleftrightarrow (\nabla _{i}f(\mathbf{x}^{*}))_{j} {\left\{ \begin{array}{ll} =0,~~j\in \Gamma (\mathbf{x}_{i}^{*}),\\ \in \mathbb {R}, ~~j\in {\overline{\Gamma }}(\mathbf{x}_{i}^{*}).\\ \end{array}\right. } \end{aligned}$$

Next we prove \(\nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=0\Longleftrightarrow \nabla _{i}f(\mathbf{x}^{*})=0\) when \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}< s_{i}\). On the one hand, if \(\nabla _{i}f(\mathbf{x}^{*})=0\), then

$$\begin{aligned} \begin{aligned} \nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})&=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : d_{i}\in T_{ \Sigma _i}^{B}(\mathbf{x}_{i}^{*})\}\\&=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : \Vert d_{i}\Vert _{0}\le s_{i},~\Vert \mathbf{x}_{i}^{*}+\nu d_{i}\Vert _{0}\le s_{i},~\forall ~ \nu \in \mathbb {R}\}\\&=\arg \min \{\Vert d_{i}\Vert : \Vert d_{i}\Vert _{0}\le s_{i},~\Vert \mathbf{x}_{i}^{*}+\nu d_{i}\Vert _{0}\le s_{i},~\forall ~ \nu \in \mathbb {R}\}=0. \end{aligned} \end{aligned}$$

On the other hand, if \(\nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=0\), then

$$\begin{aligned} 0=\nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : \Vert d_{i}\Vert _{0}\le s_{i},~\Vert \mathbf{x}_{i}^{*}+\nu d_{i}\Vert _{0}\le s_{i},~\forall ~ \nu \in \mathbb {R}\}, \end{aligned}$$

which leads to \(\Vert \nabla _{i}f(\mathbf{x}^{*})\Vert \le \Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert \) for any \(d_{i}\) with \(\Vert d_{i}\Vert _{0}\le s_{i}\) and \(\Vert \mathbf{x}_{i}^{*}+\nu d_{i}\Vert _{0}\le s_{i},~\forall ~\nu \in \mathbb {R}\). In particular, for any \(j_{0}\in \{1,2,\ldots ,p_{i}\}\), we take \(d_{i}\) with \(\Gamma (d_{i})=\{j_{0}\}\). Apparently, \(\Vert \mathbf{x}_{i}^{*}+\nu d_{i}\Vert _{0}\le s_{i},~\forall ~\nu \in \mathbb {R}\). Then, by setting \((d_{i})_{j_{0}}=-(\nabla _{i}f(\mathbf{x}^{*}))_{j_{0}}\) and \((d_{i})_{j}=0,~j\ne j_{0}\), we get \((\nabla _{i}f(\mathbf{x}^{*}))_{j_{0}}=0\), because \(\Vert \nabla _{i}f(\mathbf{x}^{*})\Vert \le \Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert \) and the \(j_{0}\)-th entry of \(d_{i}+\nabla _{i}f(\mathbf{x}^{*})\) is zero. By the arbitrariness of \(j_{0}\), we get \(\nabla _{i}f(\mathbf{x}^{*})=0\).

Since condition (14) is equivalent to (12), a B-stationary point is equivalent to a local minimizer. The proof of Theorem 3.3 is completed.

1.4 D. Proof of Lemma 3.3 and Theorem 3.4

If \(\mathbf{x}^{*}\) is a C-stationary point, then \(\nabla ^{C}_{ \Sigma _i}f(\mathbf{x}^{*})=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : d_{i}\in T_{ \Sigma _i}^{C}(\mathbf{x}_{i}^{*})\}\). So

$$\begin{aligned} \begin{aligned} \nabla ^{C}_{ \Sigma _i}f(\mathbf{x}^{*})&=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : d_{i}\in T_{ \Sigma _i}^{C}(\mathbf{x}_{i}^{*})\}\\&=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : \Gamma (d_{i})\subseteq \Gamma (\mathbf{x}_{i}^{*})\}, \end{aligned} \end{aligned}$$

which is equivalent to

$$\begin{aligned} (\nabla ^{C}_{ \Sigma _i}f(\mathbf{x}^{*}))_{j}= {\left\{ \begin{array}{ll} -(\nabla _{i}f(\mathbf{x}^{*}))_{j},~~&{}j\in \Gamma (\mathbf{x}_{i}^{*}),\\ 0,~~&{}j\in {\overline{\Gamma }}(\mathbf{x}_{i}^{*}).\\ \end{array}\right. } \end{aligned}$$

Therefore, if \(\mathbf{x}^{*}\) is a C-stationary point, then

$$\begin{aligned} \nabla ^{C}_{ \Sigma _i}f(\mathbf{x}^{*})=0\Longleftrightarrow (\nabla _{i}f(\mathbf{x}^{*}))_{j} {\left\{ \begin{array}{ll} =0,~~j\in \Gamma (\mathbf{x}_{i}^{*}),\\ \in \mathbb {R}, ~~j\in {\overline{\Gamma }}(\mathbf{x}_{i}^{*}).\\ \end{array}\right. } \end{aligned}$$

To sum up, we get the desired results.
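
The explicit form derived above (which coincides with that of \(\nabla ^{B}_{\Sigma _i}f\) when \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}=s_{i}\)) is easy to evaluate. The following minimal sketch, written only for illustration, negates the block gradient on the support and zeros it elsewhere.

```python
import numpy as np

def restricted_gradient(grad_i, x_i, tol=1e-10):
    """Explicit form of nabla^C_{Sigma_i} f (and of nabla^B_{Sigma_i} f when
    ||x_i||_0 = s_i): minus the gradient on Gamma(x_i), zero off the support."""
    d = np.zeros_like(grad_i)
    support = np.flatnonzero(np.abs(x_i) > tol)
    d[support] = -grad_i[support]
    return d
```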

If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}=s_{i},~i=1,2,\ldots ,m\), the conditions (15) imply (12), and a point satisfying (12) is a local minimizer by Theorem 3.1, so a C-stationary point of (1) is a local minimizer. Conversely, if \(\mathbf{x}^{*}\) is a local minimizer, then it satisfies (12), which is a special case of (15); that is, it is a C-stationary point. The proof of Theorem 3.4 is completed.

1.5 E. Proof of Theorem 3.6

The original problem (1) can be written as

$$\begin{aligned} \begin{aligned} \min _{|T_i| =s_{i},i=1,2,\ldots ,m}\left\{ \min _{\mathbf{x}_{1},\ldots ,\mathbf{x}_{m}}~ f(\mathbf{x}_{1},\ldots ,\mathbf{x}_{m}),~~ \mathrm{s.t.}~ \mathbf{x}_{1}\in \mathbb {R}^{p_{1}}_{T_1},\ldots ,\mathbf{x}_{m}\in \mathbb {R}^{p_{m}}_{T_m}\right\} . \end{aligned} \end{aligned}$$
(28)

Note that f is strongly convex on \(\mathbb {R}^{p_{1}}_{T_1}\times \cdots \times \mathbb {R}^{p_{m}}_{T_m}\). Hence, for any given \(|T_i|=s_{i},~i=1,2,\ldots ,m\), the inner problem of (28) is a strongly convex program and admits a unique global minimizer, denoted by \(\mathbf{x}_{i}^*(T_i),~i=1,2,\ldots ,m\). Note that \(T_i\subseteq [p_i],~i=1,2,\ldots ,m\). Thus there are only finitely many choices of \(T_i\) with \(|T_i|=s_{i}\), and hence finitely many inner programs and finitely many minimizers \(\mathbf{x}_{i}^*(T_i)\). To derive a global minimizer of (28), we only need to pick the \(\mathbf{x}_{i}^*(T_i)\) that makes the objective value of (28) minimal. Hence global minimizers exist.
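
The decomposition (28) can be mimicked directly by brute-force enumeration of supports. The sketch below does this for a least-squares objective \(f(\mathbf{x})=\tfrac{1}{2}\Vert \sum _i A_i\mathbf{x}_i-\mathbf{b}\Vert ^2\) (an assumption made only for this illustration, so that the inner problems are solvable in closed form); it has exponential cost and is meant only to mirror the existence argument, not to serve as a practical solver.

```python
import numpy as np
from itertools import combinations, product

def global_min_by_enumeration(A_blocks, b, s):
    """Enumerate all supports |T_i| = s_i, solve the restricted convex
    inner problem of (28), and keep the best objective value."""
    sizes = [A.shape[1] for A in A_blocks]
    best_val, best_x = np.inf, None
    for supports in product(*[combinations(range(p), s_i)
                              for p, s_i in zip(sizes, s)]):
        # columns of the stacked design restricted to the chosen supports
        A_T = np.hstack([A[:, list(T)] for A, T in zip(A_blocks, supports)])
        z, *_ = np.linalg.lstsq(A_T, b, rcond=None)     # inner minimizer
        val = 0.5 * np.linalg.norm(A_T @ z - b) ** 2
        if val < best_val:
            x = [np.zeros(p) for p in sizes]
            offset = 0
            for x_i, T, s_i in zip(x, supports, s):
                x_i[list(T)] = z[offset:offset + s_i]
                offset += s_i
            best_val, best_x = val, x
    return best_val, best_x
```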

We next show that any local minimizer \(\mathbf{x}^*\) of (1) is a strict local minimizer. To this end, denote \(\delta :=\min \limits _{i=1,2,\ldots ,m}\{\delta _i\}\), where

$$\begin{aligned} \begin{array}{cll} \delta _i&{}:=&{} {\left\{ \begin{array}{ll} +\infty , &{} \mathbf{x}_{i}^* =0,\\ \min _{j\in \Gamma (\mathbf{x}_{i}^*)} \mid (\mathbf{x}_{i}^*)_j\mid , &{} \mathbf{x}_{i}^* \ne 0,\ \end{array}\right. }~~~~i=1,2,\ldots ,m. \end{array} \end{aligned}$$
(29)

Clearly, \(\delta _i>0\) for each i and hence \(\delta >0\). Then, similar reasoning allows us to derive (26) for any \(\mathbf{x}\in \Sigma \cap N(\mathbf{x}^*,\delta )\). This and the strong convexity of f lead to

$$\begin{aligned} f(\mathbf{x})\ge f(\mathbf{x}^{*}) + (l_f/2) \Vert \mathbf{x}-\mathbf{x}^{*}\Vert ^2. \end{aligned}$$
(30)

The above condition indicates that \(\mathbf{x}^{*}\) is the unique global minimizer of the problem \(\min \{f(\mathbf{x}):\mathbf{x}\in \Sigma \cap N(\mathbf{x}^*,\delta )\}\); namely, \(\mathbf{x}^{*}\) is a strict local minimizer of (1).

Appendix B

1.1 A. Proof of Lemma 4.1

(1) It follows from (16) that \(\mathbf{x}^k(\alpha ) \in \Pi _{\Sigma }(\mathbf{x}^k -\alpha \nabla f(\mathbf{x}^k))\) and thus

$$\begin{aligned} \Vert \mathbf{x}^k(\alpha )-( \mathbf{x}^k - \alpha \nabla f(\mathbf{x}^k))\Vert ^2 \le \Vert \mathbf{x}^k -( \mathbf{x}^k - \alpha \nabla f(\mathbf{x}^k))\Vert ^2, \end{aligned}$$

which results in

$$\begin{aligned} 2\alpha \langle \nabla f(\mathbf{x}^k), \mathbf{x}^k(\alpha )- \mathbf{x}^k\rangle \le - \Vert \mathbf{x}^k(\alpha )- \mathbf{x}^k\Vert ^2. \end{aligned}$$
(31)

This and the strong smoothness of f with constant \(L_f\) yield

$$\begin{aligned} \begin{array}{lll} f(\mathbf{x}^k(\alpha )) &{}\le &{} f(\mathbf{x}^k)+ \langle \nabla f(\mathbf{x}^k),\mathbf{x}^k(\alpha )-\mathbf{x}^k\rangle +(L_{f}/2)\Vert \mathbf{x}^k(\alpha )-\mathbf{x}^k\Vert ^2\\ &{} {\le }&{} f(\mathbf{x}^k)- ( {1}/{(2\alpha )} -(L_{f}/2) )\Vert \mathbf{x}^k(\alpha )-\mathbf{x}^k\Vert ^2\\ &{}\le &{} f(\mathbf{x}^k) - (\sigma /2)\Vert \mathbf{x}^k(\alpha )-\mathbf{x}^k\Vert ^2, \end{array} \end{aligned}$$

where the last inequality is from \(0<\alpha \le 1/(\sigma +L_{f})\). Invoking the Armijo-type step size rule, one has \(\alpha _k\ge \gamma /(\sigma +L_{f})\), which by \(\alpha _k\le 1\) proves the desired assertion.
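
For illustration, the gradient projection step (16) together with the decrease test used above can be sketched as follows. The exact step-size rule of Algorithm 1 is not reproduced in this excerpt, so the backtracking constants \(\sigma \), \(\gamma \) and the starting value \(\alpha =1\) are assumptions of this sketch, and `project_block` repeats the projection helper shown earlier.

```python
import numpy as np

def project_block(z, s):
    """Keep the s largest-magnitude entries of z (one selection from Pi_{Sigma_i})."""
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-s:]
    out[keep] = z[keep]
    return out

def gradient_projection_step(x_blocks, f, grad, s, sigma=1e-4, gamma=0.5,
                             max_backtracks=50):
    """Armijo-type backtracking on alpha until the sufficient decrease
    f(x(alpha)) <= f(x) - (sigma/2) * ||x(alpha) - x||^2 holds."""
    fx, g = f(x_blocks), grad(x_blocks)
    alpha = 1.0
    for _ in range(max_backtracks):
        trial = [project_block(x_i - alpha * g_i, s_i)
                 for x_i, g_i, s_i in zip(x_blocks, g, s)]
        dist2 = sum(np.linalg.norm(t - x_i) ** 2
                    for t, x_i in zip(trial, x_blocks))
        if f(trial) <= fx - 0.5 * sigma * dist2:
            break
        alpha *= gamma
    return trial, alpha
```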

(2) By (23) and \(\mathbf{u}^k=\mathbf{x}^k(\alpha _k)\), we have

$$\begin{aligned} f(\mathbf{u}^k)\le & {} f(\mathbf{x}^k) - (\sigma /2)\Vert \mathbf{u}^k-\mathbf{x}^k\Vert ^2. \end{aligned}$$
(32)

By the framework of Algorithm 1, if \(\mathbf{x}^{k+1}=\mathbf{u}^k\), then the above condition implies,

$$\begin{aligned} f(\mathbf{x}^{k+1})\le & {} f(\mathbf{x}^k) - (\sigma /2)\Vert \mathbf{x}^{k+1}-\mathbf{x}^k\Vert ^2. \end{aligned}$$
(33)

If \(\mathbf{x}^{k+1}=\mathbf{v}^k\), then we obtain

$$\begin{aligned} \begin{array}{lll} f(\mathbf{x}^{k+1})= f(\mathbf{v}^k)&{}\le &{}f(\mathbf{u}^k) - (\sigma /2)\Vert \mathbf{x}^{k+1} -\mathbf{u}^k\Vert ^2 \\ &{}\le &{} f(\mathbf{x}^k) - (\sigma /2)\Vert \mathbf{u}^k-\mathbf{x}^k\Vert ^2- (\sigma /2)\Vert \mathbf{x}^{k+1}-\mathbf{u}^k\Vert ^2\\ &{}\le &{} f(\mathbf{x}^k) -(\sigma /4)\Vert \mathbf{x}^{k+1}-\mathbf{x}^k\Vert ^2, \end{array} \end{aligned}$$
(34)

where the second and last inequalities use (32) and the fact that \(\Vert \mathbf{a}+\mathbf{b}\Vert ^2\le 2\Vert \mathbf{a}\Vert ^2+2\Vert \mathbf{b}\Vert ^2\) for all vectors \(\mathbf{a}\) and \(\mathbf{b}\), respectively. Both cases lead to

$$\begin{aligned} \begin{array}{lll} f(\mathbf{x}^{k+1})&{}\le &{} f(\mathbf{x}^k) - (\sigma /4)\Vert \mathbf{x}^{k+1}-\mathbf{x}^k\Vert ^2,\\ f(\mathbf{x}^{k+1})&{}\le &{} f(\mathbf{x}^k) - (\sigma /2)\Vert \mathbf{u}^k-\mathbf{x}^k\Vert ^2. \end{array} \end{aligned}$$
(35)

Therefore, \(\{f(\mathbf{x}^k)\}\) is a non-increasing sequence, which with (35) and \(f \ge 0\) yields

$$\begin{aligned}&\sum _{k\ge 0} \max \{(\sigma /4)\Vert \mathbf{x}^{k+1}-\mathbf{x}^k\Vert ^2, (\sigma /2)\Vert \mathbf{u}^k-\mathbf{x}^k\Vert ^2\}\\&\quad \le \sum _{k\ge 0} \left[ f(\mathbf{x}^k) - f(\mathbf{x}^{k+1})\right] = f(\mathbf{x}^0) - \lim _{k\rightarrow \infty } f(\mathbf{x}^{k+1})\le f(\mathbf{x}^0). \end{aligned}$$

The above condition implies that \(\lim _{k\rightarrow \infty }\Vert \mathbf{x}^{k+1}-\mathbf{x}^k\Vert =\lim _{k\rightarrow \infty }\Vert \mathbf{u}^k-\mathbf{x}^k\Vert =0.\)

(3) Let \(\mathbf{x}^*\) be any accumulation point of \(\{\mathbf{x}^k\}\). Then there exists a subset M of \(\{0,1,2,\ldots \}\) such that \(\lim _{ k (\in M)\rightarrow \infty } \mathbf{x}^k = \mathbf{x}^*.\) This further implies \(\lim _{ k (\in M)\rightarrow \infty } \mathbf{u}^k = \mathbf{x}^*\) by applying 2). In addition, as stated in 1), we have \(\{\alpha _k\}\subseteq [ {\underline{\alpha }}, 1]\), which indicates that one can find a subsequence K of M and a scalar \(\alpha _*\in [ {\underline{\alpha }}, 1]\) such that \(\{\alpha _k: k \in K\}\rightarrow \alpha _*\). Overall, we have

$$\begin{aligned} \lim _{ k (\in K)\rightarrow \infty }\mathbf{x}^k =\lim _{ k (\in K)\rightarrow \infty }\mathbf{u}^k = \mathbf{x}^*, ~~~~\lim _{ k (\in K)\rightarrow \infty } \alpha _k =\alpha _* \in [\underline{\alpha },1].\end{aligned}$$
(36)

Let \({\varvec{\eta }}^k :=\mathbf{x}^k -\alpha _k \nabla f(\mathbf{x}^k)\). The framework of Algorithm 1 implies

$$\begin{aligned} \mathbf{u}^k \in \Pi _{\Sigma }( {\varvec{\eta }}^k ),~~~~ \lim _{ k (\in K)\rightarrow \infty } {\varvec{\eta }}^k =\mathbf{x}^* -\alpha _* \nabla f(\mathbf{x}^*)=:{\varvec{\eta }}^*. \end{aligned}$$
(37)

The first condition means \(\mathbf{u}^k \in \Sigma \) for any \( k \ge 1\). Note that \(\Sigma \) is closed and \(\mathbf{x}^*\) is an accumulation point of \(\{\mathbf{u}^k\}\) by (36). Therefore, \(\mathbf{x}^*\in \Sigma \), which results in

$$\begin{aligned} \min _{\mathbf{x}\in \Sigma }\Vert \mathbf{x}-{\varvec{\eta }}^*\Vert \le \Vert \mathbf{x}^*-{{\varvec{\eta }}}^*\Vert . \end{aligned}$$
(38)

If the strict inequality holds in the above condition, then there is an \( \varepsilon _0>0\) such that

$$\begin{aligned} \Vert \mathbf{x}^*-{\varvec{\eta }}^*\Vert -\varepsilon _0= & {} \min _{\mathbf{x}\in \Sigma }\Vert \mathbf{x}-{\varvec{\eta }}^*\Vert \\\ge & {} \min _{\mathbf{x}\in \Sigma } (\Vert \mathbf{x}- {\varvec{\eta }}^k \Vert -\Vert {\varvec{\eta }}^k- {\varvec{\eta }}^*\Vert )\\= & {} \Vert \mathbf{u}^k - {\varvec{\eta }}^k \Vert -\Vert {\varvec{\eta }}^k -{\varvec{\eta }}^*\Vert , \end{aligned}$$

where the last equality is from (37). Taking the limit of both sides of the above condition along \( k (\in K)\rightarrow \infty \) yields \(\Vert \mathbf{x}^*- {\varvec{\eta }}^*\Vert -\varepsilon _0 \ge \Vert \mathbf{x}^* - {\varvec{\eta }}^*\Vert \) by (36) and (37), a contradiction with \( \varepsilon _0>0\). Therefore, equality must hold in (38), showing that

$$\begin{aligned} \mathbf{x}^* \in \Pi _ \Sigma ({\varvec{\eta }}^*)= \Pi _\Sigma \left( \mathbf{x}^* - \alpha _* \nabla f(\mathbf{x}^*)\right) . \end{aligned}$$

The above relation means that the conditions in (13) hold for \(\alpha =\alpha _*\); they must then hold for any \(0<\alpha \le {\underline{\alpha }}\) because \({\underline{\alpha }}\le \alpha _*\) by (36), namely,

$$\begin{aligned} \mathbf{x}^* \in \Pi _\Sigma \left( \mathbf{x}^* - \alpha \nabla f(\mathbf{x}^*)\right) , \end{aligned}$$

showing that \(\mathbf{x}^*\) is an \(\alpha \)-stationary point of (1), as desired.

1.2 B. Proof of Theorem 4.1

As shown in Lemma 4.1, one can find a subsequence of \(\{\mathbf{x}^ k \}\) that converges to an \(\alpha \)-stationary point \(\mathbf{x}^*\) of (1) with \(0<\alpha \le {\underline{\alpha }}\). Recall that an \(\alpha \)-stationary point \(\mathbf{x}^*\) is also a local minimizer by Theorem 3.2, and since f is restricted strongly convex, \(\mathbf{x}^*\) is unique; in other words, \(\mathbf{x}^*\) is an isolated local minimizer of (1). Finally, it follows from \(\mathbf{x}^*\) being isolated, [8, Lemma 4.10] and \(\lim _{ k \rightarrow \infty }\Vert \mathbf{x}^{ k +1}-\mathbf{x}^ k \Vert =0\) by Lemma 4.1 that the whole sequence converges to the unique local minimizer \(\mathbf{x}^*\).

1.3 C. Proof of Lemma 4.2

1) If \(\Vert \mathbf{x}_i^*\Vert _0=s_i\), then by \(\mathbf{x}_i^k\rightarrow \mathbf{x}_i^*, \mathbf{u}_i^k\rightarrow \mathbf{x}_i^*\) and \(\Vert \mathbf{x}_i^k\Vert _0\le s_i,\Vert \mathbf{u}_i^k\Vert _0\le s_i\), we must have \(\Gamma (\mathbf{x}_i^*) \equiv \Gamma (\mathbf{x}_i^k)\equiv \Gamma (\mathbf{u}_i^k)\) for sufficiently large k. If \(\Vert \mathbf{x}_i^*\Vert _0<s_i\), similar reasoning allows for deriving \(\Gamma (\mathbf{x}_i^*) \subseteq \Gamma (\mathbf{x}_i^k) \) and \( \Gamma (\mathbf{x}_i^*) \subseteq \Gamma (\mathbf{u}_i^k)\).

2) According to Theorem 4.1, the limit point \(\mathbf{x}^*\) is a local minimizer of the problem (1). Therefore, it satisfies (12) from Theorem 3.1. We first conclude that one of the conditions in (17) must be satisfied for a large enough k. In fact, there are three cases for \(\mathbf{x}^*\), each of which can imply one condition in (17) as follows.

$$\begin{aligned} \begin{array}{lll} \text {Case 1)}&{}~\Vert \mathbf{x}_i^*\Vert _0=s_i,~i=1,2,\ldots ,m&{}\quad \Longrightarrow ~~~~\text {Cond 1)},\\ \text {Case 2)}&{}~\Vert \mathbf{x}_i^*\Vert _0<s_i,~i\in I_{1},~\Vert \mathbf{x}_i^*\Vert _0=s_i,~i\in I_{2}&{}\quad \Longrightarrow ~~~~\text {Cond 2)},\\ \text {Case 3)}&{}~\Vert \mathbf{x}_i^*\Vert _0<s_i,~i=1,2,\ldots ,m&{}\quad \Longrightarrow ~~~~\text {Cond 3)}.\\ \end{array} \end{aligned}$$
(39)

We now prove them one by one. From the Lipschitz continuity of \(\nabla f\), we have

$$\begin{aligned} \begin{array}{llll} &{}&{}\max \{\Vert \nabla _i f(\mathbf{u}^k)- \nabla _i f(\mathbf{x}^*)\Vert ,\Vert (\nabla f(\mathbf{u}^{k}))_{\Gamma _k}-(\nabla f(\mathbf{x}^*))_{\Gamma _k}\Vert \}\\ &{}&{}\quad \le \Vert \nabla f(\mathbf{u}^k)- \nabla f(\mathbf{x}^*)\Vert \le L_f\Vert \mathbf{u}^k - \mathbf{x}^* \Vert . \end{array} \end{aligned}$$
(40)

The relation Case 1) \(\Rightarrow \) Cond 1) follows from (24) immediately. For Case 2), we have \(\Gamma (\mathbf{x}_i^k)\equiv \Gamma (\mathbf{u}_i^k),~i\in I_{2}\) by (24) and

$$\begin{aligned} \begin{array}{llll} \Vert \nabla _i f(\mathbf{u}^k)\Vert &{}=&{} \Vert \nabla _i f(\mathbf{u}^k)- \nabla _i f(\mathbf{x}^*)\Vert ,~i\in I_1,&{}~~(\text {by }(12))\\ &{}\le &{}L_f\Vert \mathbf{u}^k - \mathbf{x}^* \Vert &{} ~~(\text {by }(40))\\ &{}\le &{} \epsilon .&{} ~~(\text {by } \mathbf{u}^k \rightarrow \mathbf{x}^*) \end{array} \end{aligned}$$
(41)

Therefore, Case 2) \(\Rightarrow \) Cond 2). Similarly, we can show the last relation.

Next, since f is strongly convex, \(H^k_{\Gamma _{k}\Gamma _{k}}\) is non-singular, which means that the equations (19) are solvable. Finally, we show that the inequality (20) holds when \(\sigma \in (0,l_f/2)\). In fact, the conditions (24) and (12) enable us to derive

$$\begin{aligned} (\nabla f(\mathbf{x}^*))_{\Gamma _k}=0, \end{aligned}$$
(42)

for sufficiently large k. Then it follows from (19) that

$$\begin{aligned} \begin{array}{llll} \Vert \mathbf{v}^k-\mathbf{u}^k\Vert &{}=&{} \Vert \mathbf{v}^{k}_{\Gamma _k}-\mathbf{u}^{k}_{\Gamma _k}\Vert &{}~~(\text {by }(21))\\ &{}=&{} \Vert (H^k_{\Gamma _{k}\Gamma _{k}})^{-1}(\nabla f(\mathbf{u}^{k}))_{\Gamma _k}\Vert &{}~~(\text {by } (19))\\ &{}\le &{}(1/l_f)\Vert (\nabla f(\mathbf{u}^{k}))_{\Gamma _k}\Vert &{}~~(\text {by }(7))\\ &{}=&{}(1/l_f)\Vert (\nabla f(\mathbf{u}^{k}))_{\Gamma _k}-(\nabla f(\mathbf{x}^*))_{\Gamma _k}\Vert &{}~~(\text {by }(42))\\ &{}\le &{}(L_f/l_f) \Vert \mathbf{u}^{k}-\mathbf{x}^*\Vert \rightarrow 0.&{} ~~(\text {by } (40)) \end{array} \end{aligned}$$

The above condition indicates that \(\Vert \mathbf{v}^k-\mathbf{u}^k\Vert \rightarrow 0\), resulting in

$$\begin{aligned} o(\Vert \mathbf{v}^k-\mathbf{u}^k\Vert ^2) \le (l_f/{4}) \Vert \mathbf{v}^k-\mathbf{u}^k \Vert ^2, \end{aligned}$$
(43)

for sufficiently large k. We have the following chain of inequalities,

$$\begin{aligned} \begin{array}{llll} 2f(\mathbf{v}^k)-2f(\mathbf{u}^k)&{}=&{} 2\langle \nabla f(\mathbf{u}^k),\mathbf{v}^k-\mathbf{u}^k \rangle +2o(\Vert \mathbf{v}^k-\mathbf{u}^k\Vert ^2)\\ &{}&{} \quad + \langle \nabla ^2f(\mathbf{u}^k)(\mathbf{v}^k-\mathbf{u}^k),\mathbf{v}^k-\mathbf{u}^k \rangle ~~~~(\text {by Taylor expansion})\\ &{}=&{} 2\langle (\nabla f(\mathbf{u}^k))_{\Gamma _k},(\mathbf{v}^k-\mathbf{u}^k)_{\Gamma _k} \rangle +2o(\Vert \mathbf{v}^k-\mathbf{u}^k\Vert ^2)\\ &{}&{} \quad + \langle H^k_{\Gamma _{k}\Gamma _{k}}(\mathbf{v}^k-\mathbf{u}^k)_{\Gamma _k},(\mathbf{v}^k-\mathbf{u}^k)_{\Gamma _k} \rangle ~~~~ (\text {by }(21))\\ &{}=&{} - \langle H^k_{\Gamma _{k}\Gamma _{k}}(\mathbf{v}^k-\mathbf{u}^k)_{\Gamma _k},(\mathbf{v}^k-\mathbf{u}^k)_{\Gamma _k} \rangle + 2o(\Vert \mathbf{v}^k-\mathbf{u}^k\Vert ^2) ~~~~(\text {by } (19))\\ &{}\le &{} - l_f \Vert (\mathbf{v}^k-\mathbf{u}^k)_{\Gamma _k} \Vert ^2 +2o(\Vert \mathbf{v}^k-\mathbf{u}^k\Vert ^2) ~~~~(\text {by }(7))\\ &{}=&{} - l_f \Vert \mathbf{v}^k-\mathbf{u}^k \Vert ^2 + 2 o(\Vert \mathbf{v}^k-\mathbf{u}^k\Vert ^2)~~~~(\text {by } (21))\\ &{}\le &{} -(l_f/{2}) \Vert \mathbf{v}^k-\mathbf{u}^k \Vert ^2~~~~(\text {by }(43)) \\ &{}\le &{} -\sigma \Vert \mathbf{v}^k-\mathbf{u}^k \Vert ^2.~~~~(\text {by } \sigma \in (0,l_f/2)) \end{array} \end{aligned}$$

Consequently, for sufficiently large k, the Newton step can always be taken and accepted.
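
The Newton step analyzed above admits a compact numerical sketch. The form below is assumed from the quantities appearing in (19)-(21) and from the decrease test used in (34); it works on the stacked vector \(\mathbf{u}^k=(\mathbf{u}_1^k,\ldots ,\mathbf{u}_m^k)\) with joint support \(\Gamma _k\) and is not the authors' implementation.

```python
import numpy as np

def restricted_newton_step(u, f, grad, hess, sigma, tol=1e-10):
    """Solve H_{Gamma,Gamma} d = (grad f(u))_Gamma on the support Gamma of u,
    move only the support entries, and accept the point v only if it achieves
    the sufficient decrease f(v) <= f(u) - (sigma/2) * ||v - u||^2."""
    Gamma = np.flatnonzero(np.abs(u) > tol)
    g, H = grad(u), hess(u)
    d = np.linalg.solve(H[np.ix_(Gamma, Gamma)], g[Gamma])   # Newton direction on Gamma
    v = np.zeros_like(u)
    v[Gamma] = u[Gamma] - d
    if f(v) <= f(u) - 0.5 * sigma * np.linalg.norm(v - u) ** 2:
        return v, True     # Newton point accepted
    return u, False        # otherwise keep the gradient projection point
```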

1.4 D. Proof of Theorem 4.2

We first estimate \(\Vert \mathbf{u}^{k}-\mathbf{x}^*\Vert \). Recalling (16) that

$$\begin{aligned} {\mathbf{u}^k=\mathbf{x}^k(\alpha _k) \in \Pi _{\Sigma }(\mathbf{x}^{k}- \alpha _k \nabla f(\mathbf{x}^k)), } \end{aligned}$$

and \(\Gamma _k=\Gamma (\mathbf{u}^k)\), we have

$$\begin{aligned} \mathbf{u}^k_{\Gamma _k}=\mathbf{x}^{k}_{\Gamma _{k}}-\alpha _k (\nabla f(\mathbf{x}^{k}))_{\Gamma _k},~~~~ \mathbf{u}^k_{{{\overline{\Gamma }}}_k}=0. \end{aligned}$$

This enables us to derive that

$$\begin{aligned} \begin{array}{llll} \Vert \mathbf{u}^{k}-\mathbf{x}^*\Vert &{}=&{} \Vert \mathbf{x}^{k}_{\Gamma _{k}}-\alpha _k (\nabla f(\mathbf{x}^{k}))_{\Gamma _k} - \mathbf{x}^*_{\Gamma _{k}}\Vert ~~~~(\text {by } \mathbf{u}^k_{{{\overline{\Gamma }}}_k}=\mathbf{x}^*_{{{\overline{\Gamma }}}_k}=0 \text { from }(24))\\ &{}=&{} \Vert \mathbf{x}^{k}_{\Gamma _{k}}-\alpha _k (\nabla f(\mathbf{x}^{k}))_{\Gamma _k} - \mathbf{x}^*_{\Gamma _{k}}+\alpha _k(\nabla f(\mathbf{x}^{*}))_{\Gamma _k}\Vert ~~~~(\text {by }(42))\\ &{}\le &{} \Vert \mathbf{x}^{k}_{\Gamma _{k}}- \mathbf{x}^*_{\Gamma _{k}}\Vert +\alpha _k \Vert (\nabla f(\mathbf{x}^{k}))_{\Gamma _k} - (\nabla f(\mathbf{x}^{*}))_{\Gamma _k}\Vert \\ &{}\le &{} (1+L_f)\Vert \mathbf{x}^{k} - \mathbf{x}^* \Vert . ~~~~~ (\text {by }0< \alpha _k\le 1\hbox { and }(40)) \end{array} \end{aligned}$$
(44)

By Lemma 4.2 2), the Newton step is always admitted for sufficiently large k. Then direct calculations lead to the following chain of inequalities,

$$\begin{aligned} \begin{array}{llll} &{}&{} \Vert \mathbf{x}^{k+1}-\mathbf{x}^*\Vert = \Vert \mathbf{v}^{k}-\mathbf{x}^*\Vert \\ &{}&{}\quad = \Vert \mathbf{v}^{k}_{\Gamma _{k}}-\mathbf{x}^*_{\Gamma _{k}}\Vert ~~~~~~~~(\text {by } \mathbf{v}^k_{{{\overline{\Gamma }}}_k}=\mathbf{x}^*_{{{\overline{\Gamma }}}_k}=0 \text { from }(24))\\ &{}&{}\quad = \Vert \mathbf{u}^{k}_{\Gamma _{k}}-\mathbf{x}^*_{\Gamma _{k}} - (H^k_{\Gamma _{k}\Gamma _{k}})^{-1} (\nabla f(\mathbf{u}^{k}))_{\Gamma _k}\Vert &{}~~(\text {by }(19))\\ &{}&{}\quad = \Vert \mathbf{u}^{k}_{\Gamma _{k}}-\mathbf{x}^*_{\Gamma _{k}} - (H^k_{\Gamma _{k}\Gamma _{k}})^{-1} ((\nabla f(\mathbf{u}^{k}))_{\Gamma _k}-(\nabla f(\mathbf{x}^{*}))_{\Gamma _k})\Vert &{}~~(\text {by }(42))\\ &{}&{}\quad \le (1/ l_f) \Vert H^k_{\Gamma _{k}\Gamma _{k}}(\mathbf{u}^{k}_{\Gamma _{k}}-\mathbf{x}^*_{\Gamma _{k}} )- ((\nabla f(\mathbf{u}^{k}))_{\Gamma _k}-(\nabla f(\mathbf{x}^{*}))_{\Gamma _k})\Vert &{}~~(\text {by }(7) )\\ &{}&{}\quad \le (1/ l_f) \Vert \nabla ^2 f(\mathbf{u}^k)(\mathbf{u}^{k}-\mathbf{x}^* )- (\nabla f(\mathbf{u}^{k})-\nabla f(\mathbf{x}^{*}))\Vert &{} \\ &{}&{}\quad =(1/ l_f) \Vert \int _0^1 (\nabla ^2 f(\mathbf{x}^*+t(\mathbf{u}^{k}-\mathbf{x}^*))-\nabla ^2 f(\mathbf{u}^k)) (\mathbf{u}^{k}-\mathbf{x}^* )dt\Vert &{} \\ &{}&{}\quad \le (1/ l_f) \int _0^1 \Vert \nabla ^2 f(\mathbf{x}^*+t(\mathbf{u}^{k}-\mathbf{x}^*))-\nabla ^2 f(\mathbf{u}^k)\Vert \Vert \mathbf{u}^{k}-\mathbf{x}^*\Vert dt\\ &{}&{}\quad \le (1/ l_f) \int _0^1 C_f \Vert \mathbf{x}^*+t(\mathbf{u}^{k}-\mathbf{x}^*) - \mathbf{u}^{k}\Vert \Vert \mathbf{u}^{k}-\mathbf{x}^*\Vert dt &{}~~(\text {by }(9)) \\ &{}&{}\quad \le (C_f/ l_f) \Vert \mathbf{u}^{k}-\mathbf{x}^*\Vert ^2 \int _0^1 (1-t)dt\\ &{}&{}\quad =(C_f /(2l_f))\Vert \mathbf{u}^{k}-\mathbf{x}^*\Vert ^2, \end{array} \end{aligned}$$

which, combined with (44), yields the conclusion immediately.
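
Explicitly, combining the last display with (44) gives the quadratic rate

$$\begin{aligned} \Vert \mathbf{x}^{k+1}-\mathbf{x}^*\Vert \le \frac{C_f}{2l_f}\Vert \mathbf{u}^{k}-\mathbf{x}^*\Vert ^2\le \frac{C_f(1+L_f)^2}{2l_f}\Vert \mathbf{x}^{k}-\mathbf{x}^*\Vert ^2 \end{aligned}$$

for all sufficiently large k.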

Cite this article

Sun, J., Kong, L. & Qu, B. A Greedy Newton-Type Method for Multiple Sparse Constraint Problem. J Optim Theory Appl 196, 829–854 (2023). https://doi.org/10.1007/s10957-022-02156-2
