Abstract
With the development of science and technology, we can obtain many groups of data for the same object, and there is often a certain relationship or structure between these groups of data or within each group. To characterize the structure of the data in different datasets, in this paper we propose the multiple sparse constraint problem (MSCP) to handle problems with a multiblock sparse structure. We introduce three types of stationary points and present the relationships among these three types of stationary points and the global/local minimizers. We then design a gradient projection Newton algorithm, which is proven to enjoy global and quadratic convergence. Finally, numerical experiments on several examples illustrate the efficiency of the proposed method.
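As an informal illustration that is not part of the paper: the feasible set of the MSCP couples m blocks, the i-th of which is allowed at most s_i nonzero entries, and its projection acts block-wise by hard thresholding. A minimal NumPy sketch of this projection is given below; the function name, the representation of blocks as index arrays, and the toy data are our own assumptions.

```python
import numpy as np

def project_onto_sigma(x, blocks, s):
    """Project x onto Sigma = Sigma_1 x ... x Sigma_m: within each block i,
    keep the s[i] entries of largest absolute value and zero out the rest
    (ties are broken arbitrarily, matching the set-valued projection)."""
    z = np.zeros_like(x)
    for idx, s_i in zip(blocks, s):
        xi = x[idx]
        keep = np.argsort(np.abs(xi))[len(xi) - s_i:]   # positions of the s_i largest |entries|
        zi = np.zeros_like(xi)
        zi[keep] = xi[keep]
        z[idx] = zi
    return z

# toy usage: two blocks of length 4 with sparsity levels s = (1, 2)
blocks = [np.arange(0, 4), np.arange(4, 8)]
x = np.array([0.1, -3.0, 0.5, 0.2, 1.0, -0.3, 2.5, 0.0])
print(project_onto_sigma(x, blocks, [1, 2]))            # [ 0. -3.  0.  0.  1.  0.  2.5  0.]
```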





References
Agarwal, A., Negahban, S., Wainwright, M.: Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Stat. 40, 2452–2482 (2012)
Bahmani, S., Raj, B., Boufounos, P.: Greedy sparsity-constrained optimization. J. Mach. Learn. Res. 14, 807–841 (2013)
Beck, A., Eldar, Y.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23, 1480–1509 (2013)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Berg, E., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31, 890–912 (2008)
Jalali, A., Johnson, C., Ravikumar, P.: On learning discrete graphical models using Greedy methods. Adv. Neural Inf. Process. Syst. 24, 1935–1943 (2011). (Granada, Spain)
Jiao, Y., Jin, B., Lu, X.: Group sparse recovery via the \(\ell ^0(\ell ^2)\) penalty: theory and algorithm. IEEE Trans. Signal Process. 65, 998–1012 (2017)
Moré, J., Sorensen, D.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4, 553–572 (1983)
Pan, L., Chen, X.: Group sparse optimization for images recovery using capped folded concave functions. SIAM J. Imaging Sci. 14, 1–25 (2021)
Pan, L., Xiu, N., Zhou, S.: On solutions of sparsity constrained optimization. J. Oper. Res. Soc. China 3, 421–439 (2017)
Pan, L., Zhou, S., Xiu, N., Qi, H.: A convergent iterative hard thresholding for sparsity and nonnegativity constrained optimization. Pac. J. Optim. 33, 325–353 (2017)
Rockafellar, R., Wets, R.: Variational Analysis. Springer, Berlin (2009)
Shalev-Shwartz, S., Srebro, N., Zhang, T.: Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM J. Optim. 20, 2807–2832 (2010)
She, Y., Wang, Z., Shen, J.: Gaining outlier resistance with progressive quantiles: fast algorithms and theoretical studies. J. Am. Stat. Assoc. 117, 1282–1295 (2021)
Sun, J., Kong, L., Zhou, S.: Gradient projection Newton algorithm for sparse collaborative learning using synthetic and real datasets of applications. J. Comput. Appl. Math. 422, 114872 (2023)
Thompson, P., Martin, N., Wright, M.: Imaging genomics. Curr. Opin. Neurol. 23, 368–373 (2010)
Visscher, P., Brown, M., Mccarthy, M., Yang, J.: Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012)
Wang, R., Xiu, N., Zhang, C.: Greedy projected gradient-Newton method for sparse logistic regression. IEEE Trans. Neural Netw. Learn. Syst. 31, 527–538 (2020)
Wang, S., Yehya, N., Schadt, E., Wang, H., Drake, T., Lusis, A.: Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet. 2, 148–159 (2006)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B. 68, 49–67 (2006)
Zhang, H., Wang, F., Xu, H., Liu, Y., Liu, J., Zhao, H., Gelernter, J.: Differentially co-expressed genes in postmortem prefrontal cortex of individuals with alcohol use disorders: influence on alcohol metabolism-related pathways. Hum. Genet. 133, 1383–1394 (2014)
Zhou, S.: Gradient projection Newton pursuit for sparsity constrained optimization. Appl. Comput. Harmon. Anal. 61, 75–100 (2022)
Zhou, S., Luo, Z., Xiu, N.: Computing one-bit compressive sensing via double-sparsity constrained optimization. IEEE Trans. Signal Process. 70, 1593–1608 (2022)
Zhou, S., Xiu, N., Qi, H.: Global and quadratic convergence of Newton hard-thresholding pursuit. J. Mach. Learn. Res. 22, 1–45 (2021)
Zille, P., Calhoun, V., Wang, Y.: Enforcing co-expression within a brain-imaging genomics regression framework. IEEE Trans. Med. Imaging 37, 2561–2571 (2018)
Acknowledgements
The authors would like to thank the Associate Editor and anonymous referees for their helpful suggestions. This work was funded by the National Natural Science Foundation of China (12071022), Beijing Natural Science Foundation (Z190002) and Natural Science Foundation of Shandong Province (ZR2018MA019).
Additional information
Communicated by Sebastian U. Stich.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
1.1 A. Proof of Theorem 3.1
Necessity. Based on [12, Theorem 6.12], a local minimizer \(\mathbf{x}^{*}\) of the problem (1) must satisfy \(- \nabla f(\mathbf{x}^{*}) \in {\mathcal {N}}_{\Sigma }(\mathbf{x}^{*}) = {\mathcal {N}}_{\Sigma _1}(\mathbf{x}^{*}_1) \times {\mathcal {N}}_{\Sigma _2}(\mathbf{x}^{*}_2)\times \cdots \times {\mathcal {N}}_{\Sigma _m}(\mathbf{x}^{*}_m),\) where \({\mathcal {N}}_{\Sigma }(\mathbf{x}^{*})\) is the normal cone of \(\Sigma \) at \(\mathbf{x}^{*}\) and the equality is by [12, Theorem 6.41]. Then the explicit expression (see [10]) of the normal cone \({\mathcal {N}}_{\Sigma _i}(\mathbf{x}^{*}_i)\) enables us to derive (12) immediately.
Sufficiency. Let \(\mathbf{x}^{*}\) satisfy (12). The convexity of f leads to
If there is a \(\delta >0\) such that for any \(\mathbf{x}\in \Sigma \cap N(\mathbf{x}^*,\delta )\), we have
then the conclusion can be made immediately. Therefore, we next show (26). In fact, by (12), we note that \(\nabla _{i}f(\mathbf{x}^{*})=0\) if \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}<s_{i}\), which indicates that it suffices to consider the worst case \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}=s_{i},i=1,2,\ldots ,m\). Under such a case, we define
Then for any \(\mathbf{x}\in N(\mathbf{x}^{*},\delta )\cap \Sigma \), we have
This indicates that \(\Gamma (\mathbf{x}_{i}^{*})\subseteq \Gamma (\mathbf{x}_{i})\), which by \(\Vert \mathbf{x}_{i}\Vert _{0}\le s_{i}=\Vert \mathbf{x}_{i}^{*}\Vert _0=\mid \Gamma (\mathbf{x}_{i}^{*})\mid \) yields
Using the above fact and (12), we derive that
The proof is completed.
1.2 B. Proof of Lemma 3.1 and Theorem 3.2
Suppose that \(\mathbf{x}^{*}\) is an \(\alpha \)-stationary point. If \(j\in \Gamma (\mathbf{x}_{i}^{*})\), then, according to \(\mathbf{x}_{i}^{*} \in \Pi _{\Sigma _i}(\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f(\mathbf{x}^{*}))\), we have \((\mathbf{x}_{i}^{*})_{j}=(\mathbf{x}_{i}^{*})_{j}-(\alpha \nabla _{i} f(\mathbf{x}^{*}))_{j}\), so that \((\nabla _{i} f(\mathbf{x}^{*}))_{j}=0\). If \(j\in {\overline{\Gamma }}(\mathbf{x}_{i}^{*})\), then \(\mid (\mathbf{x}_{i}^{*})_{j}-(\alpha \nabla _{i} f(\mathbf{x}^{*}))_{j}\mid \le (\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\), which combined with the fact that \((\mathbf{x}_{i}^{*})_{j}=0\) implies that \(\mid (\alpha \nabla _{i} f(\mathbf{x}^{*}))_{j}\mid \le (\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\).
Suppose that \(\mathbf{x}^{*}\) satisfies (13). If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}< s_{i}\), then \((\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}=0\). It follows from (13) that \(\nabla _{i} f(\mathbf{x}^{*})=0\); in this case, \(\Pi _{\Sigma _i}\left( \mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) \right) =\Pi _{\Sigma _i}\left( \mathbf{x}_{i}^{*}\right) \) is the singleton \(\{\mathbf{x}_{i}^{*}\}\). If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}= s_{i}\), then \((\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\ne 0\). We have
Therefore, the vector \(\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) \) contains the \(s_{i}\) components of \(\mathbf{x}_{i}^{*}\) with the largest absolute values, and all other components are smaller than or equal to them in absolute value, so that \(\mathbf{x}_{i}^{*} \in \Pi _{\Sigma _i}(\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) )\).
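To make the stationarity test in (13) concrete, here is a small hedged NumPy sketch that checks, block by block, whether the gradient vanishes on the support and whether the off-support entries satisfy the bound involving \((\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\); the function name, the tolerance, and the block representation are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def is_alpha_stationary(x, grad, blocks, s, alpha, tol=1e-10):
    """Check condition (13) block by block: the gradient vanishes on the support
    of x_i, and off the support alpha*|grad_j| does not exceed the s_i-th largest
    absolute entry of x_i."""
    for idx, s_i in zip(blocks, s):
        xi, gi = x[idx], grad[idx]
        support = np.abs(xi) > tol
        kth = np.sort(np.abs(xi))[::-1][s_i - 1]     # the s_i-th largest |entry| of x_i
        if np.any(np.abs(gi[support]) > tol):
            return False
        if np.any(alpha * np.abs(gi[~support]) > kth + tol):
            return False
    return True
```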
1.3 C. Proof of Lemma 3.2 and Theorem 3.3
Suppose that \(\mathbf{x}^{*}\) is a B-stationary point; that is, \(\nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : d_{i}\in T_{ \Sigma _i}^{B}(\mathbf{x}_{i}^{*})\}\).
If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}= s_{i}\), we have
The above formula is equivalent to
Then we have
Next we prove \(\nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=0\Longleftrightarrow \nabla _{i}f(\mathbf{x}^{*})=0\) when \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}< s_{i}\). On the one hand, if \(\nabla _{i}f(\mathbf{x}^{*})=0\), then
On the other hand, if \(\nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=0\), then
which leads to \(\Vert \nabla _{i}f(\mathbf{x}^{*})\Vert \le \Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert \) for any \(d_{i}\) with \(\Vert d_{i}\Vert _{0}\le s_{i}\) and \(\Vert \mathbf{x}_{i}^{*}+\nu d_{i}\Vert _{0}\le s_{i}\) for all \(\nu \in \mathbb {R}\). In particular, for any \(j_{0}\in \{1,2,\ldots ,p_{i}\}\), take \(d_{i}\) with \(\Gamma (d_{i})=\{j_{0}\}\); clearly, \(\Vert \mathbf{x}_{i}^{*}+\nu d_{i}\Vert _{0}\le s_{i}\) for all \(\nu \in \mathbb {R}\). Then, setting \((d_{i})_{j_{0}}=-(\nabla _{i}f(\mathbf{x}^{*}))_{j_{0}}\) and \((d_{i})_{j}=0\) for \(j\ne j_{0}\), the vector \(d_{i}+\nabla _{i}f(\mathbf{x}^{*})\) coincides with \(\nabla _{i}f(\mathbf{x}^{*})\) except that its \(j_{0}\)-th entry is zero, so the inequality \(\Vert \nabla _{i}f(\mathbf{x}^{*})\Vert \le \Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert \) forces \((\nabla _{i}f(\mathbf{x}^{*}))_{j_{0}}=0\). By the arbitrariness of \(j_{0}\), we get \(\nabla _{i}f(\mathbf{x}^{*})=0\).
Since condition (14) is equivalent to (12), a B-stationary point is equivalent to a local minimizer. The proof of Theorem 3.3 is completed.
1.4 D. Proof of Lemma 3.3 and Theorem 3.4
If \(\mathbf{x}^{*}\) is a C-stationary point, then \(\nabla ^{C}_{ \Sigma _i}f(\mathbf{x}^{*})=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : d_{i}\in T_{ \Sigma _i}^{C}(\mathbf{x}_{i}^{*})\}\). So
which is equivalent to
Therefore, if \(\mathbf{x}^{*}\) is a C-stationary point, then
To sum up, we get the desired results.
If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}=s_{i},i=1,2,\ldots ,m\), the conditions (15) imply (12), and a point satisfying (12) is a local minimizer by Theorem 3.1, so a C-stationary point of (1) is a local minimizer. Conversely, suppose that \(\mathbf{x}^{*}\) is a local minimizer; then it satisfies (12), which is a special case of (15), that is to say, \(\mathbf{x}^{*}\) is a C-stationary point. The proof of Theorem 3.4 is completed.
1.5 E. Proof of Theorem 3.6
The original problem (1) can be written as
Since f is strongly convex on \(\mathbb {R}^{p_{1}}_{T_1}\times \cdots \times \mathbb {R}^{p_{m}}_{T_m}\), for any given \(|T_i|=s_{i},~i=1,2,\ldots ,m\), the inner problem of (28) is also a strongly convex program. Therefore, the inner program admits a unique global minimizer, denoted by \(\mathbf{x}_{i}^*(T_i),~i=1,2,\ldots ,m\). Note that \(T_i\subseteq [p_i],~i=1,2,\ldots ,m\). Thus there are only finitely many \(T_i\) with \(|T_i|=s_{i}\), and hence finitely many inner programs and finitely many \(\mathbf{x}_{i}^*(T_i)\). To obtain a global minimizer of (28), we only need to pick the \(\mathbf{x}_{i}^*(T_i)\) that makes the objective value of (28) minimal. Hence global minimizers exist.
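To make the enumeration argument above concrete, the following hedged sketch brute-forces problem (28) for a small least-squares instance \(f(\mathbf{x})=\tfrac{1}{2}\Vert A\mathbf{x}-\mathbf{b}\Vert ^2\); the least-squares choice of f, the function name, and the block representation are our own illustrative assumptions, and the enumeration is only viable for tiny dimensions.

```python
import itertools
import numpy as np

def mscp_global_min_bruteforce(A, b, blocks, s):
    """Enumerate every support T_i with |T_i| = s_i in each block, solve the inner
    restricted least-squares problem, and keep the best candidate (problem (28)).
    Exponential cost: for tiny illustrative instances only."""
    p = A.shape[1]
    best_val, best_x = np.inf, None
    per_block = [itertools.combinations(idx, s_i) for idx, s_i in zip(blocks, s)]
    for supports in itertools.product(*per_block):
        T = np.concatenate([np.array(Ti, dtype=int) for Ti in supports])
        xT, *_ = np.linalg.lstsq(A[:, T], b, rcond=None)   # inner strongly convex subproblem
        x = np.zeros(p)
        x[T] = xT
        val = 0.5 * np.linalg.norm(A @ x - b) ** 2
        if val < best_val:
            best_val, best_x = val, x
    return best_x, best_val
```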
We next show that any local minimizer \(\mathbf{x}^*\) is unique. To proceed, denote \(\delta :=\min \limits _{i=1,2,\ldots ,m}\{\delta _i\}\), where
Clearly, \(\delta _i>0\) and hence \(\delta >0\). Then similar reasoning allows us to derive (26) for any \(\mathbf{x}\in \Sigma \cap N(\mathbf{x}^*,\delta )\). This and the strong convexity of f lead to
The above condition indicates \(\mathbf{x}^{*}\) is the unique global minimizer of the problem \(\min \{f(\mathbf{x}):\mathbf{x}\in \Sigma \cap N(\mathbf{x}^*,\delta )\}\), namely, \(\mathbf{x}^{*}\) is the unique local minimizer of (1).
Appendix B
1.1 A. Proof of Lemma 4.1
(1) It follows from (16) that \(\mathbf{x}^k(\alpha ) \in \Pi _{\Sigma }(\mathbf{x}^k -\alpha \nabla f(\mathbf{x}^k))\) and thus
which results in
This and the strong smoothness of f with the constant \(L_f\) derive that
where the last inequality is from \(0<\alpha \le 1/(\sigma +L_{f})\). Invoking the Armijo-type step size rule, one has \(\alpha _k\ge \gamma /(\sigma +L_{f})\), which together with \(\alpha _k\le 1\) proves the desired assertion.
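For illustration only, the Armijo-type backtracking used in part (1) can be sketched as follows, reusing the project_onto_sigma helper sketched after the abstract; the acceptance test \(f(\mathbf{x}(\alpha ))\le f(\mathbf{x})-\tfrac{\sigma }{2}\Vert \mathbf{x}(\alpha )-\mathbf{x}\Vert ^2\) and the default parameter values are assumptions consistent with the bounds above, not the exact rule of Algorithm 1.

```python
import numpy as np

def armijo_projected_gradient_step(f, grad_f, x, blocks, s,
                                   sigma=1e-4, gamma=0.5, max_backtracks=50):
    """Backtracking search: shrink alpha by gamma until the projected point
    x(alpha) = Pi_Sigma(x - alpha * grad f(x)) achieves a sufficient decrease
    f(x(alpha)) <= f(x) - (sigma/2) * ||x(alpha) - x||^2."""
    fx, g = f(x), grad_f(x)
    alpha = 1.0                                   # consistent with alpha_k <= 1 above
    for _ in range(max_backtracks):
        x_alpha = project_onto_sigma(x - alpha * g, blocks, s)
        if f(x_alpha) <= fx - 0.5 * sigma * np.sum((x_alpha - x) ** 2):
            break
        alpha *= gamma
    return x_alpha, alpha
```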
(2) By (23) and \(\mathbf{u}^k=\mathbf{x}^k(\alpha _k)\), we have
By the framework of Algorithm 1, if \(\mathbf{x}^{k+1}=\mathbf{u}^k\), then the above condition implies,
If \(\mathbf{x}^{k+1}=\mathbf{v}^k\), then we obtain
where the second and last inequalities use (32) and the fact that \(\Vert \mathbf{a}+\mathbf{b}\Vert ^2\le 2\Vert \mathbf{a}\Vert ^2+2\Vert \mathbf{b}\Vert ^2\) for all vectors \(\mathbf{a}\) and \(\mathbf{b}\). Both cases lead to
Therefore, \(\{f(\mathbf{x}^k)\}\) is a non-increasing sequence, which with (35) and \(f \ge 0\) yields
The above condition suffices to show that \(\lim _{k\rightarrow \infty }\Vert \mathbf{x}^{k+1}-\mathbf{x}^k\Vert =\lim _{k\rightarrow \infty }\Vert \mathbf{u}^k-\mathbf{x}^k\Vert =0.\)
(3) Let \(\mathbf{x}^*\) be any accumulation point of \(\{\mathbf{x}^k\}\). Then there exists a subset M of \(\{0,1,2,\ldots \}\) such that \(\lim _{ k (\in M)\rightarrow \infty } \mathbf{x}^k = \mathbf{x}^*.\) This further implies \(\lim _{ k (\in M)\rightarrow \infty } \mathbf{u}^k = \mathbf{x}^*\) by applying 2). In addition, as stated in 1), we have \(\{\alpha _k\}\subseteq [ {\underline{\alpha }}, 1]\), which indicates that one can find a subsequence K of M and a scalar \(\alpha _*\in [ {\underline{\alpha }}, 1]\) such that \(\{\alpha _k: k \in K\}\rightarrow \alpha _*\). Overall, we have
Let \({\varvec{\eta }}^k :=\mathbf{x}^k -\alpha _k \nabla f(\mathbf{x}^k)\). The framework of Algorithm 1 implies
The first condition means \(\mathbf{u}^k \in \Sigma \) for any \( k \ge 1\). Note that \(\Sigma \) is closed and \(\mathbf{x}^*\) is an accumulation point of \(\{\mathbf{u}^k\}\) by (36). Therefore, \(\mathbf{x}^*\in \Sigma \), which results in
If the strict inequality holds in the above condition, then there is an \( \varepsilon _0>0\) such that
where the last equality is from (37). Taking the limit of both sides of the above condition along \( k (\in K)\rightarrow \infty \) yields \(\Vert \mathbf{x}^*- {\varvec{\eta }}^*\Vert -\varepsilon _0 \ge \Vert \mathbf{x}^* - {\varvec{\eta }}^*\Vert \) by (36) and (37), which contradicts \( \varepsilon _0>0\). Therefore, equality must hold in (38), showing that
The above relation means that the conditions in (13) hold for \(\alpha =\alpha _*\); then they must hold for any \(0<\alpha \le {\underline{\alpha }}\) due to \({\underline{\alpha }}\le \alpha _*\) from (36), namely,
showing that \(\mathbf{x}^*\) is an \(\alpha \)-stationary point of (1), as desired.
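Putting the pieces together, a hedged skeleton of a gradient-projection-Newton iteration in the spirit of Algorithm 1 is sketched below; it reuses the armijo_projected_gradient_step helper above, and its simplified Newton acceptance test is our own placeholder for the exact test (20) used in the paper.

```python
import numpy as np

def gpna_sketch(f, grad_f, hess_f, x0, blocks, s, tol=1e-8, max_iter=200):
    """Hedged skeleton: compute the projected gradient point u^k, try a Newton
    step v^k restricted to the support of u^k, and accept v^k only if it does
    not increase f (a simplified acceptance rule)."""
    x = x0.copy()
    for _ in range(max_iter):
        u, _ = armijo_projected_gradient_step(f, grad_f, x, blocks, s)
        gamma = np.flatnonzero(u)                         # current support Gamma_k of u^k
        try:
            d = np.linalg.solve(hess_f(u)[np.ix_(gamma, gamma)], -grad_f(u)[gamma])
            v = u.copy()
            v[gamma] += d
            x_next = v if f(v) <= f(u) else u             # placeholder for the exact test (20)
        except np.linalg.LinAlgError:
            x_next = u                                    # fall back to the projected gradient point
        if np.linalg.norm(x_next - x) <= tol:
            return x_next
        x = x_next
    return x
```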
1.2 B. Proof of Theorem 4.1
As shown in Lemma 4.1, one can find a subsequence of \(\{\mathbf{x}^ k \}\) that converges to an \(\alpha \)-stationary point \(\mathbf{x}^*\) of (1) with \(0<\alpha \le {\underline{\alpha }}\). Recall that an \(\alpha \)-stationary point \(\mathbf{x}^*\) is also a local minimizer by Theorem 3.2, which indicates that \(\mathbf{x}^*\) is unique because f is restricted strongly convex. In other words, \(\mathbf{x}^*\) is an isolated local minimizer of (1). Finally, it follows from \(\mathbf{x}^*\) being isolated, [8, Lemma 4.10] and \(\lim _{ k \rightarrow \infty }\Vert \mathbf{x}^{ k +1}-\mathbf{x}^ k \Vert =0\) by Lemma 4.1 that the whole sequence converges to the unique local minimizer \(\mathbf{x}^*\).
1.3 C. Proof of Lemma 4.2
1) If \(\Vert \mathbf{x}_i^*\Vert _0=s_i\), then by \(\mathbf{x}_i^k\rightarrow \mathbf{x}_i^*, \mathbf{u}_i^k\rightarrow \mathbf{x}_i^*\) and \(\Vert \mathbf{x}_i^k\Vert _0\le s_i,\Vert \mathbf{u}_i^k\Vert _0\le s_i\), we must have \(\Gamma (\mathbf{x}_i^*) \equiv \Gamma (\mathbf{x}_i^k)\equiv \Gamma (\mathbf{u}_i^k)\) for sufficiently large k. If \(\Vert \mathbf{x}_i^*\Vert _0<s_i\), similar reasoning allows for deriving \(\Gamma (\mathbf{x}_i^*) \subseteq \Gamma (\mathbf{x}_i^k) \) and \( \Gamma (\mathbf{x}_i^*) \subseteq \Gamma (\mathbf{u}_i^k)\).
2) According to Theorem 4.1, the limit point \(\mathbf{x}^*\) is a local minimizer of the problem (1). Therefore, it satisfies (12) from Theorem 3.1. We first conclude that one of the conditions in (17) must be satisfied for a large enough k. In fact, there are three cases for \(\mathbf{x}^*\), each of which can imply one condition in (17) as follows.
We now prove them one by one. From the Lipschitz continuity of \(\nabla f\), we have
The relation of Case 1) \(\Rightarrow \) Cond 1) follows from (24) immediately. For Case 2), we have \(\Gamma (\mathbf{x}_i^k)\equiv \Gamma (\mathbf{u}_i^k),~i\in I_{2}\) by (24) and
Therefore, Case 2) \(\Rightarrow \) Cond 2). Similarly, we can show the last relation.
Next, since f is strongly convex, \(H^k_{\Gamma _{k}\Gamma _{k}}\) is non-singular, which means that the equations (19) are solvable. Finally, we show that inequality (20) holds when \(\sigma \in (0,l_f/2)\). In fact, the conditions (24) and (12) enable us to derive
for sufficiently large k. Then it follows from (19) that
The above condition indicates that \(\Vert \mathbf{v}^k-\mathbf{u}^k\Vert \rightarrow 0\), resulting in
for sufficiently large k. We have the following chain of inequalities,
In summary, for sufficiently large k, the Newton step can always be performed.
1.4 D. Proof of Theorem 4.2
We first estimate \(\Vert \mathbf{u}^{k}-\mathbf{x}^*\Vert \). Recalling (16) that
and \(\Gamma _k=\Gamma (\mathbf{u}^k)\), we have
This enables us to deliver that
By Lemma 4.2 2), the Newton step is always admitted for sufficiently large k. Then direct calculations lead to the following chain of inequalities,
which, combined with (44), yields the conclusion immediately.
Keywords
- Multiple sparse
- Stationary point
- Gradient projection Newton algorithm
- Convergence analysis
- Numerical experiment