Abstract
With the development of science and technology, we can obtain many groups of data for the same object, and there is often a certain relationship or structure between these groups of data or within each group. To characterize the structure of the data in different datasets, in this paper we propose the multiple sparse constraint problem (MSCP) to handle problems with a multiblock sparse structure. We introduce three types of stationary points and present the relationships among these three types of stationary points and the global/local minimizers. We then design a gradient projection Newton algorithm, which is proven to enjoy global and quadratic convergence. Finally, numerical experiments on several examples illustrate the efficiency of the proposed method.
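As an informal illustration that is not part of the paper: the feasible set of the MSCP couples m blocks, the i-th of which is allowed at most s_i nonzero entries, and its projection acts block-wise by hard thresholding. A minimal NumPy sketch of this projection is given below; the function name, the representation of blocks as index arrays, and the toy data are our own assumptions.

```python
import numpy as np

def project_onto_sigma(x, blocks, s):
    """Project x onto Sigma = Sigma_1 x ... x Sigma_m: within each block i,
    keep the s[i] entries of largest absolute value and zero out the rest
    (ties are broken arbitrarily, matching the set-valued projection)."""
    z = np.zeros_like(x)
    for idx, s_i in zip(blocks, s):
        xi = x[idx]
        keep = np.argsort(np.abs(xi))[len(xi) - s_i:]   # positions of the s_i largest |entries|
        zi = np.zeros_like(xi)
        zi[keep] = xi[keep]
        z[idx] = zi
    return z

# toy usage: two blocks of length 4 with sparsity levels s = (1, 2)
blocks = [np.arange(0, 4), np.arange(4, 8)]
x = np.array([0.1, -3.0, 0.5, 0.2, 1.0, -0.3, 2.5, 0.0])
print(project_onto_sigma(x, blocks, [1, 2]))            # [ 0. -3.  0.  0.  1.  0.  2.5  0.]
```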





References
Agarwal, A., Negahban, S., Wainwright, M.: Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Stat. 40, 2452–2482 (2012)
Bahmani, S., Raj, B., Boufounos, P.: Greedy sparsity-constrained optimization. J. Mach. Learn. Res. 14, 807–841 (2013)
Beck, A., Eldar, Y.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23, 1480–1509 (2013)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Berg, E., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31, 890–912 (2008)
Jalali, A., Johnson, C., Ravikumar, P.: On learning discrete graphical models using Greedy methods. Adv. Neural Inf. Process. Syst. 24, 1935–1943 (2011). (Granada, Spain)
Jiao, Y., Jin, B., Lu, X.: Group sparse recovery via the \(\ell ^0(\ell ^2)\) penalty: theory and algorithm. IEEE Trans. Signal Process. 65, 998–1012 (2017)
Moré, J., Sorensen, D.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4, 553–572 (1983)
Pan, L., Chen, X.: Group sparse optimization for images recovery using capped folded concave functions. SIAM J. Imaging Sci. 14, 1–25 (2021)
Pan, L., Xiu, N., Zhou, S.: On solutions of sparsity constrained optimization. J. Oper. Res. Soc. China 3, 421–439 (2017)
Pan, L., Zhou, S., Xiu, N., Qi, H.: A convergent iterative hard thresholding for sparsity and nonnegativity constrained optimization. Pac. J. Optim. 33, 325–353 (2017)
Rockafellar, R., Wets, R.: Variational Analysis. Springer, Berlin (2009)
Shalev-Shwartz, S., Srebro, N., Zhang, T.: Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM J. Optim. 20, 2807–2832 (2010)
She, Y., Wang, Z., Shen, J.: Gaining outlier resistance with progressive quantiles: fast algorithms and theoretical studies. J. Am. Stat. Assoc. 117, 1282–1295 (2021)
Sun, J., Kong, L., Zhou, S.: Gradient projection Newton algorithm for sparse collaborative learning using synthetic and real datasets of applications. J. Comput. Appl. Math. 422, 114872 (2023)
Thompson, P., Martin, N., Wright, M.: Imaging genomics. Curr. Opin. Neurol. 23, 368–373 (2010)
Visscher, P., Brown, M., Mccarthy, M., Yang, J.: Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012)
Wang, R., Xiu, N., Zhang, C.: Greedy projected gradient-Newton method for sparse logistic regression. IEEE Trans. Neural Netw. Learn. Syst. 31, 527–538 (2020)
Wang, S., Yehya, N., Schadt, E., Wang, H., Drake, T., Lusis, A.: Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet. 2, 148–159 (2006)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B. 68, 49–67 (2006)
Zhang, H., Wang, F., Xu, H., Liu, Y., Liu, J., Zhao, H., Gelernter, J.: Differentially co-expressed genes in postmortem prefrontal cortex of individuals with alcohol use disorders: influence on alcohol metabolism-related pathways. Hum. Genet. 133, 1383–1394 (2014)
Zhou, S.: Gradient projection Newton pursuit for sparsity constrained optimization. Appl. Comput. Harmon. Anal. 61, 75–100 (2022)
Zhou, S., Luo, Z., Xiu, N.: Computing one-bit compressive sensing via double-sparsity constrained optimization. IEEE Trans. Signal Process. 70, 1593–1608 (2022)
Zhou, S., Xiu, N., Qi, H.: Global and quadratic convergence of Newton hard-thresholding pursuit. J. Mach. Learn. Res. 22, 1–45 (2021)
Zille, P., Calhoun, V., Wang, Y.: Enforcing co-expression within a brain-imaging genomics regression framework. IEEE Trans. Med. Imaging 37, 2561–2571 (2018)
Acknowledgements
The authors would like to thank the Associate Editor and anonymous referees for their helpful suggestions. This work was funded by the National Natural Science Foundation of China (12071022), Beijing Natural Science Foundation (Z190002) and Natural Science Foundation of Shandong Province (ZR2018MA019).
Additional information
Communicated by Sebastian U. Stich.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
1.1 A. Proof of Theorem 3.1
Necessity. Based on [12, Theorem 6.12], a local minimizer \(\mathbf{x}^{*}\) of the problem (1) must satisfy \(- \nabla f(\mathbf{x}^{*}) \in {\mathcal {N}}_{\Sigma }(\mathbf{x}^{*}) = {\mathcal {N}}_{\Sigma _1}(\mathbf{x}^{*}_1) \times {\mathcal {N}}_{\Sigma _2}(\mathbf{x}^{*}_2)\times \cdots \times {\mathcal {N}}_{\Sigma _m}(\mathbf{x}^{*}_m),\) where \({\mathcal {N}}_{\Sigma }(\mathbf{x}^{*})\) is the normal cone of \(\Sigma \) at \(\mathbf{x}^{*}\) and the equality is by [12, Theorem 6.41]. Then the explicit expression (see [10]) of the normal cone \({\mathcal {N}}_{\Sigma _i}(\mathbf{x}^{*}_i)\) enables us to derive (12) immediately.
Sufficiency. Let \(\mathbf{x}^{*}\) satisfy (12). The convexity of f leads to
If there is a \(\delta >0\) such that for any \(\mathbf{x}\in \Sigma \cap N(\mathbf{x}^*,\delta )\), we have
then the conclusion can be made immediately. Therefore, we next show (26). In fact, by (12), we note that \(\nabla _{i}f(\mathbf{x}^{*})=0\) if \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}<s_{i}\), which indicates that it suffices to consider the worst case \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}=s_{i},i=1,2,\ldots ,m\). Under such a case, we define
Then for any \(\mathbf{x}\in N(\mathbf{x}^{*},\delta )\cap \Sigma \), we have
This indicates that \(\Gamma (\mathbf{x}_{i}^{*})\subseteq \Gamma (\mathbf{x}_{i})\), which by \(\Vert \mathbf{x}_{i}\Vert _{0}\le s_{i}=\Vert \mathbf{x}_{i}^{*}\Vert _0=\mid \Gamma (\mathbf{x}_{i}^{*})\mid \) yields
Using the above fact and (12), we derive that
The proof is completed.
1.2 B. Proof of Lemma 3.1 and Theorem 3.2
Suppose that \(\mathbf{x}^{*}\) is an \(\alpha \)-stationary point. If \(j\in \Gamma (\mathbf{x}_{i}^{*})\), then, according to \(\mathbf{x}_{i}^{*} \in \Pi _{\Sigma _i}(\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f(\mathbf{x}^{*}))\), we have \((\mathbf{x}_{i}^{*})_{j}=(\mathbf{x}_{i}^{*})_{j}-(\alpha \nabla _{i} f(\mathbf{x}^{*}))_{j}\), so that \((\nabla _{i} f(\mathbf{x}^{*}))_{j}=0\). If \(j\in {\overline{\Gamma }}(\mathbf{x}_{i}^{*})\), then \(\mid (\mathbf{x}_{i}^{*})_{j}-(\alpha \nabla _{i} f(\mathbf{x}^{*}))_{j}\mid \le (\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\), which combined with the fact that \((\mathbf{x}_{i}^{*})_{j}=0\) implies that \(\mid (\alpha \nabla _{i} f(\mathbf{x}^{*}))_{j}\mid \le (\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\).
Suppose that \(\mathbf{x}^{*}\) satisfies (13). If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}< s_{i}\), then \((\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}=0\). It follows from (13) that \(\nabla _{i} f(\mathbf{x}^{*})=0\); in this case, \(\Pi _{\Sigma _i}\left( \mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) \right) =\Pi _{\Sigma _i}\left( \mathbf{x}_{i}^{*}\right) \) is the singleton \(\{\mathbf{x}_{i}^{*}\}\). If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}= s_{i}\), then \((\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\ne 0\). We have
Therefore, the vector \(\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) \) contains the \(s_{i}\) components of \(\mathbf{x}_{i}^{*}\) with the largest absolute values, and all other components are smaller than or equal to them in absolute value, so that \(\mathbf{x}_{i}^{*} \in \Pi _{\Sigma _i}(\mathbf{x}_{i}^{*}-\alpha \nabla _{i} f\left( \mathbf{x}^{*}\right) )\).
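To make the stationarity test in (13) concrete, here is a small hedged NumPy sketch that checks, block by block, whether the gradient vanishes on the support and whether the off-support entries satisfy the bound involving \((\mathbf{x}_{i}^{*})^\downarrow _{s_{i}}\); the function name, the tolerance, and the block representation are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def is_alpha_stationary(x, grad, blocks, s, alpha, tol=1e-10):
    """Check condition (13) block by block: the gradient vanishes on the support
    of x_i, and off the support alpha*|grad_j| does not exceed the s_i-th largest
    absolute entry of x_i."""
    for idx, s_i in zip(blocks, s):
        xi, gi = x[idx], grad[idx]
        support = np.abs(xi) > tol
        kth = np.sort(np.abs(xi))[::-1][s_i - 1]     # the s_i-th largest |entry| of x_i
        if np.any(np.abs(gi[support]) > tol):
            return False
        if np.any(alpha * np.abs(gi[~support]) > kth + tol):
            return False
    return True
```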
1.3 C. Proof of Lemma 3.2 and Theorem 3.3
Suppose that \(\mathbf{x}^{*}\) is a B-stationary point; that is, \(\nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : d_{i}\in T_{ \Sigma _i}^{B}(\mathbf{x}_{i}^{*})\}\).
If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}= s_{i}\), we have
The above formula is equivalent to
Then we have
Next we prove \(\nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=0\Longleftrightarrow \nabla _{i}f(\mathbf{x}^{*})=0\) when \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}< s_{i}\). On the one hand, if \(\nabla _{i}f(\mathbf{x}^{*})=0\), then
On the other hand, if \(\nabla ^{B}_{ \Sigma _i}f(\mathbf{x}^{*})=0\), then
which leads to \(\Vert \nabla _{i}f(\mathbf{x}^{*})\Vert \le \Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert \) for any \(d_{i}\) with \(\Vert d_{i}\Vert _{0}\le s_{i}\) and \(\Vert \mathbf{x}_{i}^{*}+\nu d_{i}\Vert _{0}\le s_{i}\) for all \(\nu \in \mathbb {R}\). In particular, for any \(j_{0}\in \{1,2,\ldots ,p_{i}\}\), take \(d_{i}\) with \(\Gamma (d_{i})=\{j_{0}\}\); clearly, \(\Vert \mathbf{x}_{i}^{*}+\nu d_{i}\Vert _{0}\le s_{i}\) for all \(\nu \in \mathbb {R}\). Then, setting \((d_{i})_{j_{0}}=-(\nabla _{i}f(\mathbf{x}^{*}))_{j_{0}}\) and \((d_{i})_{j}=0\) for \(j\ne j_{0}\), the vector \(d_{i}+\nabla _{i}f(\mathbf{x}^{*})\) coincides with \(\nabla _{i}f(\mathbf{x}^{*})\) except that its \(j_{0}\)-th entry is zero, so the inequality \(\Vert \nabla _{i}f(\mathbf{x}^{*})\Vert \le \Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert \) forces \((\nabla _{i}f(\mathbf{x}^{*}))_{j_{0}}=0\). By the arbitrariness of \(j_{0}\), we get \(\nabla _{i}f(\mathbf{x}^{*})=0\).
Since condition (14) is equivalent to (12), a B-stationary point is equivalent to a local minimizer. The proof of Theorem 3.3 is completed.
1.4 D. Proof of Lemma 3.3 and Theorem 3.4
If \(\mathbf{x}^{*}\) is a C-stationary point, then \(\nabla ^{C}_{ \Sigma _i}f(\mathbf{x}^{*})=\arg \min \{\Vert d_{i}+\nabla _{i}f(\mathbf{x}^{*})\Vert : d_{i}\in T_{ \Sigma _i}^{C}(\mathbf{x}_{i}^{*})\}\). So
which is equivalent to
Therefore, if \(\mathbf{x}^{*}\) is a C-stationary point, then
To sum up, we get the desired results.
If \(\Vert \mathbf{x}_{i}^{*}\Vert _{0}=s_{i},i=1,2,\ldots ,m\), the conditions (15) imply (12), and a point satisfying (12) is a local minimizer by Theorem 3.1, so a C-stationary point of (1) is a local minimizer. Conversely, suppose that \(\mathbf{x}^{*}\) is a local minimizer; then it satisfies (12), which is a special case of (15), that is to say, \(\mathbf{x}^{*}\) is a C-stationary point. The proof of Theorem 3.4 is completed.
1.5 E. Proof of Theorem 3.6
The original problem (1) can be written as
Since f is strongly convex on \(\mathbb {R}^{p_{1}}_{T_1}\times \cdots \times \mathbb {R}^{p_{m}}_{T_m}\), for any given \(|T_i|=s_{i},~i=1,2,\ldots ,m\), the inner problem of (28) is also a strongly convex program. Therefore, the inner program admits a unique global minimizer, denoted by \(\mathbf{x}_{i}^*(T_i),~i=1,2,\ldots ,m\). Note that \(T_i\subseteq [p_i],~i=1,2,\ldots ,m\). Thus there are only finitely many \(T_i\) with \(|T_i|=s_{i}\), and hence finitely many inner programs and finitely many \(\mathbf{x}_{i}^*(T_i)\). To obtain a global minimizer of (28), we only need to pick the \(\mathbf{x}_{i}^*(T_i)\) that makes the objective value of (28) minimal. Hence global minimizers exist.
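To make the enumeration argument above concrete, the following hedged sketch brute-forces problem (28) for a small least-squares instance \(f(\mathbf{x})=\tfrac{1}{2}\Vert A\mathbf{x}-\mathbf{b}\Vert ^2\); the least-squares choice of f, the function name, and the block representation are our own illustrative assumptions, and the enumeration is only viable for tiny dimensions.

```python
import itertools
import numpy as np

def mscp_global_min_bruteforce(A, b, blocks, s):
    """Enumerate every support T_i with |T_i| = s_i in each block, solve the inner
    restricted least-squares problem, and keep the best candidate (problem (28)).
    Exponential cost: for tiny illustrative instances only."""
    p = A.shape[1]
    best_val, best_x = np.inf, None
    per_block = [itertools.combinations(idx, s_i) for idx, s_i in zip(blocks, s)]
    for supports in itertools.product(*per_block):
        T = np.concatenate([np.array(Ti, dtype=int) for Ti in supports])
        xT, *_ = np.linalg.lstsq(A[:, T], b, rcond=None)   # inner strongly convex subproblem
        x = np.zeros(p)
        x[T] = xT
        val = 0.5 * np.linalg.norm(A @ x - b) ** 2
        if val < best_val:
            best_val, best_x = val, x
    return best_x, best_val
```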
We next show that any local minimizer \(\mathbf{x}^*\) is unique. To proceed, denote \(\delta :=\min \limits _{i=1,2,\ldots ,m}\{\delta _i\}\), where
Clearly, \(\delta _i>0\) and hence \(\delta >0\). Then similar reasoning allows us to derive (26) for any \(\mathbf{x}\in \Sigma \cap N(\mathbf{x}^*,\delta )\). This and the strong convexity of f lead to
The above condition indicates \(\mathbf{x}^{*}\) is the unique global minimizer of the problem \(\min \{f(\mathbf{x}):\mathbf{x}\in \Sigma \cap N(\mathbf{x}^*,\delta )\}\), namely, \(\mathbf{x}^{*}\) is the unique local minimizer of (1).
Appendix B
1.1 A. Proof of Lemma 4.1
(1) It follows from (16) that \(\mathbf{x}^k(\alpha ) \in \Pi _{\Sigma }(\mathbf{x}^k -\alpha \nabla f(\mathbf{x}^k))\) and thus
which results in
This and the strong smoothness of f with the constant \(L_f\) derive that
where the last inequality is from \(0<\alpha \le 1/(\sigma +L_{f})\). Invoking the Armijo-type step size rule, one has \(\alpha _k\ge \gamma /(\sigma +L_{f})\), which together with \(\alpha _k\le 1\) proves the desired assertion.
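For illustration only, the Armijo-type backtracking used in part (1) can be sketched as follows, reusing the project_onto_sigma helper sketched after the abstract; the acceptance test \(f(\mathbf{x}(\alpha ))\le f(\mathbf{x})-\tfrac{\sigma }{2}\Vert \mathbf{x}(\alpha )-\mathbf{x}\Vert ^2\) and the default parameter values are assumptions consistent with the bounds above, not the exact rule of Algorithm 1.

```python
import numpy as np

def armijo_projected_gradient_step(f, grad_f, x, blocks, s,
                                   sigma=1e-4, gamma=0.5, max_backtracks=50):
    """Backtracking search: shrink alpha by gamma until the projected point
    x(alpha) = Pi_Sigma(x - alpha * grad f(x)) achieves a sufficient decrease
    f(x(alpha)) <= f(x) - (sigma/2) * ||x(alpha) - x||^2."""
    fx, g = f(x), grad_f(x)
    alpha = 1.0                                   # consistent with alpha_k <= 1 above
    for _ in range(max_backtracks):
        x_alpha = project_onto_sigma(x - alpha * g, blocks, s)
        if f(x_alpha) <= fx - 0.5 * sigma * np.sum((x_alpha - x) ** 2):
            break
        alpha *= gamma
    return x_alpha, alpha
```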
(2) By (23) and \(\mathbf{u}^k=\mathbf{x}^k(\alpha _k)\), we have
By the framework of Algorithm 1, if \(\mathbf{x}^{k+1}=\mathbf{u}^k\), then the above condition implies,
If \(\mathbf{x}^{k+1}=\mathbf{v}^k\), then we obtain
where the second and last inequalities use (32) and the fact that \(\Vert \mathbf{a}+\mathbf{b}\Vert ^2\le 2\Vert \mathbf{a}\Vert ^2+2\Vert \mathbf{b}\Vert ^2\) for all vectors \(\mathbf{a}\) and \(\mathbf{b}\). Both cases lead to
Therefore, \(\{f(\mathbf{x}^k)\}\) is a non-increasing sequence, which with (35) and \(f \ge 0\) yields
The above condition suffices to show that \(\lim _{k\rightarrow \infty }\Vert \mathbf{x}^{k+1}-\mathbf{x}^k\Vert =\lim _{k\rightarrow \infty }\Vert \mathbf{u}^k-\mathbf{x}^k\Vert =0.\)
(3) Let \(\mathbf{x}^*\) be any accumulation point of \(\{\mathbf{x}^k\}\). Then there exists a subset M of \(\{0,1,2,\ldots \}\) such that \(\lim _{ k (\in M)\rightarrow \infty } \mathbf{x}^k = \mathbf{x}^*.\) This further implies \(\lim _{ k (\in M)\rightarrow \infty } \mathbf{u}^k = \mathbf{x}^*\) by applying 2). In addition, as stated in 1), we have \(\{\alpha _k\}\subseteq [ {\underline{\alpha }}, 1]\), which indicates that one can find a subsequence K of M and a scalar \(\alpha _*\in [ {\underline{\alpha }}, 1]\) such that \(\{\alpha _k: k \in K\}\rightarrow \alpha _*\). Overall, we have
Let \({\varvec{\eta }}^k :=\mathbf{x}^k -\alpha _k \nabla f(\mathbf{x}^k)\). The framework of Algorithm 1 implies
The first condition means \(\mathbf{u}^k \in \Sigma \) for any \( k \ge 1\). Note that \(\Sigma \) is closed and \(\mathbf{x}^*\) is an accumulation point of \(\{\mathbf{u}^k\}\) by (36). Therefore, \(\mathbf{x}^*\in \Sigma \), which results in
If the strict inequality holds in the above condition, then there is an \( \varepsilon _0>0\) such that
where the last equality is from (37). Taking the limit of both sides of the above condition along \( k (\in K)\rightarrow \infty \) yields \(\Vert \mathbf{x}^*- {\varvec{\eta }}^*\Vert -\varepsilon _0 \ge \Vert \mathbf{x}^* - {\varvec{\eta }}^*\Vert \) by (36) and (37), which contradicts \( \varepsilon _0>0\). Therefore, equality must hold in (38), showing that
The above relation means that the conditions in (13) hold for \(\alpha =\alpha _*\); then they must hold for any \(0<\alpha \le {\underline{\alpha }}\) due to \({\underline{\alpha }}\le \alpha _*\) from (36), namely,
showing that \(\mathbf{x}^*\) is an \(\alpha \)-stationary point of (1), as desired.
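Putting the pieces together, a hedged skeleton of a gradient-projection-Newton iteration in the spirit of Algorithm 1 is sketched below; it reuses the armijo_projected_gradient_step helper above, and its simplified Newton acceptance test is our own placeholder for the exact test (20) used in the paper.

```python
import numpy as np

def gpna_sketch(f, grad_f, hess_f, x0, blocks, s, tol=1e-8, max_iter=200):
    """Hedged skeleton: compute the projected gradient point u^k, try a Newton
    step v^k restricted to the support of u^k, and accept v^k only if it does
    not increase f (a simplified acceptance rule)."""
    x = x0.copy()
    for _ in range(max_iter):
        u, _ = armijo_projected_gradient_step(f, grad_f, x, blocks, s)
        gamma = np.flatnonzero(u)                         # current support Gamma_k of u^k
        try:
            d = np.linalg.solve(hess_f(u)[np.ix_(gamma, gamma)], -grad_f(u)[gamma])
            v = u.copy()
            v[gamma] += d
            x_next = v if f(v) <= f(u) else u             # placeholder for the exact test (20)
        except np.linalg.LinAlgError:
            x_next = u                                    # fall back to the projected gradient point
        if np.linalg.norm(x_next - x) <= tol:
            return x_next
        x = x_next
    return x
```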
1.2 B. Proof of Theorem 4.1
As shown in Lemma 4.1, one can find a subsequence of \(\{\mathbf{x}^ k \}\) that converges to an \(\alpha \)-stationary point \(\mathbf{x}^*\) of (1) with \(0<\alpha \le {\underline{\alpha }}\). Recall that an \(\alpha \)-stationary point \(\mathbf{x}^*\) is also a local minimizer by Theorem 3.2, which indicates that \(\mathbf{x}^*\) is unique because f is restricted strongly convex. In other words, \(\mathbf{x}^*\) is an isolated local minimizer of (1). Finally, it follows from \(\mathbf{x}^*\) being isolated, [8, Lemma 4.10] and \(\lim _{ k \rightarrow \infty }\Vert \mathbf{x}^{ k +1}-\mathbf{x}^ k \Vert =0\) by Lemma 4.1 that the whole sequence converges to the unique local minimizer \(\mathbf{x}^*\).
1.3 C. Proof of Lemma 4.2
1) If \(\Vert \mathbf{x}_i^*\Vert _0=s_i\), then by \(\mathbf{x}_i^k\rightarrow \mathbf{x}_i^*, \mathbf{u}_i^k\rightarrow \mathbf{x}_i^*\) and \(\Vert \mathbf{x}_i^k\Vert _0\le s_i,\Vert \mathbf{u}_i^k\Vert _0\le s_i\), we must have \(\Gamma (\mathbf{x}_i^*) \equiv \Gamma (\mathbf{x}_i^k)\equiv \Gamma (\mathbf{u}_i^k)\) for sufficiently large k. If \(\Vert \mathbf{x}_i^*\Vert _0<s_i\), similar reasoning allows for deriving \(\Gamma (\mathbf{x}_i^*) \subseteq \Gamma (\mathbf{x}_i^k) \) and \( \Gamma (\mathbf{x}_i^*) \subseteq \Gamma (\mathbf{u}_i^k)\).
2) According to Theorem 4.1, the limit point \(\mathbf{x}^*\) is a local minimizer of the problem (1). Therefore, it satisfies (12) from Theorem 3.1. We first conclude that one of the conditions in (17) must be satisfied for a large enough k. In fact, there are three cases for \(\mathbf{x}^*\), each of which can imply one condition in (17) as follows.
We now prove them one by one. From the Lipschitz continuity of \(\nabla f\), we have
The relation of Case 1) \(\Rightarrow \) Cond 1) follows from (24) immediately. For Case 2), we have \(\Gamma (\mathbf{x}_i^k)\equiv \Gamma (\mathbf{u}_i^k),~i\in I_{2}\) by (24) and
Therefore, Case 2) \(\Rightarrow \) Cond 2). Similarly, we can show the last relation.
Next, since f is strongly convex, \(H^k_{\Gamma _{k}\Gamma _{k}}\) is non-singular, which means that the equations (19) are solvable. Finally, we show that inequality (20) holds when \(\sigma \in (0,l_f/2)\). In fact, the conditions (24) and (12) enable us to derive
for sufficiently large k. Then it follows from (19) that
The above condition indicates that \(\Vert \mathbf{v}^k-\mathbf{u}^k\Vert \rightarrow 0\), resulting in
for sufficiently large k. We have the following chain of inequalities,
In summary, for sufficiently large k, the Newton step can always be performed.
1.4 D. Proof of Theorem 4.2
We first estimate \(\Vert \mathbf{u}^{k}-\mathbf{x}^*\Vert \). Recalling (16) that
and \(\Gamma _k=\Gamma (\mathbf{u}^k)\), we have
This enables us to deliver that
By Lemma 4.2 2), the Newton step is always admitted for sufficiently large k. Then direct calculations lead to the following chain of inequalities,
which, combined with (44), yields the conclusion immediately.
Keywords
- Multiple sparse
- Stationary point
- Gradient projection Newton algorithm
- Convergence analysis
- Numerical experiment