
A smoothing SQP framework for a class of composite \(L_q\) minimization over polyhedron


Abstract

The composite \(L_q~(0<q<1)\) minimization problem over a general polyhedron has found various applications in machine learning, wireless communications, image restoration, signal reconstruction, etc. This paper provides a theoretical study of this problem. First, we derive the Karush–Kuhn–Tucker (KKT) optimality conditions for local minimizers of the problem. Second, we propose a smoothing sequential quadratic programming framework for solving this problem. The framework requires an (approximate) solution of a convex quadratic program at each iteration. Finally, we analyze the worst-case iteration complexity of the framework for returning an \(\epsilon \)-KKT point, i.e., a feasible point that satisfies a perturbed version of the derived KKT optimality conditions. To the best of our knowledge, the proposed framework is the first with a worst-case iteration complexity guarantee for solving composite \(L_q\) minimization over a general polyhedron.


Notes

  1. Here, the difference between the two norms (\(\Vert \cdot \Vert _{\infty }\) in (5.40) and \(\Vert \cdot \Vert \) in (5.42)) is neglected.

References

  1. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)


  2. Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)


  3. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. MPS–SIAM Series on Optimization. SIAM, Philadelphia (2001)


  4. Berg, E.V.D., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31(2), 890–912 (2008)


  5. Bertsekas, D.P.: Convex Analysis and Optimization. Athena Scientific, Massachusetts (2003)


  6. Bian, W., Chen, X.: Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization. SIAM J. Optim. 23(3), 1718–1741 (2013)


  7. Bian, W., Chen, X.: Smoothing quadratic regularization methods for box constrained non-Lipschitz optimization in image restoration. Technical report, Hong Kong Polytechnic University (2014)

  8. Bian, W., Chen, X., Ye, Y.: Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization. Math. Program. 149(1–2), 301–327 (2015)


  9. Birbil, S.I., Fang, S.C., Frenk, J.B.G., Zhang, S.: Recursive approximation of the high dimensional max function. Oper. Res. Lett. 33(5), 450–458 (2005)


  10. Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10(4), 1196–1211 (2000)


  11. Boser, B.E., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152 (1992)

  12. Bruckstein, A.M., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009)


  13. Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005)


  14. Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5), 877–905 (2008)


  15. Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127(2), 245–295 (2011)


  16. Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity. Math. Program. 130(2), 295–319 (2011)


  17. Cartis, C., Gould, N.I.M., Toint, P.L.: On the evaluation complexity of composite function minimization with applications to nonconvex nonlinear programming. SIAM J. Optim. 21(4), 1721–1739 (2011)


  18. Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 14(10), 707–710 (2007)


  19. Chartrand, R., Staneva, V.: Restricted isometry properties and nonconvex compressive sensing. Inverse Probl. 24(3), 1–14 (2008)


  20. Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3869–3872 (2008)

  21. Chen, X.: Smoothing methods for nonsmooth, nonconvex minimization. Math. Program. 134(1), 71–99 (2012)


  22. Chen, X., Ge, D., Wang, Z., Ye, Y.: Complexity of unconstrained \(l_2\)-\(l_p\) minimization. Math. Program. 143(1–2), 371–383 (2014)


  23. Chen, X., Ng, M.K., Zhang, C.: Non-Lipschitz \(l_{{p}}\)-regularization and box constrained model for image restoration. IEEE Trans. Image Process. 21(12), 4709–4721 (2012)


  24. Chen, X., Niu, L., Yuan, Y.: Optimality conditions and a smoothing trust region Newton method for non-Lipschitz optimization. SIAM J. Optim. 23(3), 1528–1552 (2013)


  25. Chen, X., Xu, F., Ye, Y.: Lower bound theory of nonzero entries in solutions of \(\ell _2\)-\(\ell _p\) minimization. SIAM J. Sci. Comput. 32(5), 2832–2852 (2010)


  26. Chen, X., Zhou, W.: Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization. SIAM J. Imaging Sci. 3(4), 765–790 (2010)


  27. Clarke, F.H.: Optimization and Nonsmooth Analysis. John Wiley, New York (1983)


  28. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)


  29. Dai, Y.H., Liao, L.Z.: R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22(1), 1–10 (2002)


  30. Daubechies, I., DeVore, R., Fornasier, M., Güntürk, C.S.: Iteratively reweighted least squares minimization for sparse recovery. Commun. Pure Appl. Math. 63(1), 1–38 (2010)


  31. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1359 (2001)


  32. Foucart, S., Lai, M.J.: Sparsest solutions of underdetermined linear systems via \(\ell _q\)-minimization for \(0 < q \le 1\). Appl. Comput. Harmon. Anal. 26(3), 395–407 (2009)


  33. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Springer, New York (2013)


  34. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York (1979)


  35. Garmanjani, R., Vicente, L.N.: Smoothing and worst case complexity for direct-search methods in non-smooth optimization. IMA J. Numer. Anal. 33(3), 1008–1028 (2013)


  36. Ge, D., Jiang, X., Ye, Y.: A note on the complexity of \(l_{p}\) minimization. Math. Program. 129(2), 285–299 (2011)


  37. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Progam. (2015). doi:10.1007/s10107-015-0871-8

  38. Gould, N.I.M., Toint, P.L.: Preprocessing for quadratic programming. Math. Program. 100(1), 95–132 (2004)


  39. Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation applied to compressed sensing: implementation and numerical experiments. J. Comput. Math. 28(2), 170–194 (2010)


  40. Huang, J., Ma, S., Xie, H., Zhang, C.H.: A group bridge approach for variable selection. Biometrika 96(2), 339–355 (2009)


  41. Ji, S., Sze, K.F., Zhou, Z., So, A.M.C., Ye, Y.: Beyond convex relaxation: A polynomial-time non-convex optimization approach to network localization. In: IEEE Conference on Computer Communications (INFOCOM), pp. 2499–2507 (2013)

  42. Jiang, B., Dai, Y.H.: A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math. Program. (2015). doi:10.1007/s10107-014-0816-7

  43. Jiang, B., Zhang, S.: Iteration bounds for finding \(\epsilon \)-stationary points of structured nonconvex optimization. Technical report, University of Minnesota (2014)

  44. Lai, M.J., Wang, J.: An unconstrained \(\ell _q\) minimization with \(0<q\le 1\) for sparse solution of underdetermined linear systems. SIAM J. Optim. 21(1), 82–101 (2011)


  45. Lai, M.J., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(l_{q}\) minimization. SIAM J. Numer. Anal. 51(2), 927–957 (2013)


  46. Liu, Y.F., Dai, Y.H., Luo, Z.Q.: Joint power and admission control via linear programming deflation. IEEE Trans. Signal Process. 61(6), 1327–1338 (2013)


  47. Liu, Y.F., Dai, Y.H., Ma, S.: Joint power and admission control: non-convex \(l_q\) approximation and an effective polynomial time deflation approach. IEEE Trans. Signal Process. 63(14), 3641–3656 (2015)


  48. Lu, Z.: Iterative reweighted minimization methods for \(l_p\) regularized unconstrained nonlinear programming. Math. Program. 147(1–2), 277–307 (2014)

  49. Mitliagkas, I., Sidiropoulos, N.D., Swami, A.: Joint power and admission control for ad-hoc and cognitive underlay networks: convex approximation and distributed implementation. IEEE Trans. Wireless Commun. 10(12), 4110–4121 (2011)


  50. Mourad, N., Reilly, J.P.: Minimizing nonconvex functions for sparse vector reconstruction. IEEE Trans. Signal Process. 58(7), 3485–3496 (2010)


  51. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(\text{ O }(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)


  52. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)


  53. Nikolova, M., Ng, M.K., Zhang, S., Ching, W.K.: Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization. SIAM J. Imaging Sci. 1(1), 2–25 (2008)


  54. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)


  55. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley, Massachusetts (1994)


  56. Rao, B.D., Kreutz-delgado, K.: An affine scaling methodology for best basis selection. IEEE Trans. Signal Process. 47(1), 187–200 (1999)


  57. Sun, Q.: Recovery of sparsest signals via \(\ell _q\)-minimization. Appl. Comput. Harmon. Anal. 32(3), 329–341 (2012)


  58. Sun, W., Yuan, Y.: Optimization Theory and Methods: Nonlinear Programming. Springer, New York (2006)


  59. Vazirani, V.V.: Approximation Algorithms. Springer, New York (2001)


  60. Wagner, M., Meller, J., Elber, R.: Large-scale linear programming techniques for the design of protein folding potentials. Math. Program. 101(2), 301–318 (2004)


  61. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)


  62. Ye, Y.: Interior Point Algorithms-Theory and Analysis. Wiley, New York (1997)


  63. Yun, S., Toh, K.C.: A coordinate gradient descent method for \(l_1\)-regularized convex minimization. Comput. Optim. Appl. 48(2), 273–307 (2011)



Acknowledgments

We would like to thank Prof. Xiaojun Chen and Dr. Wei Bian for many insightful comments, which helped us in improving the results in this paper. We thank Dr. Qingna Li and Dr. Xin Liu for many useful discussions on an early version of this paper. We also thank Prof. Alexander Shapiro, the anonymous Associate Editor and referees for their constructive comments, which significantly improved the presentation of the paper.

Author information


Corresponding author

Correspondence to Ya-Feng Liu.

Additional information

Y.-F. Liu was partially supported by NSFC Grants 11331012 and 11301516. S. Ma was partially supported by the Hong Kong Research Grants Council General Research Fund Early Career Scheme (Project ID: CUHK 439513). Y.-H. Dai was partially supported by the China National Funds for Distinguished Young Scientists Grant 11125107, the Key Project of Chinese National Programs for Fundamental Research and Development Grant 2015CB856000, NSFC Grant 71331001, and the CAS Program for Cross & Cooperative Team of the Science & Technology Innovation. S. Zhang was partially supported by NSF Grant CMMI-1462408.

Appendices

Appendix 1: Three motivating applications

Support Vector Machine [11, 28]. The support vector machine (SVM) is a state-of-the-art classification method introduced by Boser, Guyon, and Vapnik in 1992 in [11]. Consider a database \(\left\{ s_m\in \mathbb {R}^{N-1},\,y_m\in \mathbb {R}\right\} _{m=1}^M,\) where \(s_m\) is called a pattern or example and \(y_m\) is the label associated with \(s_m.\) For convenience, we assume the labels are \(+1\) for positive examples and \(-1\) for negative examples. If the data are linearly separable, the task of the SVM is to find a linear discriminant function of the form \(\ell (s)=\hat{s}^Tx\) with \(\hat{s}=[s^T,1]^T\in \mathbb {R}^{N}\) such that all data are correctly classified and, at the same time, the margin of the hyperplane \(\ell \) that separates the two classes of examples is maximized. Mathematically, the above problem can be formulated as

$$\begin{aligned} \begin{array}{l@{\quad }l} \displaystyle \min _{x} &{} \displaystyle \frac{1}{2}\sum _{n=1}^{N-1}x_n^2 \\ \text{ s.t. } &{} \displaystyle y_m\hat{s}_m^Tx\ge 1,\quad m=1,2,\ldots ,M. \end{array} \end{aligned}$$
(6.1)

In practice, data are often not linearly separable. In this case, problem (6.1) is not feasible, and the following problem can be solved instead:

$$\begin{aligned} \displaystyle \min _{x} \displaystyle \sum _{m=1}^M\max \left\{ 1-y_m\hat{s}_m^Tx,0\right\} ^q+\frac{\rho }{2}\sum _{n=1}^{N-1}x_n^2, \end{aligned}$$
(6.2)

where the constant \(\rho \ge 0\) balances the relative importance of minimizing the classification errors and maximizing the margin. Problem (6.2) with \(q=1\) is called the soft-margin SVM in [28]. It is clear that problem (6.2) is a special instance of (1.1) with

$$\begin{aligned} A=\left[ \begin{array}{c} y_1\hat{s}_1^T \\ \vdots \\ y_M\hat{s}_M^T \\ \end{array} \right] ,~b=e,~h(x)=\frac{\rho }{2}\sum _{n=1}^{N-1}x_n^2,\quad \text {and}~\mathcal{X}=\mathbb {R}^N. \end{aligned}$$

Here, e is the all-one vector of dimension M.
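To make the reduction concrete, here is a minimal sketch (in Python with NumPy, using randomly generated toy data; all names and numerical values are ours, not from the paper) that assembles the data A and b of (1.1) from labeled examples and evaluates the objective of (6.2).

```python
import numpy as np

# Hypothetical toy data: M examples in R^{N-1} with labels +/-1.
rng = np.random.default_rng(0)
M, N = 20, 6                                   # N-1 = 5 features plus one bias coordinate
S = rng.standard_normal((M, N - 1))            # patterns s_m
y = np.where(rng.standard_normal(M) >= 0, 1.0, -1.0)

# Augmented patterns \hat{s}_m = [s_m; 1]; the rows of A are y_m * \hat{s}_m^T and b = e.
S_hat = np.hstack([S, np.ones((M, 1))])
A = y[:, None] * S_hat
b = np.ones(M)

def soft_margin_objective(x, q=0.5, rho=1.0):
    r"""Objective of (6.2): sum_m max{1 - y_m \hat{s}_m^T x, 0}^q + (rho/2) sum_{n<N} x_n^2."""
    hinge = np.maximum(b - A @ x, 0.0)         # componentwise max{b - Ax, 0}
    return np.sum(hinge ** q) + 0.5 * rho * np.sum(x[:-1] ** 2)

print(soft_margin_objective(np.zeros(N)))      # equals M here: every margin is violated by 1
```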

Joint Power and Admission Control [46, 49]. Consider a wireless network consisting of K interfering links (a link corresponds to a transmitter/receiver pair) with channel gains \(g_{kj}\ge 0\) (from the transmitter of link j to the receiver of link k), noise power \(\eta _k>0,\) signal-to-interference-plus-noise-ratio (SINR) target \(\gamma _k>0,\) and power budget \(\bar{p}_k>0\) for \(k, j=1,2,\ldots ,K.\) Denoting the transmission power of transmitter k by \(x_k\), the SINR at the kth receiver can be expressed as

$$\begin{aligned} \displaystyle \text {SINR}_k=\frac{g_{kk}x_k}{\eta _k+\displaystyle \sum _{j\ne k}g_{kj}x_j},\quad k=1,2,\ldots ,K. \end{aligned}$$
(6.3)

Due to the mutual interference among different links [which corresponds to the term \(\sum _{j\ne k}g_{kj}x_j\) in (6.3)], the linear system

$$\begin{aligned} \displaystyle \text {SINR}_k\ge \gamma _k,~\bar{p}_k\ge x_k\ge 0,\quad k=1,2,\ldots ,K \end{aligned}$$

may not be feasible. The joint power and admission control problem aims at supporting a maximum number of links at their specified SINR targets while using a minimum total transmission power. Assuming without loss of generality that \(g_{kk}=\gamma _k=\bar{p}_k=1\) for all \(k=1,2,\dots ,K,\) the joint power and admission control problem can be formulated as follows (see [46])

$$\begin{aligned} \begin{array}{ll} \displaystyle \min _{x} &{}\quad \left\| \max \left\{ b-Ax,{0}\right\} \right\| _q^q+\rho e^Tx \\ \text{ s.t. } &{}\quad \displaystyle {0}\le x\le e, \end{array} \end{aligned}$$
(6.4)

where \(\rho >0\) is a parameter, \(b=[\eta _1,\eta _2,\ldots ,\eta _K]^T,\) and \(A=[a_{kj}]\in \mathbb {R}^{K\times K}\) with

$$\begin{aligned} a_{kj}=\left\{ \begin{array}{ll} 1,&{}\quad \text {if }k=j;\\ - g_{kj},\quad &{}\quad \text {if }k\ne j. \end{array} \right. \end{aligned}$$

By exploiting the special structure of A,  i.e., all of its diagonal entries are positive and its off-diagonal entries are nonpositive, it is shown in [47, Theorem 1] that the solution of problem (6.4) maximizes the number of supported links with a minimum total transmission power as long as q is chosen sufficiently small (but not necessarily zero). Clearly, (6.4) is a special case of (1.1) with

$$\begin{aligned} M=K,~N=K,~h(x)=\rho e^Tx,\quad \text {and}~\mathcal{X}=\left\{ x\,|\,{0}\le x\le e\right\} \subseteq \mathbb {R}^{N}. \end{aligned}$$
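For concreteness, the following sketch (our own illustration, with synthetic channel gains and the normalization \(g_{kk}=\gamma _k=\bar{p}_k=1\)) assembles the data A and b of (6.4) and evaluates its objective:

```python
import numpy as np

# Synthetic (hypothetical) channel gains and noise powers.
rng = np.random.default_rng(1)
K = 5
G = rng.uniform(0.0, 0.2, size=(K, K))         # g_{kj} >= 0
np.fill_diagonal(G, 1.0)                       # normalization g_kk = 1
eta = rng.uniform(0.05, 0.1, size=K)           # noise powers eta_k > 0

# Data of (6.4): a_kk = 1, a_kj = -g_kj for k != j, and b_k = eta_k.
A = -G
np.fill_diagonal(A, 1.0)
b = eta.copy()

def jpac_objective(x, q=0.5, rho=1e-3):
    """||max{b - Ax, 0}||_q^q + rho * e^T x, to be minimized over the box 0 <= x <= e."""
    shortfall = np.maximum(b - A @ x, 0.0)     # per-link SINR constraint violation
    return np.sum(shortfall ** q) + rho * np.sum(x)

print(jpac_objective(np.ones(K)))              # objective value at full power x = e
```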

Linear Decoding Problem [13]. Given the coding matrix \(C\in \mathbb {R}^{K_1\times K_2}\) and corrupted measurement \(c=Cx+e_u\in \mathbb {R}^{K_1},\) where \(e_u\) is an unknown vector of errors, the linear decoding problem is to recover x from c. It is shown in [13] that, if C satisfies the restricted isometry property, x can be exactly recovered by solving the convex minimization problem

$$\begin{aligned} \min _{x}\Vert c-Cx\Vert _1 \end{aligned}$$

provided that \(e_u\) is sparse. By [33, Theorem 4.10], \(L_q\) (\(q\in (0,1)\)) minimization

$$\begin{aligned} \min _{x}\Vert c-Cx\Vert _q^q \end{aligned}$$
(6.5)

has a better capability of recovering x than \(L_1\) minimization. Using the identity \(|a|=\max \left\{ a,0\right\} +\max \left\{ -a,0\right\} ,\) it is easy to see that problem (6.5) is a special case of (1.1) with

$$\begin{aligned} M=2K_1,\quad N=K_2,\quad A=\left[ \begin{array}{c} C \\ -C \end{array} \right] ,\quad b=\left[ \begin{array}{c} c \\ -c \end{array} \right] ,\quad h(x)=0,\quad \text {and}~\mathcal{X}=\mathbb {R}^N. \end{aligned}$$
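A minimal sketch (with hypothetical dimensions and data of our own choosing) that numerically checks this reduction, i.e., that \(\Vert \max \left\{ b-Ax,0\right\} \Vert _q^q=\Vert c-Cx\Vert _q^q:\)

```python
import numpy as np

# Hypothetical sizes, coding matrix, and sparse error vector (for illustration only).
rng = np.random.default_rng(2)
K1, K2, q = 30, 10, 0.5
C = rng.standard_normal((K1, K2))              # coding matrix
x_true = rng.standard_normal(K2)
e_u = np.zeros(K1)
e_u[rng.choice(K1, size=3, replace=False)] = rng.standard_normal(3)   # sparse errors
c = C @ x_true + e_u                           # corrupted measurement

# Instance of (1.1): A = [C; -C], b = [c; -c], h(x) = 0, X = R^N.
A = np.vstack([C, -C])
b = np.concatenate([c, -c])

def composite_objective(x):
    """||max{b - Ax, 0}||_q^q, which equals ||c - Cx||_q^q since |a| = max{a,0} + max{-a,0}."""
    return np.sum(np.maximum(b - A @ x, 0.0) ** q)

x_test = rng.standard_normal(K2)
print(np.isclose(composite_objective(x_test), np.sum(np.abs(c - C @ x_test) ** q)))   # True
```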

Appendix 2: Proof of Lemma 3.2

Let \(\bar{x}\) be any local minimizer of problem (3.2) with \(\mathcal{I}_{\bar{x}},\,\mathcal{J}_{\bar{x}},\) and \(\mathcal{K}_{\bar{x}}\) given in (3.1). For convenience, we write \(\mathcal{I}_{\bar{x}},\,\mathcal{J}_{\bar{x}},\,\mathcal{K}_{\bar{x}}\) as \(\mathcal{I},\,\mathcal{J},\,\mathcal{K}\) throughout this proof. We prove that \(\bar{x}\) is a local minimizer of problem (1.1) in two parts: the first treats the easy case where \(\mathcal{K}=\emptyset \) and the second deals with the more involved case where \(\mathcal{K}\ne \emptyset .\)

Part 1: \(\mathcal{K}=\emptyset .\) In this case, \(\bar{x}\) is a local minimizer of problem

$$\begin{aligned} \begin{array}{l@{\quad }l} \displaystyle \min _{x} &{} \displaystyle \Vert (b-Ax)_{{\mathcal{J}}}\Vert _q^q + h(x) \\ \text{ s.t. } &{} x\in \mathcal{X}. \end{array} \end{aligned}$$

Since \(\mathcal{K}=\emptyset ,\) the objective of problem (1.1) coincides with that of the above problem in a neighborhood of \(\bar{x},\) and hence \(\bar{x}\) is a local minimizer of problem (1.1).

Part 2: \(\mathcal{K}\ne \emptyset .\) Consider the feasible direction cone \(\mathcal {D}_{\bar{x}}\) of problem (1.1) at point \(\bar{x},\) i.e.,

$$\begin{aligned} \mathcal{D}_{\bar{x}}=\left\{ \,d\,|\,\bar{x}+\alpha d\in \mathcal{X}~\text {for some}~\alpha >0\right\} . \end{aligned}$$

For simplicity, we use \(\mathcal {D}\) to denote \(\mathcal {D}_{\bar{x}}\) in the subsequent proof. For any subset \(\mathcal{K}_{p}\) of \(\mathcal{K}\) indexed by \(p=1,2,\ldots ,P:=2^{|\mathcal{K}|},\) let \(\mathcal{K}_{p}^{c}=\mathcal{K}\setminus \mathcal{K}_{p},\) and define

$$\begin{aligned} \mathcal{D}_p=\left\{ d\,|\,(Ad)_{\mathcal{K}_{p}}\le 0,(Ad)_{\mathcal{K}_{p}^{c}}\ge 0\right\} \bigcap \mathcal{D}. \end{aligned}$$

By the Minkowski–Weyl Theorem [5, Proposition 3.2.1], there exist \(d_p^1, d_p^2, \ldots , d_p^{g_p} \in \mathcal{D}_p\), such that

$$\begin{aligned} \mathcal{D}_p={\text{ Cone }}\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} , \end{aligned}$$

and thus

$$\begin{aligned} \mathcal{D}=\displaystyle \bigcup _{p=1}^P{\text{ Cone }}\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} . \end{aligned}$$

Without loss of generality, assume \(\Vert d_p^{j}\Vert =1\) for all \(j=1,2,\ldots ,g_p,~p=1,2,\ldots ,P.\) For any \(d\in \bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} \subseteq \mathcal{D}\), define

$$\begin{aligned} \overleftarrow{\mathcal{K}}^{d}=\left\{ m\in \mathcal{K}\,|\,(Ad)_m<0\right\} . \end{aligned}$$
(6.6)

Next, we consider the two cases where \(\overleftarrow{\mathcal{K}}^{d}\) is nonempty and empty, respectively. The former happens when d is not a feasible direction of problem (3.2) at the point \(\bar{x},\) while the latter happens when d is a feasible direction of problem (3.2) at \(\bar{x}.\)

Case 1: \(\overleftarrow{\mathcal{K}}^{d}\ne \emptyset .\) Since \(d\in \mathcal{D},\) there must exist \(\epsilon _0^d>0\) such that \(\bar{x}+\epsilon d\in \mathcal{X}\) holds for all \(0\le \epsilon \le \epsilon _0^d.\) Define

$$\begin{aligned} \overrightarrow{\mathcal{J}}^{d}=\left\{ m\in \mathcal{J}\,|\,(Ad)_m>0\right\} . \end{aligned}$$
(6.7)

Choose \(\epsilon _1^d>0\) small enough that

$$\begin{aligned} \left( b-A\bar{x}-\epsilon _1^d Ad\right) _m\le 0,\quad \forall ~m\in {\mathcal{I}} \end{aligned}$$
(6.8)

and

$$\begin{aligned} \left( b-A\bar{x}-\epsilon _1^d Ad\right) _m\ge \frac{\left( b-A\bar{x}\right) _m}{2}>0,\quad \forall ~m\in \overrightarrow{\mathcal{J}}^{d}. \end{aligned}$$
(6.9)

Therefore, for \(0\le \epsilon \le \min \left\{ \epsilon _0^d,\epsilon _1^d\right\} \), we obtain

$$\begin{aligned}&f(\bar{x}+\epsilon d)- f(\bar{x})\nonumber \\&\quad =\Vert \max \left\{ b-A\left( \bar{x}+\epsilon d\right) ,0\right\} \Vert _q^q-\Vert \max \left\{ b-A\bar{x},0\right\} \Vert _q^q\nonumber \\&\quad =\sum _{m\in \overleftarrow{\mathcal{K}}^{d}\cup \mathcal{J}} \left( b-A\bar{x}-\epsilon Ad\right) _m^q-\sum _{m\in \mathcal{J}}\left( b-A\bar{x}\right) _m^q \end{aligned}$$
(6.10)
$$\begin{aligned}&\quad \ge \sum _{m\in \overleftarrow{\mathcal{K}}^{d}}(-Ad)_m^q\epsilon ^q + \sum _{m\in {\overrightarrow{\mathcal{J}}^{d}}} \left( \left( b-A\bar{x}-\epsilon Ad\right) _m^q-\left( b-A\bar{x}\right) _m^q\right) \end{aligned}$$
(6.11)
$$\begin{aligned}&\quad \ge \sum _{m\in \overleftarrow{\mathcal{K}}^{d}}(-Ad)_m^q\epsilon ^q + \sum _{m\in {\overrightarrow{\mathcal{J}}^{d}}} q \left( b-A\bar{x}-\epsilon Ad\right) _m^{q-1}\left( -\epsilon (Ad)_m\right) \end{aligned}$$
(6.12)
$$\begin{aligned}&\quad \ge \sum _{m\in \overleftarrow{\mathcal{K}}^{d}}(-Ad)_m^q\epsilon ^q + \sum _{m\in {\overrightarrow{\mathcal{J}}^{d}}} q \left( \frac{(b-A\bar{x})_m}{2}\right) ^{q-1}\left( -\epsilon (Ad)_m\right) , \end{aligned}$$
(6.13)

where (6.10) is due to (3.1), (6.6), and (6.8); (6.11) is due to (6.7); (6.12) is due to the concavity of the function \(z^q\) on \(z>0\) (which gives \(z_1^q\le z_2^q+qz_2^{q-1}(z_1-z_2)\) for all \(z_1,z_2>0\)); and (6.13) is due to (6.9) and the definition of \(\overrightarrow{\mathcal{J}}^{d}\) in (6.7). Moreover, by (1.3) and Taylor's expansion, for any \(0\le \epsilon \le 1\), there exists \(\xi \in (0,1)\) such that

$$\begin{aligned} h(\bar{x}+\epsilon d)-h(\bar{x})= & {} \epsilon \nabla h(\bar{x}+\xi \epsilon d)^Td\nonumber \\\ge & {} -\epsilon \left\| \nabla h(\bar{x}+\xi \epsilon d)\right\| \nonumber \\\ge & {} -\epsilon \left( \left\| \nabla h(\bar{x})\right\| +\epsilon L_h\right) \\\ge & {} -\epsilon \left( \left\| \nabla h(\bar{x})\right\| +L_h\right) .\nonumber \end{aligned}$$
(6.14)

Combining (6.13) with (6.14), for any \(0\le \epsilon \le \min \left\{ \epsilon _0^d,\epsilon _1^d,1\right\} ,\) we obtain

$$\begin{aligned} F(\bar{x}+\epsilon d)- F(\bar{x})\ge \lambda _1^d\epsilon ^q-\lambda _2^d\epsilon , \end{aligned}$$

where

$$\begin{aligned}&\lambda _1^d:=\sum _{m\in \overleftarrow{\mathcal{K}}^{d}}(-Ad)_m^q>0,\\&\lambda _2^d:=\sum _{m\in \overrightarrow{\mathcal{J}}^{d}} q \left( \frac{(b-A\bar{x})_m}{2}\right) ^{q-1}(Ad)_m+\left\| \nabla h(\bar{x})\right\| +L_h>0. \end{aligned}$$

Define

$$\begin{aligned} \epsilon _2^d:=\left( \frac{\lambda _1^d}{\lambda _2^d}\right) ^{\frac{1}{1-q}} \end{aligned}$$

and

$$\begin{aligned} \bar{\epsilon }^d:= \min \left\{ \epsilon _0^d,\epsilon _1^d,\epsilon _2^d,1\right\} >0. \end{aligned}$$
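The threshold \(\epsilon _2^d\) is chosen precisely so that the lower bound above is nonnegative: for any \(\epsilon \in [0,\epsilon _2^d],\)

$$\begin{aligned} \lambda _1^d\epsilon ^q-\lambda _2^d\epsilon =\epsilon ^q\left( \lambda _1^d-\lambda _2^d\epsilon ^{1-q}\right) \ge \epsilon ^q\left( \lambda _1^d-\lambda _2^d\left( \epsilon _2^d\right) ^{1-q}\right) =0. \end{aligned}$$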

From the above analysis, we can conclude that, for any \(d\in \bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} \) with \(\overleftarrow{\mathcal{K}}^{d}\ne \emptyset ,\) \(F(\bar{x}+\epsilon d)\ge F(\bar{x})\) holds for all \(\epsilon \in [0,\bar{\epsilon }^d].\)

Case 2: \(\overleftarrow{\mathcal{K}}^{d}=\emptyset .\) Recall the definition of \(\overleftarrow{\mathcal{K}}^{d}\) (cf. (6.6)). \(\overleftarrow{\mathcal{K}}^{d}=\emptyset \) implies that d is a feasible direction of problem (3.2) at point \(\bar{x}.\) From the assumption that \(\bar{x}\) is a local minimizer of problem (3.2), we know that there exists an \(\tilde{\epsilon }>0\) such that for all \(d\in \bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} \) with \(\overleftarrow{\mathcal{K}}^{d}=\emptyset ,\) there holds \(F(\bar{x}+\epsilon d)\ge F(\bar{x})\) for all \(\epsilon \in [0,\tilde{\epsilon }].\)

We now combine Cases 1 and 2. Since there are only finitely many directions in \(\bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} ,\) it follows that

$$\begin{aligned} \bar{\epsilon }:=\min \left\{ \min _{\overleftarrow{\mathcal{K}}^{d}\ne \emptyset ,\,j=1,\ldots ,g_p,\,p=1,\ldots ,P}\left\{ \bar{\epsilon }^{d_p^j}\right\} ,\tilde{\epsilon }\right\} >0 \end{aligned}$$

and

$$\begin{aligned} \bar{x}+\epsilon d_p^j\in \mathcal{X},~F(\bar{x}+\epsilon d_p^j)\ge F(\bar{x}),~\forall ~j=1,2,\ldots ,g_p,\,p=1,2,\ldots ,P\nonumber \\ \end{aligned}$$
(6.15)

hold true for all \(\epsilon \in [0, \bar{\epsilon }].\)

Let \({\text{ Conv }}_p(\bar{x},\bar{\epsilon })\) denote the convex hull spanned by points \(\bar{x}\) and \(\bar{x}+\bar{\epsilon }d_p^{j},~j=1,2,\ldots ,g_p.\) Then, for any \(x\in \bigcup _{p=1}^P{\text{ Conv }}_p(\bar{x},\bar{\epsilon }),\) we have \(F(x)\ge F(\bar{x})\) by (6.15) and the fact that f(x) is concave in \({\text{ Conv }}_p(\bar{x},\bar{\epsilon })\). Furthermore, one can always choose a sufficiently small but fixed \(\epsilon >0\) such that \(B(\bar{x},\epsilon )\bigcap \mathcal{X}\subseteq \bigcup _{p=1}^P{\text{ Conv }}_p(\bar{x},\bar{\epsilon }).\) Therefore, \(\bar{x}\) is a local minimizer of problem (1.1). \(\square \)

Appendix 3: Proof of Lemma 4.1

We show the three items of Lemma 4.1 separately.

  1. (i) of Lemma 4.1: it follows directly from the inequality

    $$\begin{aligned} \theta ^q(t)\le \theta ^q(t,\mu )\le \left( \theta (t)+\frac{\mu }{2}\right) ^q\le \theta ^q(t)+\left( \frac{\mu }{2}\right) ^q,\quad \forall ~t\le \mu , \end{aligned}$$

    where the last inequality uses the subadditivity \((a+b)^q\le a^q+b^q\) for \(a,b\ge 0\) and \(0<q<1.\)
  2. (ii) of Lemma 4.1: When \(t\ne 0\) and \(t\ne \mu ,\) \(\theta ^q(t,\mu )\) is twice continuously differentiable with respect to t. Recall \(\theta ^q(t,\mu )\ge \left( \mu /2\right) ^q\) for all t (cf. (4.2)). Then it follows from (4.4) that

    $$\begin{aligned} \left| \left[ \theta ^q(t,\mu )\right] ''\right| \le 4q\mu ^{q-2},\quad \forall ~t\notin \left\{ 0,\mu \right\} . \end{aligned}$$

    This further implies (ii) of Lemma 4.1.

  3. (iii) of Lemma 4.1: By the mean-value theorem [27, Theorem 2.3.7], we have

    $$\begin{aligned} \theta ^q(t,\mu )= \theta ^q(\hat{t},\mu )+\left[ \theta ^q(\hat{t},\mu )\right] '\left( t-\hat{t}\right) +\frac{\upsilon }{2}\left( t-\hat{t}\right) ^2,\end{aligned}$$
    (6.16)

    where \(\upsilon \in \partial _{t}\left( \left[ \theta ^q(\xi \hat{t}+(1-\xi )t,\mu )\right] '\right) \) and \(\xi \in [0,1].\) We consider the following three cases.

    • Case \(\hat{t}>2\mu :\) Since \(t-\hat{t}\ge {-\hat{t}}/{2},\) it follows for any \(\xi \in [0,1]\) that

      $$\begin{aligned} \xi t+(1-\xi )\hat{t}=\hat{t}+\xi (t-\hat{t})\ge \hat{t}/2>\mu . \end{aligned}$$

      This, together with (4.4) and (4.5), implies that the \(\upsilon \) in (6.16) satisfies

      $$\begin{aligned} \upsilon \le 0= \kappa (\hat{t},\mu ). \end{aligned}$$

      From this and (6.16), we obtain (4.6).

    • Case \(\hat{t}\in [-\mu , 2\mu ]:\) From (ii) of Lemma 4.1, \(|\upsilon |\) is uniformly bounded by \(\kappa (\hat{t},\mu )={{4q}\mu ^{q-2}}.\) Combining this with (6.16) yields (4.6).

    • Case \(\hat{t}<-\mu :\) Since \(t-\hat{t}\le \mu ,\) for any \(\xi \in [0,1],\) it follows

      $$\begin{aligned} \xi t+(1-\xi )\hat{t}=\hat{t}+\xi (t-\hat{t})< 0. \end{aligned}$$

      From this, (4.4), (4.5), and (6.16), we can obtain (4.6).

This completes the proof of Lemma 4.1. \(\square \)


Cite this article

Liu, YF., Ma, S., Dai, YH. et al. A smoothing SQP framework for a class of composite \(L_q\) minimization over polyhedron. Math. Program. 158, 467–500 (2016). https://doi.org/10.1007/s10107-015-0939-5
