Skip to main content
Log in

Sparse and robust SVM classifier for large scale classification

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Support vector machine (SVM) has drawn wide attention in various fields, such as image classification, pattern recognition and disease diagnosis and so on. Nevertheless, it requires much memory and runs very slow in large-scale datasets setting. To reduce computational cost and the required storage, we first design a new sparse and robust SVM model according to our construct truncated smoothly clipped absolute deviation (SCAD) loss and then establish its first-order necessary and sufficient optimality conditions though newly defined proximal stationary point. Following the idea of the proximal stationary point, we define the Lts support vectors of Lts-SVM and then show that the Lts support vectors are a small portion of the whole training set, which is convenient for us to introduce a novel working set in each step. We next present a new alternating direction method of multipliers with working set (Lts-ADMM) to deal with Lts-SVM, which proves that the proposed algorithm not only converges to a local minimizer of Lts-SVM but also possesses a relatively low computational complexity. Finally, the extensive numerical experiments demonstrate that the proposed algorithm enjoys better performance than nine leading state-of-the-art methods with regard to best classification accuracy, smallest number of support vectors and super fast computational speed in large-scale datasets setting.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Algorithm 1
Fig. 3
Fig. 4

Similar content being viewed by others

Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

  2. https://www.kaggle.com/datasets

  3. http://archive.ics.uci.edu/ml/datasets.php

References

  1. Cortes C, Vapnik V (1995) Support vector networks. Mach Learn 20(3):273–297

    Article  MATH  Google Scholar 

  2. Wang HJ, Shao YH, Zhou SL, Zhang C, Xiu NH (2022) Support vector machine classifier via l0/1 soft-margin loss. IEEE Trans Pattern Anal Mach Intell 40(10):7253–7265

    Article  Google Scholar 

  3. Wen Y, Ma J, Yuan C, Yang L (2020) Projection multi-birth support vector machine for multi-classification. Appl Intell 50(13):1–17

    Google Scholar 

  4. Wang HJ, Shao YH (2022) Fast truncated Huber loss SVM for large scale classification. Knowl-Based Syst 26:1–17

    Google Scholar 

  5. Zhou SL (2022) Sparse SVM for sufficient data reduction. IEEE Trans Pattern Anal Mach Intell 44(9):5560–5571

    Google Scholar 

  6. Akram-Ali-Hammouri Z, Fernandez-Delgado M, Cernadas E, Barro S (2022) Fast support vector classification for large-scale problems. IEEE Trans Pattern Anal Mach Intell 44(10):6184–6195

    Article  Google Scholar 

  7. Niu DB, Wang CJ, Tang PP, Wang QS, Song E (2022) An efficient algorithm for a class of large-scale support vector machines exploiting hidden sparsity. IEEE Trans Signal Process 99:1–16

    MathSciNet  Google Scholar 

  8. Huang XL, Shi L, Suykens JAK (2014) Support vector machine classifier with pinball loss. IEEE Trans Pattern Anal Mach Intell 36(5):984–997

    Article  Google Scholar 

  9. Tanveer M, Sharma S, Rastogi R, Anand P (2022) Sparse support vector machine with pinball loss. IEEE Trans Emer Tele Tech 32(2):1–13

    Google Scholar 

  10. Shen X, Niu LF, Qi ZQ, Tian YJ (2017) Support vector machine classifier with truncated pinball loss. Pattern Recognit 68:199–210

    Article  Google Scholar 

  11. Wang HR, Xu YT, Zhou ZJ (2021) Twin-parametric margin support vector machine with truncated pinball loss. Neural Comput Appl 33(8):3781–3798

    Article  Google Scholar 

  12. Yin J, Li QN (2019) A semismooth Newton method for support vector classification and regression. Comput Optim Appl 73(2):477–508

    Article  MathSciNet  MATH  Google Scholar 

  13. Xiao XS, Xu YT, Zhang Y, Zhong PW (2022) A novel self-weighted Lasso and its safe screening rule. Appl Intell 52(12):14465–14477

    Article  Google Scholar 

  14. Yan YQ, Li QN (2021) An efficient augmented Lagrangian method for support vector machine. Optim Method Softw 35(4):855–883

    Article  MathSciNet  MATH  Google Scholar 

  15. Feng RX, Xu YT (2022) Support matrix machine with pinball loss for classification. Neural Comput Appl 34(21):18643–18661

    Article  Google Scholar 

  16. Wang HM, Yitian Xu YT (2022) A safe double screening strategy for elastic net support vector machine. Inf Sci 582:382–397

    Article  MathSciNet  Google Scholar 

  17. Zhu BZ, Ye SX, Wang P, Chevallier JL, Wei LM (2022) Forecasting carbon price using a multi-objective least squares support vector machine with mixture kernels. J Forecast, Online, https://doi.org/10.1002/for.2784

  18. Zhao J, Xu YT, Fujita H (2019) An improved non-parallel universum support vector machine and its safe sample screening rule. Knowl -Based Syst 170:79–88

    Article  Google Scholar 

  19. Allen-Zhu Z (2018) Katyusha: the first direct acceleration of stochastic gradient methods. J Mach Learn Res 18:1–51

    MathSciNet  MATH  Google Scholar 

  20. Zhu WX, Song YY, Xiao YY (2022) Support vector machine classifier with huberized pinball loss. Eng Appl Artif Intell 91:1–16

    Google Scholar 

  21. Wang HJ, Shao YH, Xiu NH (2022) Proximal operator and optimality conditions for ramp loss SVM. Optim Lett 16(3):999–1014

    Article  MathSciNet  MATH  Google Scholar 

  22. Wang HR, Xu YT, Zhou ZJ (2022) Ramp loss KNN-weighted multi-class twin support vector machine. Soft Comput 26(14):6591–6618

    Article  Google Scholar 

  23. Pang XY, Zhao J, Xu YT (2022) A novel ramp loss-based multi-task twin support vector machine with multi-parameter safe acceleration. Neural Netw 150:194–212

    Article  Google Scholar 

  24. Park SY, Liu YF (2021) Robust penalized logistic regression with truncated loss functions. Can J Stat 39(2):300–323

    Article  MathSciNet  MATH  Google Scholar 

  25. Feng YL, Yang Y, Huang XL, Mehrkanoon S Suykens JAK (2018) Robust support vector machines for classification with nonconvex and smooth losses. Neural Comput 28(6):1217–1247

    Article  MathSciNet  MATH  Google Scholar 

  26. Yang LM, Dong HW (2018) Support vector machine with truncated pinball loss and its application in pattern recognition. Chemometrics Intell Lab Syst 177:89–99

    Article  Google Scholar 

  27. Chang X, Liu S, Zhao P, Song D (2019) A generalization of linearized alternating direction method of multipliers for solving two-block separable convex programming. J Comput Appl Math 357(2):251–272

    Article  MathSciNet  MATH  Google Scholar 

  28. Zhou SL, Xiu NH, Qi HD (2021) Global and quadratic convergence of newton hard-thresholding pursuit. J Mach Learn Res 22:1–45

    MathSciNet  MATH  Google Scholar 

  29. Wang R, Xiu NH, Zhou SL (2022) An extended newton-type algorithm for L2-regularized sparse logistic regression and its efficiency for classifying large-scale datasets. J Comput Appl Math, Online. https://doi.org/10.1016/j.cam.2022.113656

    MATH  Google Scholar 

  30. Guan L, Qiao LB, Li DS, Sun T, Ge KS, Lu XC (2018) An efficient ADMM -based algorithm to nonconvex penalized support vector machines. In: Proc Int Conf Data Mining Workshops, pp 1209–1216

  31. Dong W, Wozniak M, Wu JS, Li WG, Bai ZW (2022) De-noising aggregation of graph neural networks by using principal component analysis. IEEE Trans Industr Inform, Online, https://doi.org/10.1109/TII.2022.3156658

  32. Dong W, Wu JS, Zhang XW, Bai ZW, Peng Wang P, Wozniak M (2022) Improving performance and efficiency of graph neural networks by injective aggregation. Knowl -Based Syst, Online, https://doi.org/10.1016/j.knosys.2022.109616

Download references

Acknowledgements

The authors sincerely thank the associate editor and the five referees for their constructive comments, which have significantly improved the quality of the paper. This work is supported by the National Natural Science Foundation of China (11971052, 11871183), the Changsha Municipal Natural Science Foundation (kq2208214), and the Scientific Research Fund of Hunan Provincial Education Department (22C0152).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Huajun Wang.

Ethics declarations

Competing interests

he authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Proofs of all theorems

Appendix A: Proofs of all theorems

1.1 A.1 Proof of Lemma 3.1

Proof

From (7), it is evidently that the Lts(p) is separable, which makes us to get (9). Next, we split the proof of explicit expression (10) into the following two cases:

  1. (i)

    For pi≠ 0, the ts(pi) is differentiable. Therefore, we can yield ts(pi) = 0 for pi ≥ 1 or pi < 0, ts(pi) = (8 − 8t)/3 for pi ∈ [1/2,1) and ts(pi) = 4/3 for pi ∈ (0,1/2).

  2. (ii)

    For pi = 0, the ts(pi) is non-differentiable. Based on Definition 3.1, we can get that ts(pi) = 0 with pi < 0 and ts(pi) = 4/3 with pi ∈ (0,1/2). Hence, we obtain ts(0) ∈ [0,4/3].

Therefore, we yield (10). This completes the proof. □

1.2 A.2 Proof of Lemma 3.2

Proof

According to (5) and (11), we yield that \(\text {prox}_{\lambda \tau \ell _{ts}}(r)\) is the minimizer of following function

$$ \begin{array}{@{}rcl@{}} \phi(z):= \begin{cases} \phi_1(z):=\lambda\tau+ \frac{(z-r)^2}{2}, & z\geq1,\\ \phi_2(z):=\frac{-4z^2+8z-1}{3}\lambda\tau+ \frac{(z-r)^2}{2}, & z \in(1/2, 1),\\ \phi_3(z):=\frac{4z}{3}\lambda\tau+ \frac{(z-r)^2}{2}, & z \in(0, 1/2],\\ \phi_4(z):=r^2/2, & z=0,\\ \phi_5(z):=(z-r)^2/2, & z<0. \end{cases} \end{array} $$

Evidently, the minimizers of ϕ1(z), ϕ2(z), ϕ3(z), ϕ4(z) and ϕ5(z) are obtained at \(z^{*}_{1}=r\), \(z^{*}_{2}=\frac {3r-8\lambda \tau }{3-8\lambda \tau }\), \(z^{*}_{3}=\) r − 4λτ/3, \(z^{*}_{4}=0\) and \(z^{*}_{5}=r\) respectively.

  1. (i)

    For λτ ∈ (0,3/8), we compare the five values of \(\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3})\), \(\phi (z^{*}_{4})\) and \(\phi (z^{*}_{5})\) to obtain conclusion:

  1. (a1)

    As r ≥ 1, we get \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{4}),\phi (z^{*}_{5})\}>\phi (z^{*}_{1})\), which means \(z^{*}=z^{*}_{1}=r\).

  2. (a2)

    As \(r\in [\frac {1}{2}+\frac {4\lambda \tau }{3},1)\), we obtain \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{3}), \phi (z^{*}_{4}),\phi (z^{*}_5)\}> \phi (z^{*}_2)\), which implies \(z^{*}=z^{*}_2=\frac {3r-8\lambda \tau }{3-8\lambda \tau }\).

  3. (a3)

    As \(r\in (\frac {4\lambda \tau }{3},\frac {1}{2}+\frac {4\lambda \tau }{3})\), we have \(\min \limits \{\phi (z^{*}_1), \phi (z^{*}_2), \phi (z^{*}_4), \phi (z^{*}_{5}) \}>\phi (z^{*}_{3})\), which represents \(z^{*}=z^{*}_{3}=r-4\lambda \tau /3\).

  4. (a4)

    As \(r\in [0,\frac {4\lambda \tau }{3}]\), we receive \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3}), \phi (z^{*}_5)\}>\phi (z^{*}_4)\), which gets \(z^{*}=z^{*}_4=0\).

  5. (a5)

    As r < 0, we derive \(\min \limits \{\phi (z^{*}_1),\phi (z^{*}_2),\phi (z^{*}_3), \phi (z^{*}_4)\}>\phi (z^{*}_5)\), which obtains \(z^{*}=z^{*}_5=r\). By (a1)–(a5), we get (12).

  1. (ii)

    For λτ ∈ [3/8,9/8), by contrasting the five values of \(\phi (z^{*}_1),\phi (z^{*}_{2}),\phi (z^{*}_3), \phi (z^{*}_4)\) and \(\phi (z^{*}_5)\), we yield conclusion:

  1. (b1)

    As \(r>\frac {3}{4}+\frac {2\lambda \tau }{3}\), we get \(\min \limits \{\phi (z^{*}_2),\phi (z^{*}_3),\phi (z^{*}_4),\) \(\phi (z^{*}_5)\} >\phi (z^{*}_{1})\), which obtains \(z^{*}=z^{*}_{1}=r\).

  2. (b2)

    As \(r=\frac {3}{4}+\frac {2\lambda \tau }{3}\), we derive \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{4}),\) \(\phi (z^{*}_{5})\}> \phi (z^{*}_{1})=\phi (z^{*}_{3})\), which can obtain \(z^{*}=z^{*}_{1}=r\) or \(z^{*}= z^{*}_{3}=r-4\lambda \tau /3\).

  3. (b3)

    As \(r\in (\frac {4\lambda \tau }{3},\frac {3}{4}+\frac {2\lambda \tau }{3})\), we obtain \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}), \phi (z^{*}_4),\phi (z^{*}_5)\}>\phi (z^{*}_3)\), which means \(z^{*}=z^{*}_3=r-4\lambda \tau /3\).

  4. (b4)

    As \(r\in [0,\frac {4\lambda \tau }{3}]\), we receive \(\min \limits \{\phi (z^{*}_1),\phi (z^{*}_2),\phi (z^{*}_3), \phi (z^{*}_{5})\}>\phi (z^{*}_{4})\), which means \(z^{*}=z^{*}_{4}=0\).

  5. (b5)

    As r < 0, we get \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{4})> \phi (z^{*}_{5})\), which gets \(z^{*}=z^{*}_{4}=r\). By (b1)-(b5), we get (13).

  1. (iii)

    For λτ ≥ 9/8, by contrasting the five values of \(\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3}), \phi (z^{*}_{4})\) and \(\phi (z^{*}_{5})\), we yield conclusion:

  1. (c1)

    As \(r>\sqrt {2\lambda \tau }\), we get \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{4}), \) \(\phi (z^{*}_{5})\}> \phi (z^{*}_{1})\), which implies \(z^{*}=z^{*}_{1}=r\).

  2. (c2)

    As \(r=\sqrt {2\lambda \tau }\), we obtain \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{5})\}> \phi (z^{*}_{1})=\phi (z^{*}_{4})\), which means \(z^{*}=z^{*}_{1}=r\) or \(z^{*}=z^{*}_{4}=0\).

  3. (c3)

    As \(r\in [0,\sqrt {2\lambda \tau })\), we derive \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}),\) \(\phi (z^{*}_{4}), \phi (z^{*}_5)\}>\phi (z^{*}_4)\), which means \(z^{*}=z^{*}_4=0\).

  4. (c4)

    As r < 0, we obtain \(\min \limits \{\phi (z^{*}_1),\phi (z^{*}_2),\phi (z^{*}_3),\) \(\phi (z^{*}_4)\}> \phi (z^{*}_5)\), which obtains \(z^{*}=z^{*}_5=r\). By (c1)-(c4), we get (14), which completes the whole proof.

1.3 A.3 Proof of Lemma 3.3

Proof

From the separate property of Lts(u), it is evidently that \([\text {prox}_{\lambda \tau L_{ts}}({ \textbf {r}})]_i=\text {prox}_{\lambda \tau \ell _{ts}}({ r_i})\), where

$$ \text{prox}_{\lambda\tau \ell_{ts}}(r_{i})={\arg}\min_{z\in {\mathbb{R}}} \lambda\tau \ell_{ts}(z)+ \frac{1}{2}(z-r_{i})^{2}, i\in [m]. $$

Therefore, we yield (15). This completes the proof. □

1.4 A.4 Proof of Theorem 3.1

Proof

Let (w;b;p) be a local minimizer of (8). It is evidently that there exists \(\boldsymbol {\theta }^{*}\in \mathbb {R}^{m}\) such that (w;b;p) is a KKT point of (8). In other words, we have

$$ \begin{array}{@{}rcl@{}} \left\{ \begin{array}{lll} \textbf{w}^{*} + M^{\top}{\boldsymbol{\theta}^{*}} = \textbf{0}, \\ \langle \textbf{y}, {\boldsymbol{\theta}^{*}} \rangle = 0, \\ \textbf{p}^{*}+M \textbf{w}^{*}+b^{*}\textbf{y} = {\textbf{1}}, \\ \boldsymbol{\theta}^{*}+\tau \partial L_{ts}(\textbf{p}^{*}) & \ni & \textbf{0}. \end{array}\right. \end{array} $$
(34)

According to (16) and (34), it can be clearly seen that to show (16), we only prove that for any 0 < λλ, if the (𝜃;p) satisfies \(\textbf {0}\in \boldsymbol {\theta }^{*}+\tau \partial L_{ts}(\textbf {p}^{*})\), then we receive \(\textbf {p}^{*}\in \P :=\text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\tau {\boldsymbol {\theta }^{*}}).\) From \(\textbf {0}\in \boldsymbol {\theta }^{*}+\tau \partial L_{ts}(\textbf {p}^{*})\), (9) and (10), we have that

$$ \theta^{*}_{i}\in \begin{cases} 0, & p^{*}_{i}\geq1, \\ -8(1-p^{*}_{i})\tau/3, & p^{*}_{i}\in[1/2,1),\\ -4\tau/3, & p^{*}_{i}\in(0,1/2),\\ [-4\tau/3,0], & p^{*}_{i}=0,\\ 0, & p^{*}_{i}<0. \end{cases} i\in[m]. $$
(35)

Based on Lemma 3.2, we prove (16) by three scenarios: λτ ∈ (0,3/8), λτ ∈ [3/8,9/8) and λτ ≥ 9/8.

Case (i)::

For τ > 0 and λ ∈ (0,λ] satisfying λτ ∈ (0,3/8), we consider the following five cases.

  1. (i)

    For \(i\in \mathbb {S}^{*}\), we yield \(p^{*}_i> 1\) and get \(\theta ^{*}_i=0\) by (35), which implies that

    $$ u_{i}^{*}:=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}>1. $$
    (36)
  2. (ii)

    For \(i\in \mathbb {T}^{*}\), we have \(p^{*}_i\in [1/2,1]\) and receive \(\theta ^{*}_i=-8(1-p^{*}_i)\tau /3\) from (35). Hence, we derive

    $$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}+8\lambda\tau(1-p^{*}_{i})/3, $$
    (37)

    which together with \(\lambda \leq \lambda ^{*}_1=3/(8\tau )\) gets

    $$ u_{i}^{*}\leq p^{*}_{i}+8(3/8\tau)\tau (1-p^{*}_{i})/3=1. $$
    (38)

    From (37), \(p^{*}_i\in [1/2,1)\) and \(\lambda \leq \lambda ^{*}_1=3/(8\tau )\), we obtain

    $$ u_{i}^{*}\geq1/2+4\lambda\tau/3, $$

    which together with (38) results in

    $$u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[1/2+4\lambda\tau/3, 1].$$
  3. (iii)

    For \(i\in \mathbb {E}^{*}\), we receive \(p^{*}_i\in (0,1/2)\) and obtain \(\theta ^{*}_i=-4\tau /3\) based on (35). Therefore, we can obtain

    $$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}+4\lambda\tau/3, $$
    (39)

    which combine with \(p^{*}_i\in (0,1/2)\) yields

    $$ u_{i}^{*}=p^{*}_{i}+4\lambda\tau/3\in(4\lambda\tau/3,1/2+4\lambda\tau/3). $$
    (40)
  4. (iv)

    For \(i\in \mathbb {I}^{*}\), we have \(p^{*}_i=0\) and \(\theta ^{*}_i\in [-4\tau /3,0]\) by (35), which yields

    $$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,4\lambda\tau/3]. $$
  5. (v)

    For \(i\in \mathbb {O}^{*}\), we get \(p^{*}_i<0\) and \(\theta _i^{*}=0\) by (35), which results in

    $$ u^{*}_{i}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}<0. $$

Based on the above (i)-(v), we yield (12). This completes the proof of Case (i). □

Case (ii)::

For τ > 0 and λ ∈ (0,λ] satisfying λτ ∈ [3/8,9/8), we consider five cases as below.

  1. (i)

    For \(i\in \mathcal { S}^{*}\), we have \(p^{*}_i>3/4\). By \(\lambda \leq \lambda _2^{*}=(12p^{*}_i-9)/(8\tau )\), we yield

    $$ p^{*}_{i}\geq 3/4+2\lambda\tau/3, $$

    which combine with λτ ∈ [3/8,9/8) obtains \(p^{*}_i\geq 1\) and \(\theta ^{*}_i=0\) by (35). Therefore, we get

    $$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}\geq3/4+2\lambda\tau/3. $$
  2. (ii)

    For \(i\in \mathcal { I}^{*}\), we have \(p^{*}_i=3/4\) and \(\theta ^{*}_i=-8(1-p^{*}_i)\tau /3\) from (35). Hence, we obtain

    $$ \begin{array}{@{}rcl@{}} u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=3/4+2\lambda\tau/3. \end{array} $$
  3. (iii)

    For \(i\in \mathcal { T}^{*}\), we have \(p^{*}_i\in (0,3/4)\). From \(\lambda \leq \lambda _3^{*}=3(3-4p^{*}_i)/8\tau \) and λτ ∈ [3/8,9/8), we obtain

    $$ \begin{array}{@{}rcl@{}} p^{*}_{i}<3/4-2/3\lambda\tau\in(0,1/2), \end{array} $$

    which together with (35) yields \(\theta ^{*}_i=-4 \tau /3\) and

    $$ \begin{array}{@{}rcl@{}} u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in(4\lambda \tau/3,3/4+2\lambda \tau/3). \end{array} $$
  4. (iv)

    For \(i\in \mathbb {I}^{*}\), we have \(p^{*}_i=0\) and \(\theta ^{*}_i\in [-4\tau /3,0]\) by (35), which gets

    $$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,4\lambda\tau/3]. $$
  5. (v)

    For \(i\in \mathbb {O}^{*}\), we receive \(p^{*}_i<0\) and \(\theta _i^{*}=0\), which implies

    $$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}<0. $$

From the above (i)-(v), we receive (13). This completes the proof Case(ii). \( \Box \)

Case (iii)::

For τ > 0 and λ ∈ (0,λ] satisfying λτ ≥ 9/8, we consider three cases as follows.

  1. (i)

    For \(i\in T^{*}:=\mathbb {S}^{*} \cup \mathbb {T}^{*}\cup \mathbb {E}^{*}\), we get \(p^{*}_i>0\). By \(\lambda \leq \lambda _5^{*}\), we have

    $$ p^{*}_{i}\geq \sqrt{2\lambda_{5}^{*}\tau}\geq\sqrt{2\lambda\tau}, $$

    which together with λτ ≥ 9/8 obtains \(\sqrt {2\lambda \tau }\geq 3/2\) and \(\theta ^{*}_i=0\) by (35). Hence, we obtain

    $$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}\geq\sqrt{2\lambda\tau}. $$
  2. (ii)

    For \(i\in \mathbb {I}^{*}\), we have \(p^{*}_i=0\) and \(\theta ^{*}_i\in [-4\tau /3, 0]\) by (35). From \(\lambda \leq \lambda _4^{*}\leq 9/8\tau \), we have

    $$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,3/2], $$

    which together with λτ ≥ 9/8 gets \(\sqrt {2\lambda \tau }\geq 3/2\) and

    $$ \begin{array}{@{}rcl@{}} u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,\sqrt{2\lambda\tau}]. \end{array} $$
  3. (iii)

    For \(i\in \mathbb {O}^{*}\), we have \(p^{*}_i<0\) and \(\theta _i^{*}=0\), which means

    $$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}<0. $$

From the above (i)-(iii), we yield (14). This completes the whole proof.

1.5 A.5 Proof of Theorem 3.2

Proof

Firstly, we denote Ω := {χ := (w;b;p) : p + Mw + by = 1} and χ := (w;b;p). For any χ ∈Ω, it is observed that p + Mw + by = 1, which combine with (16) results in

$$ -M(\textbf{w}-\textbf{w}^{*}) = (b -b^{*})\textbf{y}+\textbf{p}-\textbf{p}^{*}. $$
(41)

According to the convexity of ∥w2, we derive

$$ \begin{array}{@{}rcl@{}} \|\textbf{w}\|^{2}-\|\textbf{w}^{*}\|^{2} & \geq & 2\langle\textbf{w}-\textbf{w}^{*},\textbf{w}^{*}\rangle\\ & \overset{(16)}{=}& -2\langle \boldsymbol{\theta}^{*}, M(\textbf{w}-\textbf{w}^{*})\rangle\\ & \overset{(41)}{=} & 2\langle\boldsymbol{\theta}^{*}, \textbf{p}-\textbf{p}^{*}\rangle+2 (b -b^{*})\langle\boldsymbol{\theta}^{*}, \textbf{y}\rangle\\ & \overset{(16)}{=} & 2\langle \boldsymbol{\theta}^{*}, \textbf{p}-\textbf{p}^{*}\rangle. \end{array} $$

Based on the above facts, we will show that χ is a local minimizer of (8). That is to say, there is a neighborhood N(χ,ρ) := {χ : ∥χχ∥≤ ρ} of χ ∈Ω with ρ > 0 such that for any χ ∈Ω∩ N(χ,ρ), we receive

$$ \begin{array}{@{}rcl@{}} \frac{1}{2}\|\textbf{w} \|^{2}+\tau L_{ts}(\textbf{p})\geq \frac{1}{2}\| \textbf{w}^{*} \|^{2}+\tau L_{ts}(\textbf{p}^{*}). \end{array} $$
(42)

To show (42), we consider the following three scenarios: λτ ∈ (0,3/8), λτ ∈ [3/8,9/8) and λτ ≥ 9/8.

Case (i)::

For λτ ∈ (0,3/8), define \(\textbf {s}^{*}:=\textbf {p}^{*}-\lambda \boldsymbol {\theta }^{*}\in \mathbb {R}^m\),

$$ \begin{array}{@{}rcl@{}} \mathcal{ A}_{1}^{*}&:=&\!\{i\!\in\![m]\!: s^{*}_{i} \!<\! 0\},\mathcal{ A}_{2}^{*}:=\{i\in[m]: s^{*}_{i} \in[0,\widetilde{\tau}]\},\\ \mathcal{ A}_{3}^{*}&:=&\{i\in[m]: s^{*}_{i} \in(\widetilde{\tau},\overline{\tau}),~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ \mathcal{ A}_{4}^{*}&:=&\{i\in[m]: s^{*}_{i} \in[\overline{\tau},1),~ \mathcal{A}_{5}^{*}:=\{i\in[m]: s^{*}_{i}\geq1\}.~~ \end{array} $$
(43)

where \(\widetilde {\tau }:=4\lambda \tau /3\) and \(\overline {\tau }:=1/2+4\lambda \tau /3\). From (12), (15) and (43), it is evidently that \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\lambda {\boldsymbol {\theta }^{*}})\) is identical to

$$ \begin{array}{@{}rcl@{}} {\textbf{p}^{*}} =\left[\begin{array}{c} (\textbf{p}^{*}-\lambda \boldsymbol{\theta}^{*})_{\mathcal{A}_{1}^{*}} \\ {\bf0}_{\mathcal{ A}_{2}^{*}} \\ (\textbf{p}^{*}-\lambda \boldsymbol{\theta}^{*}-\frac{4\lambda\tau }{3}\textbf{1})_{\mathcal{A}_{3}^{*}} \\ \frac{(3(\textbf{p}^{*}-\lambda \boldsymbol{\theta}^{*})-8\lambda\tau \textbf{1})_{\mathcal{A}_{4}^{*}}}{3-8\lambda\tau}\\ (\textbf{p}^{*}-\lambda \boldsymbol{\theta}^{*})_{\mathcal{A}_{5}^{*}} \end{array}\right], \end{array} $$

which is equivalent to

$$ \begin{array}{@{}rcl@{}} \boldsymbol{\theta}^{*}_{\mathcal{ A}_{1}^{*}} & = & \textbf{0}_{\mathcal{A}_{1}^{*}},~\boldsymbol{p}^{*}_{\mathcal{ A}_{2}^{*}}={\bf0}_{\mathcal{ A}_{2}^{*}}, \boldsymbol{\theta}^{*}_{\mathcal{ A}_{3}^{*}}=\frac{-4\tau}{3} \textbf{1}_{\mathcal{A}_{3}^{*}}\\ \boldsymbol{\theta}^{*}_{\mathcal{ A}_{4}^{*}} & = & - \frac{8\tau(\textbf{1}-\textbf{p}^{*})_{\mathcal{ A}_{4}^{*}}}{3},~\boldsymbol{\theta}^{*}_{\mathcal{ A}_{5}^{*}}=\textbf{0}_{\mathcal{ A}_{5}^{*}}. \end{array} $$

This combine with (43) leads to

$$ \begin{array}{@{}rcl@{}} \theta_{i}^{*} & = & 0,~~p_{i}^{*}<0,~i\in \mathcal{ A}^{*}_{1}, \end{array} $$
(44)
$$ \begin{array}{@{}rcl@{}} \theta_{i}^{*} & \in & [-4\tau/3,0],~~ p_{i}^{*}=0,~i \in \mathcal{ A}^{*}_{2}, \end{array} $$
(45)
$$ \begin{array}{@{}rcl@{}} \theta_{i}^{*} & \in & -4\tau/3,~~ p_{i}^{*}\in(0,1/2),~i \in \mathcal{ A}^{*}_{3},\\ \theta_{i}^{*} & = & -8\tau(1-p^{*}_{i})/3,~~p_{i}^{*}\in[1/2,1),~i \in \mathcal{ A}^{*}_{4}, \\ \theta_{i}^{*} & = & 0,~~p^{*}_{i}\geq 1,~i \in \mathcal{ A}^{*}_{5}. \end{array} $$
(46)

Denote \(\mathcal {A}^{*}:=\mathcal { A}_2^{*}\cup \mathcal { A}_3^{*}\cup \mathcal { A}_4^{*}\) and \(\overline {\mathcal { A}}^{*}:=\mathcal { A}_1^{*}\cup \mathcal { A}_5^{*}\). To proof (42), by (42)-(44), we only need to show two facts:

$$ \tau L_{ts}(\textbf{p}_{{\mathcal{ A}}^{*} })-\tau L_{ts}(\textbf{p}^{*}_{{\mathcal{ A}}^{*} })+\langle \boldsymbol{\theta}^{*}_{\mathcal{ A}^{*}},\textbf{p}_{\mathcal{ A}^{*}}-\textbf{p}^{*}_{\mathcal{ A}^{*}} \rangle\geq0,\\ $$
(47)
$$ \tau L_{ts}(\textbf{p}_{\overline{\mathcal{ A}}^{*} })- \tau L_{ts}(\textbf{p}^{*}_{\overline{\mathcal{A}}^{*} })\geq0. $$
(48)

According to (5) and (42)-(44), we yield desired conclusion:

  1. (i)

    For \(i\in {\mathscr{H}}^{*}:=\mathcal { A}_1^{*}\cup \mathcal { A}_3^{*}\cup \mathcal { A}_4^{*}\cup \mathcal { A}_5^{*}\), we obtain that the \(\ell _{ts}(p^{*}_i)\) is differentiable. From Definition 3.1, we get that there exists a neighborhood \(N(\boldsymbol {\chi }^{*},\overline {\rho })\) with \(\overline {\rho }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\overline {\rho })\), we yield

    $$ \begin{array}{@{}rcl@{}} \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle\tau \nabla\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, \end{array} $$

    which together with \(-\tau \nabla \ell _{ts}(p^{*}_i)=-4\tau /3= \theta ^{*}_i\), for \(i\in \mathcal { A}_3^{*}\) and \(-\tau \nabla \ell _{ts}(p^{*}_i)=-8\tau (1-p^{*}_i)/3= \theta ^{*}_i\), for \(i\in \mathcal {A}_4^{*}\) makes us to get (47) and combine with \(\nabla \ell _{ts}(p^{*}_i)=0\), for \(i\in \mathcal {A}_1^{*}\cup \mathcal {A}_5^{*}\) allows us to obtain (48).

  2. (ii)

    For \(i\in \mathcal { A}_2^{*}\), we can see that the \(\ell _{ts}(p^{*}_i)\) is non-differentiable. From Definition 3.1, we also receive that there exists a neighborhood \(N(\boldsymbol {\chi }^{*},\widetilde {\rho })\) with \(\widetilde {\rho }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\widetilde {\rho })\), we obtain

    $$ \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle\tau \partial\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, $$

    which together with \(-\tau \partial \ell _{ts}(p^{*}_i)=\theta _i^{*}\in [-4\tau /3,0]\), for \(i\in \mathcal { A}_2^{*}\) leads to (47).

By the above (i) and (ii), we show (w;b;p) is local minimizer of problem (8) in a local region Ω ∩ N(χ,ρ) with \(\rho ^{*}:=\min \limits \{\overline {\rho },\widetilde {\rho }\}\). This completes the proof of Case (i).

Case (ii)::

For λτ ∈ [3/8,9/8), let \(\textbf {s}^{*}=\textbf {p}^{*}-\lambda \boldsymbol {\theta }^{*}\in \mathbb {R}^m\),

$$ \begin{array}{@{}rcl@{}} \mathcal{ C}_{1}^{*}:=\{i\in[m]: s^{*}_{i} < 0\}, \mathcal{ C}_{2}^{*}:=\{i\in[m]: s^{*}_{i} \in [0,\widetilde{\tau}]\},\\ \mathcal{ C}_{3}^{*}:=\{i\in[m]: s^{*}_{i} \in(\widetilde{\tau},\overline{\tau})~\text{or}~s^{*}_{i}=\widehat{\tau}, \theta^{*}_{i}\neq 0\},~~~~~~~~~~~\\ \mathcal{ C}^{*}_{4}:=\{i\in[m]: s^{*}_{i}>\widehat{\tau}~\text{or}~s^{*}_{i}=\widehat{\tau},\theta^{*}_{i}= 0\},~~~~~~~~~~~~~~~~~~ \end{array} $$
(49)

where \(\widetilde {\tau }=4\lambda \tau /3\) and \(\widehat {\tau }:=3/4+2\lambda \tau /3\). From (15) and (13), we get that \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\lambda {\boldsymbol {\theta }^{*}})\) is equivalent to

$$ \begin{array}{@{}rcl@{}} {\textbf{p}^{*}} =\left[\begin{array}{c} (\textbf{p}^{*}-\lambda \boldsymbol{\theta}^{*})_{\mathcal{C}_{1}^{*}}\\ \textbf{0}_{\mathcal{ C}_{2}^{*}} \\ (\textbf{p}^{*}-\lambda \boldsymbol{\theta}^{*}-\frac{4\lambda \tau }{3} \textbf{1})_{\mathcal{ C}_{3}^{*}} \\ (\textbf{p}^{*}-\lambda \boldsymbol{\theta}^{*})_{\mathcal{ C}_{4}^{*}} \end{array}\right], \end{array} $$

which is identical to

$$ \begin{array}{@{}rcl@{}} \boldsymbol{\theta}^{*}_{\mathcal{ C}_{1}^{*}}&=&\textbf{0}_{\mathcal{ C}_{1}^{*}},~~~ \textbf{p}^{*}_{\mathcal{ C}_{2}^{*}}=\textbf{0}_{\mathcal{ C}_{2}^{*}}, ~~~ \boldsymbol{\theta}^{*}_{\mathcal{C}_{3}^{*}}=-\frac{4\tau}{3}\textbf{e}_{\mathcal{C}_{3}^{*}},\\ \boldsymbol{\theta}^{*}_{\mathcal{C}_{4}^{*}}&=&\textbf{0}_{\mathcal{C}_{4}^{*}}. \end{array} $$
(50)

This combine with (49) obtains

$$ \begin{array}{@{}rcl@{}} \theta_{i}^{*} & = & 0,p_{i}^{*}<0, i~ \in \mathcal{ C}^{*}_{1},\\ \theta_{i}^{*} & \in & [-\frac{4\tau}{3},0], p_{i}^{*}=0,i \in \mathcal{ C}^{*}_{2}, \\ \theta_{i}^{*} & = & -\frac{4\tau}{3},p_{i}^{*}\in(0,\frac{3}{4}-\frac{2\lambda\tau}{3}], i \in \mathcal{ C}^{*}_{3}, \\ \theta_{i}^{*} & = & 0,p^{*}_{i}\geq \frac{3}{4}+\frac{2\lambda\tau}{3}, i \in \mathcal{ C}^{*}_{4}. \end{array} $$

Define \(\mathcal { C}^{*}:=\mathcal { C}_2^{*}\cup \mathcal { C}_3^{*}\) and \(\overline {\mathcal { C}}^{*}:=\mathcal { C}_1^{*}\cup \mathcal { C}_4^{*}\). To show (42), according to (42) and (44), we only need to prove two facts:

$$ \begin{array}{@{}rcl@{}} \tau L_{ts}(\textbf{p}_{{\mathcal{ C}}^{*} })-\tau L_{ts}(\textbf{p}^{*}_{{\mathcal{C}}^{*} })+\langle \boldsymbol{\theta}^{*}_{\mathcal{ C}^{*}},\textbf{p}_{\mathcal{ C}^{*}}-\textbf{p}^{*}_{\mathcal{ C}^{*}} \rangle\geq0, \end{array} $$
(51)
$$ \begin{array}{@{}rcl@{}} \tau L_{ts}(\textbf{p}_{\overline{\mathcal{ C}}^{*} })- \tau L_{ts}(\textbf{p}^{*}_{\overline{\mathcal{ C}}^{*} })\geq0. \end{array} $$
(52)

Based on (5), (49) and (51), we obtain the desired conclusion:

  1. (i)

    For \(i\in C^{*}:=\mathcal { C}^{*}_1\cup \mathcal { C}^{*}_3\cup \mathcal { C}^{*}_4\), from λτ ∈ [3/8,9/8), we get \((0,\frac {3}{4}-\frac {2\lambda \tau }{3}]\subset (0,1/2]\) and \(\frac {3}{4}+\frac {2\lambda \tau }{3}\geq 1\), which means that the \(\ell _{ts}(p^{*}_i)\) is differentiable. From Definition 3.1, there is a neighborhood \(N(\boldsymbol {\chi }^{*},\overline {\eta })\) with \(\overline {\eta }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\overline {\eta })\), we obtain

    $$ \begin{array}{@{}rcl@{}} \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \tau\nabla\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, \end{array} $$

    which together with \(-\tau \nabla \ell _{ts}(p^{*}_i)=-\frac {4\tau }{3}=\theta _i^{*}, i\in \mathcal { C}^{*}_3\) gets (51) and \(-\tau \nabla \ell _{ts}(p^{*}_i)=0=\theta _i^{*}, i\in \mathcal {C}^{*}_1\cup \mathcal {C}^{*}_4\) obtains (52).

  2. (ii)

    For \(i\in \mathcal { C}_2^{*}\), we obtain that the \(\ell _{ts}(p^{*}_i)\) is non-differentiable. Based on Definition 3.1, we have that there is a neighborhood \(N(\boldsymbol {\chi }^{*},\widetilde {\eta })\) with \(\widetilde {\eta }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\widetilde {\eta })\), we yield

    $$ \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \partial \tau\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, $$

    which together with \(-\tau \partial \ell _{ts}(p^{*}_i)=\theta _i^{*}\in [-\frac {4\tau }{3},0]\) gets (51).

By the above (i)-(iii), we prove (w;b;p) is local minimizer of problem (8) in a local region Ω ∩ N(χ,η) with \(\eta ^{*}:=\min \limits \{\overline {\eta },\widetilde {\eta }\}\), which completes the proof of Case (ii).

Case (iii)::

For λτ ≥ 9/8, let \(\textbf {s}^{*}=\textbf {p}^{*}-\lambda \boldsymbol {\theta }^{*}\in \mathbb {R}^m\),

$$ \begin{array}{@{}rcl@{}} \mathcal{ E}_{1}^{*} & := & \{i\in[m]: s_{i}^{*} < 0\}, \\ \mathcal{ E}^{*}_{2} & := & \{i\in[m]: s_{i}^{*} \in[0,\sqrt{2\nu\alpha})~\text{or}~s_{i}^{*} =\sqrt{2\nu\alpha}, \theta_{i}^{*} \neq0\},\\ \mathcal{ E}_{3}^{*} & := & \{i\in[m]: s_{i}^{*} > \sqrt{2 \nu\alpha}~\text{or}~s_{i}^{*} =\sqrt{2\nu\alpha}, \theta_{i}^{*} =0\}. \end{array} $$
(53)

Based on (15) and (13), we receive that \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\tau {\boldsymbol {\theta }^{*}})\) is equivalent to

$$ \begin{array}{@{}rcl@{}} {\textbf{p}^{*}} =\left[\begin{array}{c} (\textbf{p}^{*}-\lambda \boldsymbol{\theta}^{*})_{\mathcal{E}_{1}}^{*} \\ \textbf{0}_{\mathcal{E}_{2}}^{*} \\ (\textbf{p}^{*}-\lambda \boldsymbol{\theta}^{*})_{\mathcal{E}_{3}}^{*} \end{array}\right], \end{array} $$

which is identical to

$$ \begin{array}{@{}rcl@{}} \boldsymbol{\theta}^{*}_{\mathcal{ E}_{1}^{*}}={\bf0}_{\mathcal{ E}_{1}^{*}},~~~ \textbf{p}^{*}_{\mathcal{ E}_{2}^{*}}={\bf0}_{\mathcal{ E}_{2}^{*}}, ~~~\boldsymbol{\theta}^{*}_{\mathcal{E}_{3}^{*}}={\bf0}_{\mathcal{E}_{3}^{*}}. \end{array} $$
(54)

This combine with (53) derives

$$ \begin{array}{@{}rcl@{}} \theta_{i}^{*} & = & 0, p_{i}^{*}<0,~~i \in\mathcal{ E}_{1}^{*}, \\ \theta_{i}^{*} & \in & [-\sqrt{2\tau/\lambda},0], p_{i}^{*}=0,~~ i\in \mathcal{ E}^{*}_{2},\\ \theta_{i}^{*} & = & 0,p_{i}^{*}\geq\sqrt{2 \lambda\tau},i\in\mathcal{ E}_{3}^{*}, \end{array} $$

Define \(\mathcal { E}^{*}:=\mathcal { E}^{*}_2,~~ \overline {\mathcal { E}}^{*}:=\mathcal { E}_1^{*}\cup \mathcal { E}_3^{*}\). To show (42), we only need to prove two inequalities:

$$ \begin{array}{@{}rcl@{}} \tau L_{ts}(\textbf{p}_{{\mathcal{ E}}^{*} })-\tau L_{ts}(\textbf{p}^{*}_{{\mathcal{ E}}^{*} })+\langle \boldsymbol{\theta}^{*}_{\mathcal{ E}^{*}}, \textbf{p}_{\mathcal{E}^{*}}-\textbf{p}^{*}_{\mathcal{E}^{*}}\rangle\geq0, \end{array} $$
(55)
$$ \begin{array}{@{}rcl@{}} \tau L_{ts}(\textbf{p}_{\overline{\mathcal{ E}}^{*}})-\tau L_{ts}(\textbf{p}^{*}_{\overline{\mathcal{ E}}^{*} })\geq0. \end{array} $$
(56)

By (5), (53) and (55), we obtain the desired conclusion:

  1. (i)

    For \(i\in \mathcal { E}^{*}_1\cup \mathcal { E}^{*}_3\), by λτ ≥ 9/8, we get \(p_i^{*}\geq \sqrt {2 \lambda \tau }\geq 3/2\), which means that \(\ell _{ts}(p^{*}_i)\) is differentiable. By Definition 3.1, we get that there exists a neighborhood \(N(\boldsymbol {\chi }^{*},\overline {\kappa })\) with \(\overline {\kappa }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\overline {\kappa })\), we lead to

    $$ \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \tau \nabla\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, $$

    which together with \(\tau \nabla \ell _{ts}(p^{*}_i)=0=\theta ^{*}_i\) obtains (56).

  2. (ii)

    For \(i\in \mathcal { E}_2^{*}\), we can see that the \(\ell _{ts}(p^{*}_i)\) is non-differentiable. According to Definition 3.1, we yield that there is a neighborhood \(N(\boldsymbol {\chi }^{*},\widetilde {\kappa })\) with \(\widetilde {\kappa }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\widetilde {\kappa })\), we obtain

    $$ \begin{array}{@{}rcl@{}} \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \partial \tau\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, \end{array} $$

    which with \(-\tau \partial \ell _{ts}(p^{*}_i)=\theta _i^{*}\in [-\sqrt {2\tau /\lambda },0]\) results in (55).

By the above (i) and (ii), we show (w;b;p) is local minimizer of problem (8) in a local region Ω ∩ N(χ,ρ) with \(\rho ^{*}:=\min \limits \{\overline {\rho },\widetilde {\rho }\}\). This completes the proof of Case (i). □

1.6 A.6 Proof of Theorem 3.3

Proof

For λτ ∈ (0,3/8), let (w;b;p) be a proximal stationary point of (8). Based on Theorem 3.2 and (44), we obtain

$$ \begin{array}{@{}rcl@{}} \theta_{i}^{*} & \in & [-4\tau/3,0], i \in \mathcal{A}^{*}_{2},~~\theta_{i}^{*}=-4\tau/3, i \in \mathcal{ A}^{*}_{3}, \\ {\theta}^{*}_{i} & = & -8\tau(1-p^{*}_{i})/3, i\in \mathcal{ A}^{*}_{4}, ~~ {\theta}^{*}_{i}=0, i\in\overline{\mathcal{ A}}^{*}. \end{array} $$
(57)

Based on \(M=[y_1\textbf {x}_{1}~\cdots ~y_m\textbf {x}_{m}]^{\top }\in {\mathbb {R}}^{m\times n}\) and first equation of (16), we can obtain that

$$ \begin{array}{@{}rcl@{}} \textbf{w}^{*} = -M^{\top}_{\mathcal{ A}^{*}}{\boldsymbol{\theta}^{*}_{\mathcal{ A}^{*}}} - M^{\top}_{\overline{\mathcal{ A}}^{*}}{\boldsymbol{\theta}^{*}_{\overline{\mathcal{ A}}^{*}}} =- M^{\top}_{\mathcal{ A}^{*}}{\boldsymbol{\theta}^{*}_{\mathcal{ A}^{*}}}=-\underset{i \in \mathcal{ A}^{*}}{\sum} \theta_{i}^{*}y_{i}\textbf{x}_{i}, \end{array} $$

which leads to (17). Moreover, according to (44), we get \(p_i^{*}=0, i\in \mathcal { A}^{*}_2\), \(p_i^{*}\in (0,1/2), i\in \mathcal { A}^{*}_3\) and \(p_i^{*}\in [1/2,1), i\in \mathcal {A}^{*}_4\). This together with the definition of M and third equation of (16) results in (18). This completes the proof. □

1.7 A.7 Proof of Theorem 3.4

Proof

For λτ ∈ [3/8,9/8), let (w;b;p) with \(\boldsymbol {\theta }^{*}\in \mathbb {R}^m\) be a proximal stationary point. By Theorem 3.2 and (51), we receive

$$ \begin{array}{@{}rcl@{}} &&\theta_{i}^{*}\in [-4\tau/3,0] , i\in \mathcal{ C}^{*}_{2},\\ &&\theta_{i}^{*}= -4\tau/3 , i\in \mathcal{ C}^{*}_{3},~\theta^{*}_{i}=0, i\in\overline{\mathcal{ C}}^{*}, \end{array} $$

which together with the matrix M and (16) derives

$$ \begin{array}{@{}rcl@{}} \textbf{w}^{*} = -M^{\top}_{\mathcal{ C}^{*}}{\boldsymbol{\theta}^{*}_{\mathcal{ C}^{*}}} - M^{\top}_{\overline{\mathcal{ C}}^{*}}{\boldsymbol{\theta}^{*}_{\overline{\mathcal{ C}}^{*}}} =-M^{\top}_{\mathcal{ C}^{*}}{\boldsymbol{\theta}^{*}_{\mathcal{ C}^{*}}}=-\underset{i \in \mathcal{ C}^{*}}{\sum} \theta_{i}^{*}y_{i}\textbf{x}_{i}. \end{array} $$

By (51), we get \(p_i^{*}=0, i\in \mathcal { C}^{*}_2\) and \(p_i^{*}\in (0,\frac {3}{4}-\frac {2\lambda \tau }{3}], i \in \mathcal { C}^{*}_3\). This with (16) and M gets (20). This completes the proof. □

1.8 A.8 Proof of Theorem 3.5

Proof

For λτ ≥ 9/8, because (w;b;p) with \(\boldsymbol {\theta }^{*}\in \mathbb {R}^m\) is a proximal stationary point of (8), from Theorem 3.2 and (55), we get

$$ \begin{array}{@{}rcl@{}} \theta_{i}^{*}\in [-\sqrt{2\tau/\lambda},0] , i\in \mathcal{ E}^{*},~~\theta^{*}_{i}=0, i\in\overline{\mathcal{ E}}^{*}, \end{array} $$

which combine with the M and (16) leads to

$$ \begin{array}{@{}rcl@{}} \textbf{w}^{*} = -M^{\top}_{\mathcal{ E}^{*}}{\boldsymbol{\theta}^{*}_{\mathcal{ E}^{*}}} - M^{\top}_{\overline{\mathcal{ E}}^{*}}{\boldsymbol{\theta}^{*}_{\overline{\mathcal{ E}}^{*}}} =-M^{\top}_{\mathcal{ E}^{*}}{\boldsymbol{\theta}^{*}_{\mathcal{ E}^{*}}}=-\underset{i \in \mathcal{ E}^{*}}{\sum} \theta_{i}^{*}y_{i}\textbf{x}_{i}. \end{array} $$

In addition, based on (55), we get \(p_i^{*}=0, i\in \mathcal { E}^{*}\), which with (16) and M results in (22). This completes the proof. □

1.9 A.9 Proof of Theorem 3.6

Proof

It is observed that for \(k\rightarrow \infty \), the working set \(C_k\subseteq [m]\) has finite many elements, which represents that we can find a subset \({\Upsilon }\subseteq \{1,2,3,\cdots \}\) such that

$$ \begin{array}{@{}rcl@{}} C_{i}\equiv:C, ~~\forall~i\in {\Upsilon}. \end{array} $$
(58)

We now define the sequence Ψk := (wk,bk,pk,𝜃k) and its limit point Ψ := (w,b,p,𝜃), i.e., \(\{\boldsymbol {\Psi }^k\}\rightarrow \boldsymbol {\Psi }^{*}\), which represents \(\{\boldsymbol {\Psi }^i\}_{i\in {\Upsilon }}\rightarrow \boldsymbol {\Psi }^{*} \) and \(\{\boldsymbol {\Psi }^{i+1}\}_{i\in {\Upsilon }}\rightarrow \boldsymbol {\Psi }^{*}\). Taking the limit along with Υ of (33), i.e., \(k\in {\Upsilon }, k\rightarrow \infty \), we derive

$$ \begin{array}{@{}rcl@{}} \left\{ \begin{array}{lll} {\boldsymbol{\theta}}^{*}_{C} & = & {\boldsymbol{\theta}}^{*}_{C}+\kappa\eta{\boldsymbol{\Phi}}^{*}_{C},\\ {\boldsymbol{\theta}}^{*}_{\overline{C}} & = & \textbf{0}, \end{array}\right. \end{array} $$
(59)

which obtains \({\boldsymbol {\Phi }}^{*}_{C}= \textbf {0}\). Taking the limit along with Υ of sk, we can lead to

$$ \begin{array}{@{}rcl@{}} \textbf{s}^{*} & = & {\textbf{1}}-M \textbf{w}^{*}-b^{*}\textbf{y}- {\boldsymbol{\theta}}^{*}/\eta\\ & = & {\textbf{1}}-M \textbf{w}^{*}-b^{*}\textbf{y}-\textbf{p}^{*}+\textbf{p}^{*}- {\boldsymbol{\theta}}^{*}/\eta\\ & = & - {\boldsymbol{\Phi}}^{*}+\textbf{p}^{*}- {\boldsymbol{\theta}}^{*}/\eta \end{array} $$
(60)

As for (26), we take the limit along with Υ and get

$$ \begin{array}{@{}rcl@{}} {\textbf{p}}^{*}_{T^{1}}&=& {\bf0},{\textbf{p}}^{*}_{T^{2}}= \overline{\textbf{s}}^{*}_{T^{2}}, {\textbf{p}}^{*}_{T^{3}} = \widehat{\textbf{s}}^{*}_{T^{3}}, {\textbf{p}}^{*}_{\overline{C}}= \textbf{s}^{*}_{\overline{C}}, \lambda\tau\in\left( 0,\frac{3}{8}\right), \\ {\textbf{p}}^{*}_{J^{1}}&=& {\bf0},{\textbf{p}}^{*}_{J^{2}}= \overline{\textbf{s}}^{*}_{J^{2}},{\textbf{p}}^{*}_{\overline{C}} = \textbf{s}^{*}_{\overline{C}},~~~~~~~~~\lambda\tau\in\left[\frac{3}{8},\frac{9}{8}\right),~~~~~\\ {\textbf{p}}^{*}_{I}&=&{\bf0},{\textbf{p}}^{*}_{\overline{C}}= \textbf{s}^{*}_{\overline{C}},~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \lambda\tau\geq\frac{9}{8}, \end{array} $$
(61)

where \(\overline {\textbf {s}}^{*}:=\textbf {s}^{*}-\frac {4\lambda \tau }{3} \), \(\widehat {\textbf {s}}^{*}:=(3\textbf {s}^{*}-8\lambda \tau )/(3-8\lambda \tau )\). Then, we have

$$ \begin{array}{@{}rcl@{}} \textbf{p}^{*}_{\overline{C}} & = & \textbf{s}^{*}_{\overline{C}} \\ & \overset{(60)}{=} & - {\boldsymbol{\Phi}}^{*}_{\overline{C}}+\textbf{p}^{*}_{\overline{C}}- {\boldsymbol{\theta}}^{*}_{\overline{C}}/\eta\\ & \overset{(59)}{=} & - {\boldsymbol{\Phi}}^{*}_{\overline{C}}+\textbf{p}^{*}_{\overline{C}}. \end{array} $$

Therefore, we get \({\boldsymbol {\Phi }}^{*}_{\overline {C}}=\textbf {0}\) and Φ = 0. Again from (60), we receive s = p𝜃/η, which together with the Lts proximal operator (15) and (61) derives

$$ \begin{array}{@{}rcl@{}} {\textbf{p}}^{*}\in\text{Prox}_{\frac{\tau}{\eta}L_{ts}}(\textbf{s}^{*})=\text{Prox}_{\frac{\tau}{\eta}L_{ts}}(\textbf{p}^{*}- {\boldsymbol{\theta}}^{*}/\eta) \end{array} $$
(62)

For (29), we take the limit along with Υ and obtain

$$ \begin{array}{@{}rcl@{}} (I +\eta M_{C}^{\top} M_{C} ) \textbf{w}^{*} & = & \eta M_{C}^{\top} \boldsymbol{\xi}_{C}^{*} \\ & = & -\eta M_{C}^{\top} ({\textbf{p}}^{*}_{C}+b^{*}\textbf{y}_{C}-{\textbf{1}}+{\boldsymbol{\theta}}^{*}_{C}/\eta)\\ & = & -\eta M_{C}^{\top} ({\boldsymbol{\Phi}}^{*}_{C}-M_{C}{\textbf{w}}^{*}+{\boldsymbol{\theta}}^{*}_{C}/\eta)\\ & = & -\eta M_{C}^{\top} (-M_{C}{\textbf{w}}^{*}+{\boldsymbol{\theta}}^{*}_{C}/\eta), \end{array} $$

where ξ = −(p + by1 + 𝜃/η) and the last two equations hold because of \({\boldsymbol {\Phi }}^{*}_{C}= \textbf {0}\) by (59) and Φ = p + Mw + bye = 0. Thus, the last equation derives

$$ \begin{array}{@{}rcl@{}} \textbf{w}^{*} = -M_{C}^{\top} {\boldsymbol{\theta}}^{*}_{C} \overset{(59)}{=}- M^{\top} {\boldsymbol{\theta}}^{*}. \end{array} $$

Finally, for (32), we take the limit along with Υ and receive

$$ \begin{array}{@{}rcl@{}} b^{*}= \langle \textbf{y},\boldsymbol{\psi}^{*}\rangle/m & = & -\langle \textbf{y}, M \textbf{w}^{*}-{\textbf{1}}+{\textbf{p}}^{*}+{\boldsymbol{\theta}}^{*}/\eta\rangle/m\\ & = & -\langle \textbf{y},{\boldsymbol{\Phi}}^{*}-b^{*}\textbf{y}+{\boldsymbol{\theta}}^{*}/\eta\rangle/m\\ & = & -\langle \textbf{y}, -b^{*}\textbf{y}+{\boldsymbol{\theta}}^{*}/\eta\rangle/m \\ & = & b^{*}-\langle \textbf{y}, {\boldsymbol{\theta}}^{*}\rangle/(m\eta), \end{array} $$

which yields 〈y,𝜃〉 = 0 and the three equation holds due to Φ = 0. To summarize, we obtain

$$ \begin{array}{@{}rcl@{}} \left\{ \begin{array}{rll} \textbf{w}^{*} + M^{\top}{\boldsymbol{\theta}^{*}} & = & \textbf{0},\\ \langle \textbf{y}, {\boldsymbol{\theta}^{*}} \rangle & = & \textbf{0},\\ \textbf{p}^{*}+M \textbf{w}^{*}+b^{*}\textbf{y} & = & {\textbf{1}},\\ \text{prox}_{\frac{\tau}{\eta}L_{ts}}(\textbf{p}^{*}- {\boldsymbol{\theta}^{*}}/\eta) & \ni & \textbf{p}^{*}, \end{array}\right. \end{array} $$

which proves that the (w;b;p) is a proximal stationary point with λ = 1/η. According to Theorem 3.2, we also yield that it is a local minimizer to (8), which completes the proof. □

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, H., Shao, Y. Sparse and robust SVM classifier for large scale classification. Appl Intell 53, 19647–19671 (2023). https://doi.org/10.1007/s10489-023-04511-w

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-023-04511-w

Keywords

Navigation