Margin maximization in spherical separation


Abstract

We face the problem of strictly separating two sets of points by means of a sphere, considering the two cases where the center of the sphere is fixed or free, respectively. In particular, for the former we present a fast and simple solution algorithm, whereas for the latter one we use the DC-Algorithm based on a DC decomposition of the error function. Numerical results for both the cases are presented on several classical binary datasets drawn from the literature.
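To make the fixed-center case concrete: once the center is given, a strictly separating squared radius can be placed between the farthest point of \(\mathcal {A}\) and the nearest point of \(\mathcal {B}\). The following is a minimal illustrative sketch, not the paper's algorithm; the function name and the midpoint placement rule are our own assumptions.

```python
import numpy as np

def fixed_center_sphere(A, B, x0):
    """Given points A (to enclose), B (to exclude) and a fixed center x0,
    return a squared radius z placed midway (in squared distance) between
    the two sets, and whether strict separation is possible."""
    dA = np.max(np.sum((A - x0) ** 2, axis=1))  # farthest point of A
    dB = np.min(np.sum((B - x0) ** 2, axis=1))  # nearest point of B
    z = 0.5 * (dA + dB)          # boundary midway between the two sets
    separable = dA < dB          # strict separation possible iff dA < dB
    return z, separable

# Toy example: A near the origin, B far away, center fixed at the origin.
A = np.array([[0.1, 0.2], [-0.3, 0.1], [0.2, -0.4]])
B = np.array([[2.0, 0.0], [0.0, -1.8], [1.5, 1.5]])
z, ok = fixed_center_sphere(A, B, np.zeros(2))
```

When `ok` is true, every point of A has squared distance below `z` and every point of B above it, i.e. the sphere of squared radius `z` strictly separates the two sets.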




Author information

Correspondence to Manlio Gaudioso.

Appendix

In this section we report the proofs of Theorems 2.1, 2.2 and 2.3.

Theorem 2.1

Let \(S\subseteq \mathbb {R}^{n}\) be a nonempty compact set. If \(x_{0}\in S\) and \(C\geq 1/(2k)\), then there exists an optimal solution to problem (5).

Proof

If \(x_{0}\in S\), then

$$ \|b_l-x_0\|\leq \max_{x\in S} \|b_l-x\|\stackrel {\triangle }{=}D_l\geq0, \quad l=1, \ldots, k. $$
(18)

Taking into account the definition of h, the constraint \(z\geq q\) and inequality (18), we obtain

(19)

where \(D\stackrel {\triangle }{=}\sum_{l=1}^{k} D_{l}\).

From the above inequality, if \(C\geq\frac{1}{2k}\), the thesis follows by taking into account that h is coercive on the set \(\varOmega \stackrel {\triangle }{=}\{ (x_{0},z,q)~|~x_{0}\in S \mbox{ and } 0\leq q\leq z \}\). □

Theorem 2.2

There exists an optimal solution for problem (6) with z>0.

Proof

Let \((x_{0}^{*},z^{*},q^{*},\xi^{*},\mu^{*})\) be any optimal solution for problem (6) with \(z^{*}=0\). As a consequence \(q^{*}=0\), \(\xi^{*}_{i}=\|a_{i}-x_{0}^{*}\|^{2}\) for i=1,…,m and \(\mu_{l}^{*}=0\) for l=1,…,k.

We define the sets

$$I\stackrel {\triangle }{=}\bigl\{i\mid 1\leq i\leq m,\ a_i\neq x_0^* \bigr \} \quad \mbox{and}\quad L\stackrel {\triangle }{=}\bigl\{l\mid 1\leq l\leq k,\ b_l\neq x_0^* \bigr\}, $$

which cannot be simultaneously empty, by the assumption that the sets \(\mathcal {A}\) and \(\mathcal {B}\) are disjoint.

Now we consider the solution

$$ \bigl(\bar{x}_0 =x_0^*, \bar{z}, \bar{q}=q^*, \bar{\xi}, \bar{\mu}\bigr), $$
(20)

with

$$\bar{z}= \min_{\scriptsize i\in I,\ l\in L} \bigl\{\|a_i-x_0^* \|^2,\|b_l-x_0^*\|^2 \bigr\}>0, $$

and the components of the vectors \(\bar{\xi}\) and \(\bar{\mu}\) defined as follows:

$$\mbox{(i)} \begin{cases} \bar{\xi}_i=\xi_i^*-\bar{z} &i\in I \\ \bar{\mu}_l=\mu_l^*=0& l\in L, \end{cases} $$

or

$$\mbox{(ii)} \begin{cases} \bar{\xi}_i=\xi_i^*-\bar{z} &i\in I \\ \bar{\xi}_j=\xi_j^*=0& \\ \bar{\mu}_l=\mu_l^*=0& l\in L \end{cases} $$

or

$$\mbox{(iii)} \begin{cases}\bar{\xi}_i=\xi_i^*-\bar{z} &i\in I \\ \bar{\mu}_l=\mu_l^*=0& l\in L \\ \bar{\mu}_j = \bar{z}, \end{cases} $$

according to the three possible cases:

  1. (i) \(x_{0}^{*}\neq a_{i}\) for i=1,…,m and \(x_{0}^{*}\neq b_{l}\) for l=1,…,k;

  2. (ii) \(x_{0}^{*}= a_{j}\) for some j∈{1,…,m};

  3. (iii) \(x_{0}^{*}= b_{j}\) for some j∈{1,…,k}.

It is easy to show that the above solution (20), characterized by \(\bar{z}>0\), is feasible for problem (6), with an objective function value not worse than the one corresponding to the solution \((x_{0}^{*},z^{*},q^{*},\xi^{*},\mu^{*})\). □
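The construction above can be checked numerically on a toy instance. The sketch below is our own illustration, not code from the paper: the feasibility conditions and the objective \(C(\sum_i\xi_i+\sum_l\mu_l)-q\) are reconstructed from (22)–(23) and from the objective expressions appearing in the proof of Theorem 2.3, and should be treated as an assumption about the form of problem (6).

```python
import numpy as np

# Assumed constraint form of problem (6), inferred from (22)-(23):
#   xi_i >= q - z + ||a_i - x0||^2,   mu_l >= q + z - ||b_l - x0||^2,
#   xi, mu, q >= 0.
def feasible(x0, z, q, xi, mu, A, B, tol=1e-12):
    okA = np.all(xi + tol >= q - z + np.sum((A - x0) ** 2, axis=1))
    okB = np.all(mu + tol >= q + z - np.sum((B - x0) ** 2, axis=1))
    return okA and okB and np.all(xi >= -tol) and np.all(mu >= -tol) and q >= -tol

def objective(C, q, xi, mu):
    # Assumed objective of problem (6): C * (sum xi + sum mu) - q.
    return C * (xi.sum() + mu.sum()) - q

A = np.array([[0.5, 0.0], [0.0, 0.5]])   # points to enclose
B = np.array([[2.0, 0.0], [0.0, 2.0]])   # points to exclude
x0, q = np.zeros(2), 0.0

# Degenerate solution with z* = 0 (case (i): x0 coincides with no point).
xi_star = np.sum((A - x0) ** 2, axis=1)
mu_star = np.zeros(len(B))

# Shifted solution of the proof: z_bar > 0, xi_bar = xi* - z_bar.
z_bar = min(np.min(np.sum((A - x0) ** 2, axis=1)),
            np.min(np.sum((B - x0) ** 2, axis=1)))
xi_bar = xi_star - z_bar
```

Both solutions are feasible under the assumed constraints, and the shifted one has an objective value that is no worse, matching the claim of the proof.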

Theorem 2.3

Let \((x_{0}^{*}, z^{*}, q^{*}, \xi^{*}, \mu^{*})\) be any optimal solution of problem (6), with C>1. Then the sphere \(S(x_{0}^{*},R^{*})\), with \(R^{*}=\sqrt{z^{*}}\), strictly separates the sets \(\mathcal {A}\) and \(\mathcal {B}\) if and only if \(q^{*}>0\).

Proof

(⇒) Assume \(S(x_{0}^{*},R^{*})\), with \(R^{*}=\sqrt{z^{*}}\), strictly separates the sets \(\mathcal {A}\) and \(\mathcal {B}\), and let \(q^{*}=0\). Thus we have

$$ \begin{cases} -z^*+\|a_i-x_0^* \|^2<0& \forall i=1,\ldots,m \\ z^*-\|b_l-x_0^*\|^2<0& \forall l=1, \ldots,k \end{cases} $$
(21)

and, consequently,

$$0<\epsilon \stackrel {\triangle }{=}\min_{1\leq i\leq m,~1\leq l\leq k } \bigl\{ z^*- \bigl\| a_i-x_0^* \bigr\|^2,\bigl\|b_l-x_0^*\bigr\|^2-z^* \bigr \} \leq z^*. $$

Note that the optimality of \((x_{0}^{*}, z^{*}, q^{*}, \xi^{*}, \mu^{*})\), together with (21), implies \(\xi^{*}=0\) and \(\mu^{*}=0\), with the optimal objective function value equal to zero.

We now define a feasible solution \((\bar{x}_{0}, \bar{z}, \bar{q}, \bar{\xi}, \bar{\mu})\) to problem (6) as follows:

$$\bar{x}_0=x_0^*,\qquad \bar{z}=z^*,\qquad \bar{q}=\epsilon,\qquad \bar{\xi}=\xi^*=0, \qquad \bar{\mu}=\mu^*=0. $$

It is characterized by an objective function value equal to −ϵ<0, which contradicts the optimality of \((x_{0}^{*}, z^{*}, q^{*}, \xi^{*}, \mu^{*})\).

(⇐) Now suppose \(q^{*}>0\), and assume by contradiction that the sphere \(S(x_{0}^{*},R^{*})\), with \(R^{*}=\sqrt{z^{*}}\), does not strictly separate \(\mathcal {A}\) and \(\mathcal {B}\). Then at least one of the following two cases occurs:

  1. there exists \(a_{j}\in \mathcal {A}\) such that \(\|a_{j}-x_{0}^{*}\|^{2}\geq z^{*}\), which implies

     $$ \xi_j^*\geq q^*-z^* + \bigl\|a_j-x_0^* \bigr\|^2 \geq q^*>0; $$
     (22)

  2. there exists \(b_{j}\in \mathcal {B}\) such that \(\|b_{j}-x_{0}^{*}\|^{2}\leq z^{*}\), which implies

     $$ \mu_j^*\geq q^*+z^* - \bigl\|b_j-x_0^* \bigr\|^2 \geq q^*>0. $$
     (23)

Case 1. The objective function value at the optimal solution is

$$C \Biggl(\sum_{i=1,i\neq j}^m \xi_i^*+\xi_j^*+\sum_{l=1}^k \mu^*_l \Biggr) - q^*. $$

Now we consider the solution \((\bar{x}_{0}, \bar{z}, \bar{q}, \bar{\xi}, \bar{\mu})\) defined as follows:

$$ \begin{cases} \bar{x}_0 = x_0^* \\ \bar{z} = z^* \\ \bar{q}=0 \\ \bar{\xi}_i=\xi_i^*\quad i=1,\ldots,m, \ i\neq j \\ \bar{\xi}_j= \xi_j^*-q^* \\ \bar{\mu}= \mu^*, \end{cases} $$
(24)

which can easily be shown to be feasible for problem (6), taking into account (22). Moreover, the corresponding objective function value is

$$C \Biggl(\sum_{i=1,i\neq j}^m \xi_i^*+\xi_j^*-q^*+\sum _{l=1}^k \mu^*_l \Biggr)=C \Biggl( \sum_{i=1,i\neq j}^m \xi_i^*+ \xi_j^*+\sum_{l=1}^k \mu^*_l \Biggr) - Cq^*. $$

If C>1 then

$$C \Biggl(\sum_{i=1,i\neq j}^m \xi_i^*+\xi_j^*+\sum_{l=1}^k \mu^*_l \Biggr) - Cq^*<C \Biggl(\sum _{i=1,i\neq j}^m \xi_i^*+\xi_j^*+ \sum_{l=1}^k \mu^*_l \Biggr) - q^*, $$

which contradicts the optimality of \((x_{0}^{*}, z^{*}, q^{*}, \xi^{*}, \mu^{*})\).

Case 2. Analogous considerations hold. □
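Theorem 2.3 can also be observed numerically. For a fixed center \(x_0\), the problem in \((z,q,\xi,\mu)\) is a linear program; the sketch below is our own reconstruction (constraints inferred from (22)–(23) plus \(q\leq z\) from the set Ω in Theorem 2.1, objective \(C(\sum\xi+\sum\mu)-q\)), not the paper's algorithm, and the function name is our own.

```python
import numpy as np
from scipy.optimize import linprog

def spherical_lp(A, B, x0, C=2.0):
    """Solve the (assumed) fixed-center version of problem (6) as an LP
    in the variables (z, q, xi_1..m, mu_1..k), with C > 1."""
    dA = np.sum((A - x0) ** 2, axis=1)   # squared distances of A-points
    dB = np.sum((B - x0) ** 2, axis=1)   # squared distances of B-points
    m, k = len(dA), len(dB)
    n = 2 + m + k                        # variable order: z, q, xi, mu
    c = np.concatenate(([0.0, -1.0], C * np.ones(m + k)))  # C*sum(xi,mu) - q
    A_ub, b_ub = [], []
    for i in range(m):                   # q - z - xi_i <= -dA_i
        row = np.zeros(n); row[0], row[1], row[2 + i] = -1.0, 1.0, -1.0
        A_ub.append(row); b_ub.append(-dA[i])
    for l in range(k):                   # q + z - mu_l <= dB_l
        row = np.zeros(n); row[0], row[1], row[2 + m + l] = 1.0, 1.0, -1.0
        A_ub.append(row); b_ub.append(dB[l])
    row = np.zeros(n); row[0], row[1] = -1.0, 1.0     # q <= z
    A_ub.append(row); b_ub.append(0.0)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * n)
    return res.x[0], res.x[1]            # optimal z*, q*

# Strictly separable toy data, center fixed at the origin.
A = np.array([[0.1, 0.2], [-0.3, 0.1]])
B = np.array([[2.0, 0.0], [0.0, -1.8]])
z, q = spherical_lp(A, B, np.zeros(2))
```

On this separable instance the LP returns \(q^*>0\), and the resulting squared radius \(z^*\) lies strictly between the two sets of squared distances, as Theorem 2.3 predicts.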

Cite this article

Astorino, A., Fuduli, A. & Gaudioso, M. Margin maximization in spherical separation. Comput Optim Appl 53, 301–322 (2012). https://doi.org/10.1007/s10589-012-9486-7