Knowledge-Based Systems
Volume 228, 27 September 2021, 107250
A two-stage safe screening method for non-convex support vector machine with ramp loss

https://doi.org/10.1016/j.knosys.2021.107250

Abstract

Support vector machine (SVM) is one of the most effective classification tools and has been widely employed in real-world applications. However, it is time-consuming and sensitive to outliers. To address these two limitations, on the one hand, a variety of safe screening strategies have recently emerged, whose major aim is to accelerate training without sacrificing accuracy. On the other hand, ramp support vector machine (RSVM) retains sparsity and enhances robustness to outliers by introducing the ramp loss. Unfortunately, because the ramp loss is non-convex, the commonly applied method for solving RSVM is to convert it into a series of convex problems via the concave–convex procedure (CCCP), which inevitably increases the computational cost. Inspired by the aforementioned research, we propose a two-stage safe screening method (TSS) for RSVM under the framework of CCCP, which combines a dual screening method with variational inequalities (DVI) and a dynamic screening rule (DSR) based on the duality gap. In the first stage, by discarding redundant samples, DVI reduces the scale of the optimization problem before training and initializes a more accurate feasible solution for the next stage. In the second stage, DSR is embedded into the solver for further acceleration. Finally, after the solution is obtained, a post-checking procedure is conducted to guarantee safety. Numerical experiments on five synthetic datasets and twelve real-world datasets demonstrate the feasibility and efficiency of TSS. Moreover, we extend TSS to other non-convex models and conduct experiments to further verify the validity and safety of our proposed method.

Introduction

Support vector machine (SVM) [1], proposed by Vapnik, is a competitive classification algorithm based on the structural risk minimization principle and the maximum margin principle. For decades, SVM has been successfully applied to numerous practical problems in various fields, including target detection [2], [3], medicine and health [4], text classification [5], food quality determination [6], etc. To suit distinct application backgrounds and data characteristics, several variants of SVM have been presented [7], [8], [9], [10], such as transductive SVM (TSVM) [8], [11] for semi-supervised learning, small sphere and large margin (SSLM) [9] for imbalanced classification problems, ramp SVM (RSVM) [10], [12], [13], which can handle outliers, and twin support vector machine (TWSVM) [14], which reduces computational complexity.

Among the mentioned improved models, RSVM is an attractive one. By replacing the hinge loss with the ramp loss, RSVM alleviates the influence of outliers on the decision hyperplane. Nonetheless, RSVM suffers from non-convexity: its objective is usually treated as a difference of convex functions (DC) and solved by DC programming-based methods, such as the representative concave–convex procedure (CCCP) [15], which iteratively solves a series of convex problems. In other words, RSVM inherits the sparsity of SVM but aggravates its high computational complexity, so coping with large-scale datasets remains a challenge.
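To make the DC structure explicit, the ramp loss can be written as a difference of two hinge functions, which is precisely what CCCP exploits. The following is a standard transcription in our own notation (with ramp parameter s < 1), not necessarily the paper's exact formulation:

```latex
% Ramp loss as a difference of convex functions (DC), where H_s is the
% hinge loss shifted to s:
R_s(z) = H_1(z) - H_s(z), \qquad H_s(z) = \max\{0,\; s - z\}, \quad s < 1.

% CCCP minimizes J(\theta) = J_{\mathrm{vex}}(\theta) + J_{\mathrm{cav}}(\theta)
% by linearizing the concave part at the current iterate and solving the
% resulting convex surrogate until the iterates stabilize:
\theta^{(t+1)} \in \operatorname*{arg\,min}_{\theta}\;
    J_{\mathrm{vex}}(\theta) + \nabla J_{\mathrm{cav}}\bigl(\theta^{(t)}\bigr)^{\top} \theta.
```

Since R_s is bounded above by 1 - s, an outlier with a strongly negative margin contributes only a constant penalty, which is the source of the robustness mentioned above.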

Recently, an effective acceleration approach termed "safe screening" has emerged, whose fundamental idea is to safely enhance the training speed by screening out inactive features or samples in advance. The term "safely" here refers to the consistency between the optimal solutions obtained by the safe screening methods and those of the original models. It was first presented by Ghaoui et al. [16] for sparse supervised learning models such as the least absolute shrinkage and selection operator (LASSO) and the L1-norm SVM. That rule utilizes the sparsity of the model to eliminate many redundant features beforehand, thereby reducing the scale of the problem and accelerating computation. Owing to its effectiveness, safe screening has attracted wide attention from researchers, and numerous improved rules [17], [18], [19], [20], [21] have been developed for different sparse models, including SVM and its variants [22], [23], [24], [25], [26].

Safe sample screening methods for SVM can be roughly grouped into two types:

(1) Methods of the first category are applied to discard inactive samples prior to training [27], [28], [29] and rely on the optimal solution corresponding to a previously solved parameter. Therefore, these rules are often combined with parameter selection frameworks. For example, Ogawa et al. [28] first proposed a sequential safe sample screening rule for SVM. Later, Wang et al. [27] improved sequential screening with a dual method based on variational inequalities (DVI). Recently, Cao et al. [30] and Yuan et al. [31] have extended safe screening approaches to hypersphere SVMs. A generic sketch of the region-based test underlying such rules is given after this list.

(2) The second type is embedded in solvers and relies on the current feasible solution; it is therefore also known as the dynamic screening rule (DSR) [32], [33]. For instance, Fercoq et al. [34] utilized the duality gap to construct a screening rule for LASSO (transcribed after this list); it has been theoretically proven that, as the duality gap shrinks to 0, all redundant features can be found. Zimmert et al. [35] generalized this strategy to SVM. Rakotomamonjy et al. [36] and Zhai et al. [37] extended the idea of safe screening to non-convex sparse LASSO and non-convex SVM, respectively.
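To make both mechanisms concrete, the following minimal Python sketch shows the generic region-based test that underlies DVI-type sample screening: if the (unknown) optimum is known to lie in a ball, any sample whose worst-case margin over that ball stays strictly on one side of the threshold can be fixed in advance. The helper name `sphere_screen` and the specific ball are our illustrative assumptions, not the construction of [27] or of this paper:

```python
import numpy as np

def sphere_screen(X, y, center, radius):
    """Generic region-based sample screening (illustrative sketch).

    Assume the unknown primal optimum w* lies in the ball B(center, radius).
    For hinge-type losses, the KKT conditions give
        y_i * x_i^T w* > 1  =>  alpha_i = 0   (sample inactive),
        y_i * x_i^T w* < 1  =>  alpha_i = C   (bound support vector),
    so a sample can be fixed whenever the inequality holds for every w
    in the ball.
    """
    scores = y * (X @ center)                   # y_i * x_i^T center
    slack = radius * np.linalg.norm(X, axis=1)  # max deviation over the ball
    inactive = scores - slack > 1.0             # alpha_i = 0 for sure
    at_bound = scores + slack < 1.0             # alpha_i = C for sure
    return inactive, at_bound

# Toy usage: screen against a rough estimate of w* with a generous radius.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
inactive, at_bound = sphere_screen(X, y,
                                   center=np.array([2.0, 0.0, 0.0, 0.0, 0.0]),
                                   radius=0.5)
```

For the duality-gap rule of [34], the literature gives an explicit radius: with any primal point and any dual-feasible point, the LASSO dual optimum lies in a ball around the dual point whose radius shrinks with the gap. Our transcription (notation may differ from [34]):

```latex
% Gap Safe rule for the LASSO  min_beta (1/2)||y - X beta||^2 + lambda ||beta||_1:
% for any primal beta and dual-feasible theta with duality gap G(beta, theta),
\|\theta^{\star} - \theta\|_2 \le \frac{\sqrt{2\,G(\beta,\theta)}}{\lambda},
\qquad
|x_j^{\top}\theta| + \|x_j\|_2\,\frac{\sqrt{2\,G(\beta,\theta)}}{\lambda} < 1
\;\Longrightarrow\; \beta_j^{\star} = 0.

% As the solver drives G to 0, the ball collapses onto theta*, so every
% redundant feature is eventually identified -- the property cited above.
```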

Moreover, hybrid strategies [38], [39], which combine different screening methods, have also been proposed. For instance, Pan et al. [38] presented a hybrid screening framework for SVM, and experience shows that hybrid strategies often outperform a single strategy. Nevertheless, this framework cannot be mechanically applied to non-convex models, or even to the surrogate problems produced by CCCP. The modified DSR proposed by Zhai et al. [37], while useful for the surrogate problems of RSVM, cannot exploit the optimal solutions obtained under other parameters to accelerate the current problem. That is to say, it is insufficient to form a complete framework.

To this end, a two-stage safe screening method for RSVM based on CCCP is proposed in this paper. In the first stage, a modified DVI for RSVM is constructed, which identifies inactive samples before training and initializes a more accurate feasible solution for DSR. It is worth mentioning that optimal solutions obtained under both the same and different parameters are serviceable for DVI. In the second stage, DSR is embedded into the solver for further speedup. To ensure the safety of TSS, a post-checking procedure is added after the solution is obtained. Theoretical analysis and numerical experiments verify the effectiveness and safety of TSS. Furthermore, we also extend TSS to the nonlinear case and to other non-convex variants of SVM.

In summary, the main contributions of this paper are as follows:

(1) For the non-convex SVM model, a hybrid two-stage safe screening method is proposed. It can effectively reduce the scale of the problems and accelerate the training speed.

(2) Before solving the surrogate problems, modified DVI-based screening rules are constructed for different conditions.

(3) DSR and the post-checking step ensure the safety of TSS, which means that TSS provides a solution identical to that of RSVM.

(4) The basic idea of this hybrid two-stage acceleration strategy can be extended to other non-convex models.

The remainder of the paper proceeds as follows. In Section 2, we first give a brief overview of the basic properties of RSVM and its CCCP-based solving process; then, the general formulation of safe screening rules is introduced. Details of our TSS are elaborated in Section 3. Experiments on synthetic datasets and real benchmark datasets are carried out in Section 4 to illustrate the superiority of TSS. We extend our TSS method to three other models and conduct experiments on regression datasets in Section 5. Conclusions are given in the last section.

Section snippets

Related work

In this section, we first describe the notations used in this paper and then give a brief description of RSVM and its safe screening paradigm.

Two-stage screening method for RSVM

In this section, we describe and analyze the detailed hybrid two-stage acceleration method for RSVM, including DVI, DSR, and the post-checking procedure. For clarity, the screening rules for successive CIL problems with the same and with different parameters are referred to as SCSR and DPSR, respectively; SCSR and DPSR are collectively called DVI. To gain more insight, the flowchart of applying TSS to RSVM is illustrated in Fig. 2, and a minimal skeleton of the flow is sketched below.

The main theories involved in TSS are variational inequality and the duality gap.
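The following Python skeleton is our reading of the flow in Fig. 2, kept deliberately schematic: the helpers `dvi_screen`, `solve_convex` (e.g. a DCDM-style solver applying DSR internally), and `post_check` are hypothetical placeholders for the paper's SCSR/DPSR rules, inner solver, and post-checking step, and the bias term and the exact CCCP bookkeeping of [15] are omitted:

```python
import numpy as np

def tss_rsvm(X, y, C, s, dvi_screen, solve_convex, post_check,
             max_cccp_iter=20, tol=1e-6):
    """Sketch of CCCP for RSVM with two-stage safe screening (TSS)."""
    n = X.shape[0]
    alpha = np.zeros(n)   # dual variables of the convex surrogate
    beta = np.zeros(n)    # CCCP weights coming from the concave part
    for _ in range(max_cccp_iter):
        # Stage 1 (DVI: SCSR/DPSR): shrink the surrogate before training
        # and warm-start the solver with a feasible point alpha0.
        keep, alpha0 = dvi_screen(X, y, C, s, alpha, beta)
        # Stage 2 (DSR): the inner solver screens dynamically as its
        # duality gap shrinks; it returns duals for the kept samples.
        alpha_new = np.zeros(n)
        alpha_new[keep] = solve_convex(X[keep], y[keep], C, beta[keep], alpha0)
        # Safety: re-check the discarded samples against the KKT
        # conditions once the solution is available.
        alpha_new = post_check(X, y, C, s, alpha_new)
        # CCCP step: beta_i = C * 1[y_i f(x_i) < s], with f the current
        # decision function (simplified parametrization, bias omitted).
        w = X.T @ ((alpha_new - beta) * y)
        beta_new = np.where(y * (X @ w) < s, C, 0.0)
        if np.allclose(alpha_new, alpha, atol=tol) and np.array_equal(beta_new, beta):
            alpha, beta = alpha_new, beta_new
            break
        alpha, beta = alpha_new, beta_new
    return alpha, beta
```

Stage 1 shrinks the matrices handed to the inner solver at every CCCP iteration, while the post-check restores safety if any screened sample turns out to violate the optimality conditions.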

Numerical experiments

To verify the feasibility and efficiency of TSS, experiments are conducted on five synthetic datasets and twelve real-world benchmark datasets; the real datasets are from the UCI or LIBSVM databases. All experiments are performed in MATLAB (R2016a) on Windows 10, on a desktop PC with an Intel(R) Xeon(R) Gold 5115 CPU (2.40 GHz) and 64.00 GB RAM.

In all experiments, DCDM is adopted as the solver.

Extensions of TSS

On the basis of TSS for RSVM, we derive safe screening rules for other non-convex models that introduce the ramp loss or employ the CCCP algorithm. In this section, three models are taken as examples: twin SVM with ramp loss (RTWSVM), support vector regression with ramp loss (RSVR), and a large-scale transductive SVM (TSVM).

Conclusion

In this paper, we concentrate on non-convex models with the ramp loss. As a representative method for solving such models, the CCCP algorithm is effective but time-consuming. To safely accelerate the training speed, taking RSVM as an example, we proposed a hybrid TSS method which fully exploits the advantages of DVI and DSR. In the first stage, the modified DVI deletes redundant samples before training and initializes a more accurate feasible solution for DSR. In the second stage, DSR is embedded into the solver for further acceleration.

CRediT authorship contribution statement

Jie Zhao: Methodology, Software, Formal analysis, Writing - original draft. Yitian Xu: Methodology, Writing - review & editing, Supervision. Chang Xu: Visualization, Experiments. Ting Wang: Investigation, Experiments.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors gratefully acknowledge the helpful comments of the reviewers, which have improved the presentation. This work was supported in part by the National Natural Science Foundation of China (Nos. 12071475 and 11671010) and in part by the Beijing Natural Science Foundation, China (No. 4172035).

Jie Zhao received the M.S. degree from the College of Computer Science and Technology, Shandong Technology and Business University. He is currently pursuing the Ph.D. degree at the College of Information and Electrical Engineering, China Agricultural University, Beijing, China. His current research interests include multi-view learning, support vector machines, and feature selection.

His research has appeared in Applied Soft Computing.

References (40)

  • Y. Pang et al., Distributed object detection with linear SVMs, IEEE Trans. Cybern. (2014)
  • Y. Yin et al., Bowel sound recognition using SVM classification in a wearable health monitoring system, Sci. China Inf. Sci. (2018)
  • V. Vapnik, Statistical Learning Theory (1998)
  • T. Joachims, Transductive inference for text classification using support vector machine, in: Proc. 16th Int. Conf....
  • M. Wu et al., A small sphere and large margin approach for novelty detection using training data with outliers, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • Y. Wu et al., Robust truncated hinge loss support vector machines, J. Amer. Statist. Assoc. (2007)
  • R. Collobert et al., Large scale transductive SVMs, J. Mach. Learn. Res. (2006)
  • X. Huang et al., Ramp loss linear programming support vector machine, J. Mach. Learn. Res. (2014)
  • Jayadeva, R. Khemchandani et al., Twin support vector machines for pattern classification, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • A.L. Yuille et al., The concave-convex procedure, Neural Comput. (2003)

Yitian Xu received the Ph.D. degree from the College of Science, China Agricultural University, Beijing, China, in 2007. He was a Visiting Scholar with the Department of Computer Science and Engineering, Arizona State University, Tempe, AZ, USA, from 2013 to 2014. He is currently a Professor at the College of Science, China Agricultural University.

He has authored about 70 papers. His current research interests include machine learning and data mining. Prof. Xu's research has appeared in IEEE TPAMI, IEEE TNNLS, IEEE TCYB, IEEE TSP, IEEE TSMC, Information Sciences, Pattern Recognition, and Int. Conf. Mach. Learn. (ICML).

Chang Xu was born in 1998. She received the B.S. degree from the School of Information Management, Beijing Information Science and Technology University, Beijing, China, in 2020. She is currently pursuing the master's degree at the College of Information and Electrical Engineering, China Agricultural University, Beijing, China.

Her current research interests include machine learning and price prediction. Her research has appeared in Knowledge-Based Systems.

Ting Wang was born in 1998. She received the B.S. degree from the College of Science, Beijing Forestry University, Beijing, China, in 2020. She is currently pursuing the master's degree at the College of Science, China Agricultural University, Beijing, China.

Her current research interests include data mining and multi-task learning.
