A two-stage safe screening method for non-convex support vector machine with ramp loss
Introduction
Support vector machine (SVM) [1], proposed by Vapnik, is a competitive classification algorithm based on the structural risk minimization and maximum margin principles. For decades, SVM has been successfully applied to numerous practical problems in various fields, including target detection [2], [3], medicine and health [4], text classification [5], and food quality determination [6]. In view of distinct backgrounds and data characteristics, several variants of SVM have been presented [7], [8], [9], [10], such as the transductive SVM (TSVM) [8], [11] for semi-supervised learning, the small sphere and large margin (SSLM) model [9] for imbalanced classification problems, the ramp SVM (RSVM) [10], [12], [13], which can handle outliers, and the twin support vector machine (TWSVM) [14], which reduces computational complexity.
Among these improved models, RSVM is an attractive one. By replacing the hinge loss with the ramp loss, RSVM alleviates the influence of outliers on the decision hyperplane. Nonetheless, RSVM suffers from non-convexity; its objective is often treated as a difference of convex functions (DC) and is therefore usually solved by DC programming-based methods, such as the representative concave–convex procedure (CCCP) [15], which iteratively solves a series of convex problems. In other words, RSVM inherits the sparsity of SVM but aggravates its high computational complexity, so handling large-scale datasets remains a challenge.
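To make the DC structure and the CCCP iteration concrete, the following is a minimal sketch: the ramp loss is written as a difference of two shifted hinge functions, and a CCCP-style outer loop linearizes the concave part and minimizes the resulting convex surrogate. The function names, the plain subgradient-descent inner solver, and the parameter choices are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hinge(z, t):
    # shifted hinge H_t(z) = max(0, t - z)
    return np.maximum(0.0, t - z)

def ramp(z, s=-1.0):
    # ramp loss R_s(z) = H_1(z) - H_s(z): behaves like the hinge loss
    # for z > s but is capped at 1 - s, limiting the pull of outliers
    return hinge(z, 1.0) - hinge(z, s)

def cccp_ramp_svm(X, y, C=1.0, s=-1.0, n_outer=5, n_inner=2000, lr=0.01):
    # CCCP: repeatedly linearize the concave part -C*H_s at the current
    # iterate, then minimize the convex surrogate (here by plain
    # subgradient descent; practical solvers use e.g. coordinate descent)
    Xa = np.hstack([X, np.ones((len(X), 1))])   # append bias feature
    w = np.zeros(Xa.shape[1])
    for _ in range(n_outer):
        # delta_i = C when the current margin falls below s; such samples
        # have their hinge pull cancelled in the surrogate problem
        delta = np.where(y * (Xa @ w) < s, C, 0.0)
        for _ in range(n_inner):
            m = y * (Xa @ w)
            g = np.r_[w[:-1], 0.0]               # regularizer (bias unpenalized)
            g -= Xa.T @ (C * (m < 1.0) * y)      # subgradient of hinge part
            g += Xa.T @ (delta * y)              # linearized concave part
            w -= lr * g
    return w
```

On data containing a flipped-label outlier far from the boundary, the outlier's margin eventually drops below s, its delta becomes C, and it stops influencing the hyperplane, which is precisely the robustness that motivates the ramp loss.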
Recently, an effective acceleration approach termed "safe screening" has emerged, whose fundamental idea is to enhance the training speed by screening out inactive features or samples in advance. The term "safely" here refers to the consistency between the optimal solutions obtained by the safe screening methods and the original models. Safe screening was first presented by Ghaoui et al. [16] for sparse supervised learning models, such as the least absolute shrinkage and selection operator (LASSO) and the ℓ1-norm SVM. Their rule exploits the sparsity of the model to eliminate many redundant features beforehand, thereby reducing the problem scale and accelerating computation. Owing to its effectiveness, safe screening has attracted wide attention from researchers, and numerous improved rules [17], [18], [19], [20], [21] have been developed for different sparse models, including SVM and its variants [22], [23], [24], [25], [26].
Safe sample screening methods for SVM can be roughly grouped into two types:
(1) Methods of the first category discard inactive samples prior to training [27], [28], [29] and rely on the optimal solution corresponding to a previous parameter. Therefore, these rules are often combined with parameter selection frameworks. For example, Ogawa et al. [28] first proposed a sequential safe sample screening rule for SVM. Later, Wang et al. [27] improved sequential dual screening by using variational inequalities (DVI). Recently, Cao et al. [30] and Yuan et al. [31] extended safe screening approaches to hypersphere SVMs.
(2) The second type is embedded in solvers and is based on a feasible solution; thus it is also known as the dynamic screening rule (DSR) [32], [33]. For instance, Fercoq et al. [34] utilized the duality gap to construct a screening rule for LASSO, and it has been theoretically proven that as the duality gap shrinks to 0, all redundant features can be identified. Zimmert et al. [35] generalized this strategy to SVM, while Rakotomamonjy et al. [36] and Zhai et al. [37] extended the idea of safe screening to non-convex sparse LASSO and non-convex SVM, respectively.
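To make the duality-gap idea concrete, here is a minimal sketch of a gap-safe screening test for LASSO in the spirit of Fercoq et al. [34]; the function name and the plain-NumPy formulation are our own illustrative assumptions. From any primal point, a dual-feasible point is obtained by rescaling the residual; the duality gap then gives the radius of a ball guaranteed to contain the dual optimum, and any feature whose worst-case correlation over that ball stays below 1 is provably inactive at the optimum.

```python
import numpy as np

def gap_safe_screen_lasso(X, y, lam, beta):
    # LASSO primal: 0.5*||y - X beta||^2 + lam*||beta||_1
    rho = y - X @ beta                               # residual at beta
    # rescale the residual into the dual-feasible set ||X^T theta||_inf <= 1
    theta = rho / max(lam, np.max(np.abs(X.T @ rho)))
    primal = 0.5 * rho @ rho + lam * np.abs(beta).sum()
    dual = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)
    gap = max(primal - dual, 0.0)
    radius = np.sqrt(2.0 * gap) / lam                # safe ball around theta
    # feature j is certified inactive if its correlation cannot reach 1
    # anywhere inside the safe ball
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0                              # True => beta_j* = 0
```

As the solver drives the gap toward 0, the radius shrinks and more inactive features are certified, which matches the dynamic-screening behavior described above; an analogous gap-ball construction underlies the SVM extensions mentioned in this paragraph.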
Moreover, hybrid strategies [38], [39] that combine different screening methods have also been proposed. For instance, Pan et al. [38] presented a hybrid screening framework for SVM, and such hybrid strategies often outperform a single strategy. Nevertheless, this framework cannot be mechanically applied to non-convex models, or even to the surrogate problems produced by CCCP. The modified DSR proposed by Zhai et al. [37], while useful for the surrogate problems of RSVM, cannot accelerate the current problem using the optimal solutions obtained with other parameters. That is to say, it is insufficient to form a complete framework.
To this end, a two-stage safe screening (TSS) method for RSVM based on CCCP is proposed in this paper. In the first stage, a modified DVI for RSVM is constructed, which identifies inactive samples before training and initializes a more accurate feasible solution for DSR. It is worth mentioning that the optimal solutions with both the same and different parameters are usable by DVI. In the second stage, DSR is embedded into the solver for further speedup. To ensure the safety of TSS, a post-checking procedure is added after the solution is obtained. Theoretical analysis and numerical experiments verify the effectiveness and safety of TSS. Furthermore, we also extend TSS to the nonlinear case and to other non-convex variants of SVM.
In summary, the main contributions of this paper are as follows:
(1) For the non-convex SVM model, a hybrid two-stage safe screening method is proposed, which effectively reduces the scale of the problems and accelerates training. (2) Before solving the surrogate problems, modified DVI-based screening rules are constructed for different conditions. (3) DSR and the post-checking step ensure the safety of TSS, meaning that TSS provides a solution identical to that of RSVM. (4) The basic idea of this hybrid two-stage acceleration strategy can be extended to other non-convex models.
The remainder of the paper proceeds as follows. In Section 2, we first give a brief overview of the basic properties of RSVM and its CCCP-based solving process, and then introduce the general form of a safe screening rule. Details of our TSS are elaborated in Section 3. Experiments on synthetic datasets and real benchmark datasets illustrating the superiority of TSS are reported in Section 4. We extend TSS to three other models and conduct experiments on regression datasets in Section 5. Conclusions are given in the last section.
Section snippets
Related work
In this section, we first describe the notations used in this paper, then give a brief description of RSVM and its safe screening paradigm.
Two-stage screening method for RSVM
In this section, we describe and analyze the detailed hybrid two-stage acceleration method for RSVM, including DVI, DSR and the post-checking procedure. For clarity, the screening rules for successive CIL problems with the same and different parameters are referred to as SCSR and DPSR, respectively; SCSR and DPSR are collectively called DVI. To provide more insight, the flow chart of applying TSS to RSVM is illustrated in Fig. 2.
The main theories involved in TSS are variational inequality and
Numerical experiments
To verify the feasibility and efficiency of TSS, experiments are conducted on five synthetic datasets and twelve real-world benchmark datasets from the UCI or LIBSVM databases. All experiments are performed in MATLAB (R2016a) on a Windows 10 desktop PC with an Intel(R) Xeon(R) Gold-5115 CPU (2.40 GHz) and 64.00 GB RAM.
In all experiments, DCDM is adopted
Extensions of TSS
On the basis of TSS for RSVM, we deduce safe screening rules for non-convex models that introduce the ramp loss or employ the CCCP algorithm. Three models are taken as examples in this section: the twin SVM with ramp loss (RTWSVM), support vector regression with ramp loss (RSVR), and the large-scale transductive SVM (TSVM).
Conclusion
In this paper, we concentrate on non-convex models with the ramp loss. The CCCP algorithm, a representative method for solving such models, is effective but time-consuming. To safely accelerate training, taking RSVM as an example, we proposed a hybrid TSS method which fully exploits the advantages of DVI and DSR. In the first stage, the modified DVI deletes redundant samples before training and initializes a more accurate feasible solution for DSR. In the second stage, DSR is embedded
CRediT authorship contribution statement
Jie Zhao: Methodology, Software, Formal analysis, Writing - original draft. Yitian Xu: Methodology, Writing - review & editing, Supervision. Chang Xu: Visualization, Experiments. Ting Wang: Investigation, Experiments.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors gratefully acknowledge the helpful comments of the reviewers, which have improved the presentation. This work was supported in part by the National Natural Science Foundation of China (Nos. 12071475 and 11671010) and in part by the Beijing Natural Science Foundation, China (No. 4172035).
References (40)
- et al., Ontology-based enriched concept graphs for medical document classification, Inform. Sci. (2020)
- et al., A deep feature mining method of electronic nose sensor data for identifying beer olfactory information, J. Food Eng. (2019)
- et al., Ramp loss nonparallel support vector machine for pattern classification, Knowl. Based Syst. (2015)
- et al., An improved non-parallel universum support vector machine and its safe sample screening rule, Knowl. Based Syst. (2019)
- et al., Multi-parameter safe sample elimination rule for accelerating nonlinear multi-class support vector machines, Pattern Recognit. (2019)
- et al., Multi-variable estimation-based safe screening rule for small sphere and large margin support vector machine, Knowl. Based Syst. (2020)
- et al., Bound estimation-based safe acceleration for maximum margin of twin spheres machine with pinball loss, Pattern Recognit. (2021)
- et al., A hybrid acceleration strategy for nonparallel support vector machine, Inform. Sci. (2021)
- et al., Support-vector networks, Mach. Learn. (1995)
- et al., Robust least-squares support vector machine with minimization of mean and variance of modeling error, IEEE Trans. Neural Netw. Learn. Syst. (2018)
- Distributed object detection with linear SVMs, IEEE Trans. Cybern.
- Bowel sound recognition using SVM classification in a wearable health monitoring system, Sci. China Inf. Sci.
- Statistical Learning Theory
- A small sphere and large margin approach for novelty detection using training data with outliers, IEEE Trans. Pattern Anal. Mach. Intell.
- Robust truncated hinge loss support vector machines, J. Amer. Statist. Assoc.
- Large scale transductive SVMs, J. Mach. Learn. Res.
- Ramp loss linear programming support vector machine, J. Mach. Learn. Res.
- Twin support vector machines for pattern classification, IEEE Trans. Pattern Anal. Mach. Intell.
- The concave–convex procedure, Neural Comput.
Cited by (5)
- Smooth and semi-smooth pinball twin support vector machine, Expert Systems with Applications (2023)
- Feature screening strategy for non-convex sparse logistic regression with log sum penalty, Information Sciences (2023)
- Instance elimination strategy for non-convex multiple-instance support vector machine, Applied Soft Computing (2022)
- Dynamic penalty adaptive matrix machine for the intelligent detection of unbalanced faults in roller bearing, Knowledge-Based Systems (2022)
- Convolutional rule inference network based on belief rule-based system using an evidential reasoning approach, Knowledge-Based Systems (2022)
Jie Zhao received the M.S. degree from the College of Computer Science and Technology, Shandong Technology and Business University. He is currently pursuing the Ph.D. degree at the College of Information and Electrical Engineering, China Agricultural University, Beijing, China. His current research interests include multi-view learning, support vector machine and feature selection.
His research has appeared in Applied Soft Computing.
Yitian Xu received the Ph.D. degree from the College of Science, China Agricultural University, Beijing, China, in 2007. He was a Visiting Scholar with the Department of Computer Science and Engineering, Arizona State University, Tempe, AZ, USA, from 2013 to 2014. He is currently a Professor at the College of Science, China Agricultural University.
He has authored about 70 papers. His current research interests include machine learning and data mining. Prof. Xu's research has appeared in IEEE TPAMI, IEEE TNNLS, IEEE TCYB, IEEE TSP, IEEE TSMC, Information Sciences, Pattern Recognition, and Int. Conf. Mach. Learn. (ICML).
Chang Xu was born in 1998. She received the B.S. degree from the School of Information Management, Beijing Information Science and Technology University, Beijing, China, in 2020. She is currently pursuing the master's degree at the College of Information and Electrical Engineering, China Agricultural University, Beijing, China.
Her current research interests include machine learning and price prediction. Her research has appeared in Knowledge-Based Systems.
Ting Wang was born in 1998. She received the B.S. degree from the College of Science, Beijing Forestry University, Beijing, China, in 2020. She is currently pursuing the master's degree at the College of Science, China Agricultural University, Beijing, China.
Her current research interests include data mining and multi-task learning.