
Knowledge-Based Systems

Volume 185, 1 December 2019, 104933

A new adaptive weighted imbalanced data classifier via improved support vector machines with high-dimension nature

https://doi.org/10.1016/j.knosys.2019.104933

Abstract

Standard support vector machine (SVM) models are widely used in various fields, but we show that they are not rationally defined from a geometric point of view, which can degrade their performance in theory, especially in high-dimensional cases. In this paper, we consider a composite penalty and propose an elastic net support vector machine (ENSVM). Unlike the doubly regularized support vector machine (DrSVM, Wang et al. (2006)), we impose the penalty on the slack variables rather than on the normal vector of the hyperplane. We then prove that the ENSVM is more rationally defined than the standard SVM and DrSVM (Section 3.2.1). Moreover, the ENSVM is inherently more stable and better suited to high dimensions, and the simulation results strongly support these merits. We further combine fused weights with the ENSVM and propose an adaptive weighted elastic net support vector machine (AWENSVM), which makes the primal model more adaptive and more robust to imbalanced data. Compared with other popular SVMs, the proposed AWENSVM model clearly performs better.

Introduction

The support vector machine (SVM) was first proposed by Cortes and Vapnik [1] and has become a powerful tool in many fields, such as classification, detection, pattern recognition and gene selection. For more related work, see Chen and Wang [2], Waring and Liu [3], Zhang et al. [4] and Zhao et al. [5]. For a two-class classification problem, the SVM attempts to find a proper hyperplane separating the two classes of training points. Although the SVM is succinct, efficient and more readily interpreted than an artificial neural network (ANN), it still has several weaknesses. One is that standard SVM models are limited in high-dimensional classification problems. In fact, the SVM works under the principle of structural risk minimization and is penalized by an l1 norm, like the LASSO (least absolute shrinkage and selection operator). The LASSO was first proposed by Tibshirani [6]; it shrinks the parameters by an l1 penalty and can produce exactly zero coefficients. More on model selection and estimation can be found in Fan and Li [7], Efron et al. [8], Zou and Hastie [9], Zhao and Yu [10], Pan and Xu [11] and Zhao et al. [12]. However, the LASSO performs poorly in high-dimensional cases and lacks the oracle property (proposed in Fan and Li [7]). To improve the behavior of the LASSO in high-dimensional cases, Zou and Hastie [9] considered a mixed penalty called the elastic net, recalled below. They showed that the elastic net naturally performs well in high-dimensional cases. Meanwhile, the elastic net also encourages the grouping effect, meaning that it can select highly correlated variables simultaneously. Accordingly, Wang et al. [13] applied the elastic-net penalty to improve the support vector machine and proposed the doubly regularized support vector machine (DrSVM); they also proved the grouping effect of the DrSVM for correlated variables.
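For concreteness, the elastic-net penalty of Zou and Hastie [9] on a coefficient vector $\beta$ combines the LASSO and ridge penalties (a standard definition, reproduced here; the symbols $\lambda_1, \lambda_2$ are our notation):

$$P_{\lambda_1,\lambda_2}(\beta) = \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2, \qquad \lambda_1, \lambda_2 \ge 0.$$

The $\ell_1$ term produces sparse solutions as in the LASSO, while the quadratic term keeps the objective strictly convex, which is what stabilizes high-dimensional fits and drives the grouping effect for correlated variables.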

The other limitation is that the SVM is severely affected by the level of imbalance between the two classes of training points. Class imbalance has been widely discussed for classifiers; see, e.g., Du and Chen [14], Sun et al. [15] and Zhang et al. [16]. Many studies have indicated that the SVM tends to assign samples to the majority class, so performance on the minority class deteriorates. Many researchers have studied this problem and proposed a variety of improvements to the standard SVM, briefly reviewed in Li et al. [17], Tang et al. [18], Zou et al. [19], Akbani et al. [20], Chawla et al. [21], Huang and Du [22], Ji and Xing [23], Shao et al. [24], Xu et al. [25], Xu [26] and Xu et al. [27]. Recently, Hwang et al. [28] considered a new weighted SVM that performs well in simulations on imbalanced data. Moreover, Ji and Xing [23] combined density and distance weights to produce a new adaptive weighted one-class SVM, which turned out to be robust in simulation studies.
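As background for the weighting idea, the following is a minimal sketch of the generic cost-sensitive remedy (not the adaptive AWENSVM weights proposed in this paper), using scikit-learn's class_weight option to rescale the misclassification penalty per class:

    # A minimal sketch of cost-sensitive SVM training on imbalanced data.
    # This illustrates the generic per-class reweighting remedy discussed
    # above, not the adaptive weights (AWENSVM) proposed in this paper.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import classification_report

    # Synthetic two-class problem with a 9:1 imbalance.
    X, y = make_classification(n_samples=1000, n_features=20,
                               weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                              random_state=0)

    # class_weight='balanced' rescales the penalty C for each class
    # inversely to its frequency, so errors on the minority class are
    # penalized more heavily.
    clf = SVC(kernel='linear', class_weight='balanced').fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))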

In this paper, we consider these problems from the geometric view of the SVM model. We first point out the geometric irrationality of the standard SVM and propose the primal improved model, called the elastic net support vector machine (ENSVM). The most notable difference between the DrSVM and the ENSVM is that the ENSVM applies the elastic-net penalty to the slack variables rather than to the feature variables, as in the DrSVM. We prove that this small modification makes the model more rationally defined than both the DrSVM and the standard SVM. We further show theoretically, in Section 3.2.2, that the ENSVM model possesses a high-dimension nature; thus, the ENSVM performs well in high-dimensional cases, and the simulation results strongly support these properties. To improve performance on imbalanced data, we develop combinational weights and propose the adaptive weighted elastic net support vector machine (AWENSVM), which performs well in imbalanced classification problems and proves more robust than the standard SVM and other existing weighted SVMs.

The rest of this paper is organized as follows. In the next section, we give a brief review of the standard SVM and introduce the elastic net and the DrSVM, together with some of their important properties. In Section 3, we propose the AWENSVM model and discuss its key properties in detail, including its rationality and high-dimension nature. In Section 4, we use synthetic data and UCI (the UCI Machine Learning Repository) datasets to compare the performance of the AWENSVM with other well-known SVMs. Finally, Section 5 concludes.


Soft support vector machines

For convenience, we consider a dataset $X \in \mathbb{R}^{N \times p}$ consisting of $N$ observations and $p$ features. The $i$th training datum, denoted by $x_i = (x_{i1}, \ldots, x_{ip})^T$, corresponds to class $y_i$, where $y_i = 1$ or $y_i = -1$ for $i = 1, \ldots, N$. For this classification problem, the standard support vector machine aims to maximize the margin between the two classes. However, the hard-margin SVM is limited by its very strict conditions; the soft-margin SVM is more reasonable and widely applicable, and is defined as follows: $$\min_{\omega, b, \xi} \; \frac{1}{2}\|\omega\|_2^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \; y_i(\omega^T x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \ldots, N.$$
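To make the optimization concrete, the following is a minimal sketch of this soft-margin primal using the generic convex solver cvxpy, with synthetic data; it is an illustration under the notation above, not the authors' implementation:

    # A minimal sketch of the soft-margin SVM primal via cvxpy,
    # with synthetic data standing in for (X, y).
    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    N, p, C = 100, 5, 1.0
    X = rng.standard_normal((N, p))
    y = np.sign(rng.standard_normal(N))   # labels in {-1, +1}

    w = cp.Variable(p)
    b = cp.Variable()
    xi = cp.Variable(N)                   # slack variables

    # Margin term plus the l1 slack penalty C * sum(xi).
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
    # y_i * (w^T x_i + b) >= 1 - xi_i, written elementwise.
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
    cp.Problem(objective, constraints).solve()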

The elastic net SVM model

Considering the desired properties of the elastic net, we propose the elastic net support vector machine. Unlike the DrSVM, we impose the elastic-net penalty on the slack variables instead of the feature variables. Thus, the ENSVM model is defined as follows.

Definition 1

Suppose the training dataset $(Y, X)$ has $N$ observations and $p$ features, and let $C_1$ and $C_2$ be trade-off parameters. Then the ENSVM is defined by $$\min_{\omega, b, \xi} \; \frac{1}{2}\left(\|\omega\|_2^2 + b^2\right) + \frac{C_1}{2}\,\xi^T \xi + C_2 \|\xi\|_1 \quad \text{s.t.} \; D(X\omega + eb) - e + \xi \ge 0, \; \xi \ge 0,$$ where $D = \mathrm{diag}(y_1, \ldots, y_N)$ and $e$ denotes the all-ones vector.
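For illustration, the following is a direct cvxpy transcription of Definition 1, a sketch assuming synthetic data and placeholder values for the trade-off parameters $C_1$ and $C_2$ (not the authors' implementation or solver):

    # A sketch of the ENSVM primal in Definition 1 via cvxpy;
    # C1 and C2 are illustrative placeholder values.
    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    N, p = 100, 5
    X = rng.standard_normal((N, p))
    y = np.sign(rng.standard_normal(N))   # labels in {-1, +1}
    C1, C2 = 1.0, 1.0

    w, b, xi = cp.Variable(p), cp.Variable(), cp.Variable(N)

    # Ridge-like term on (w, b) plus the elastic-net penalty on the
    # slack vector xi; since xi >= 0, its l1 norm equals sum(xi).
    objective = cp.Minimize(0.5 * (cp.sum_squares(w) + cp.square(b))
                            + 0.5 * C1 * cp.sum_squares(xi)
                            + C2 * cp.sum(xi))
    # D(Xw + e*b) - e + xi >= 0 with D = diag(y) is, elementwise,
    # y_i * (w^T x_i + b) >= 1 - xi_i.
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
    cp.Problem(objective, constraints).solve()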

The geometric rationality of a classifier

For a classification problem, we often care about whether the

Data tests

In this section, we mainly compare the performance of the AWENSVM with its standard counterparts. Specifically, we present results on synthetic data and on real data taken from the UCI repository. The results of a classification can be divided into four cases: TP (true positive), FP (false positive), TN (true negative) and FN (false negative). The number of each category is denoted by $N_{TP}$, $N_{FP}$, $N_{TN}$ and $N_{FN}$. The overall accuracy of the classifier is defined as $$P_{acc} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}}.$$
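As a small worked check of this formula, a hypothetical confusion count with 90 correct predictions out of 100 gives an overall accuracy of 0.9:

    # Overall accuracy from the four confusion-matrix counts defined above;
    # the counts here are hypothetical, for illustration only.
    def overall_accuracy(n_tp, n_fp, n_tn, n_fn):
        return (n_tp + n_tn) / (n_tp + n_tn + n_fp + n_fn)

    print(overall_accuracy(n_tp=80, n_fp=5, n_tn=10, n_fn=5))  # 0.9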

Conclusion

In this paper, we propose an adaptive weighted elastic net support vector machine (AWENSVM) for imbalanced data classification. The most important property of the AWENSVM is its potential in high-dimensional classification problems, where other SVM models are limited. By using combinational weights, the AWENSVM is competitive with other common weighted SVM models on imbalanced data and outperforms them in high-dimensional cases. The

Acknowledgments

The authors are grateful to the referees for their careful reading of the manuscript and for comments that led to an improved version of the paper. This work is supported by the National Natural Science Foundation of China [Grant No. 11671059].

References

  • Zhang, H., et al., Gene selection using support vector machines with non-convex penalty, Bioinformatics (2005)
  • Zhao, J., et al., An improved non-parallel universum support vector machine and its safe sample screening rule, Knowl.-Based Syst. (2019)
  • Tibshirani, R., Regression shrinkage and selection via the LASSO, J. R. Statist. Soc. B (1996)
  • Fan, J., et al., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc. (2001)
  • Efron, B., et al., Least angle regression, Ann. Statist. (2004)
  • Zou, H., et al., Regularization and variable selection via the elastic net, J. R. Statist. Soc. B (2005)
  • Zhao, P., et al., On model selection consistency of LASSO, J. Mach. Learn. Res. (2006)