Pattern Recognition Letters

Volume 85, 1 January 2017, Pages 15-20

Ramp Loss based robust one-class SVM

https://doi.org/10.1016/j.patrec.2016.11.016

Highlights

  • A new method is proposed to reduce outliers’ influence on the OCSVM model.

  • The Ramp Loss function is introduced into the OCSVM optimization.

  • An iterative algorithm is proposed to solve the new OCSVM optimization.

  • The outliers are first identified and then removed from the training set.

Abstract

One-class SVM (OCSVM) is widely adopted in the field of one-class classification (OCC). However, outliers in the training set distort the classification surface of OCSVM and degrade its performance. To solve this problem, a novel method is proposed in this paper. The proposed method introduces the Ramp Loss function into the OCSVM optimization so as to reduce the outliers’ influence. The outliers are then identified and removed from the training set, and the final classification surface is obtained on the remaining training samples. Experiments on various data sets verify the effectiveness of the proposed method.

Introduction

In many practical manufacturing processes, it is easy to collect plenty of target data under the normal operating condition, whereas non-target data from abnormal operating conditions are too difficult to characterize or too expensive to obtain. In this case, general classification methods are no longer suitable. To achieve better classification performance, one-class classification (OCC) methods can be adopted to describe the target data, as is common in anomaly detection [6], [18], [24] and novelty detection [5], [7].

Among OCC methods, one-class SVM (OCSVM) avoids estimating the distribution density of the target class, handles nonlinear data, and inherits sparseness from SVM. It is therefore widely used in OCC applications [12], [14], [19], [23]. However, OCSVM has a drawback: because some training samples are allowed to lie outside the classification surface, outliers are apt to become support vectors (SVs) and distort the surface.

To reduce the outliers’ influence on OCC methods, researchers have tried to remove outliers with data preprocessing. Tax and Duin [25] use the distances from a sample to its k nearest neighbors to detect and remove outliers. Breunig et al. [4] assign each sample a degree of being an outlier, called the local outlier factor (LOF), by estimating its local density. Zheng et al. [30] use the LOF to filter the raw samples and remove outliers. Khan et al. [16] and Andreou and Karathanassi [2] compute the interquartile range (IQR) of the training samples, which provides a boundary beyond which samples are labeled as outliers and removed. However, due to the diversity of sample distributions, it is not easy to remove outliers through preprocessing alone.
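As an illustration of this preprocessing style, the following minimal sketch filters a training set with an IQR rule and with scikit-learn's LocalOutlierFactor; the thresholds and helper names are our own choices, not the exact procedures of [4], [16], [30]:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def iqr_filter(X, k=1.5):
    """Keep samples whose every feature lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    mask = np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)
    return X[mask]

def lof_filter(X, n_neighbors=20):
    """Keep samples that LocalOutlierFactor labels as inliers (+1)."""
    labels = LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(X)
    return X[labels == 1]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),    # target cluster
               rng.uniform(-6, 6, (10, 2))])  # injected outliers
print(iqr_filter(X).shape, lof_filter(X).shape)
```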

Researchers have also modified OCSVM itself to improve its robustness to outliers. Yin et al. [28] propose to weight training samples according to their distances to the sample center in the feature space, alleviating the penalty on outliers so that they are more likely to be located outside the classification surface. This method is hereafter referred to as weight OCSVM. However, without prior knowledge of the outlier distribution, it is difficult to weight samples properly, and the results of this method are not satisfactory. Different from this two-step approach (weighting samples, then training OCSVM), Amer et al. [1] propose to identify outliers while optimizing the OCSVM hyper-plane. This method is hereafter referred to as eta OCSVM. They modify the OCSVM objective by introducing 0-1 variables η_i that indicate whether x_i is an outlier. These discrete variables make the optimization harder to solve, so they relax the primal problem and derive an iterative algorithm. In each iteration, the samples with η_i = 1 are constrained to lie above the hyper-plane, and those with η_i = 0 are excluded from that training stage. If the η_i fail to indicate the outliers correctly in some iteration, the result of that iteration is corrupted by the outliers, and the error propagates to subsequent iterations. This method therefore does not perform as expected.
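The snippet does not reproduce the eta OCSVM objective; an objective consistent with the description above (our reconstruction, not necessarily the exact formulation of [1]) would multiply each slack penalty by its indicator, so that η_i = 0 drops sample i from the loss:

```latex
% eta OCSVM objective as reconstructed from the description above
% (eta_i in {0,1}; eta_i = 0 removes sample i from the penalty term):
\min_{w,\,\xi,\,\rho,\,\eta}\ \frac{1}{2}\|w\|^2 - \rho
  + \frac{1}{\nu n}\sum_{i=1}^{n}\eta_i\,\xi_i
\quad \text{s.t.}\ \ \langle w,\varphi(x_i)\rangle \ge \rho - \xi_i,\ \ \xi_i \ge 0.
```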

In this paper, we propose a novel method to reduce the influence of outliers. First, a surface enclosing the cluster core [3] of the target sample distribution is learned from the outlier-contaminated training set. The cluster core refers to the samples located at the core of the target sample cluster, i.e. the more representative target samples. This step is carried out by the proposed method named “Ramp Loss OCSVM”, in which the Ramp Loss function replaces the Hinge Loss function to keep outliers from becoming support vectors, so that the surface enclosing the cluster core can be obtained. Next, this surface is used to identify the outliers and remove them from the training set, on which the final OCSVM is trained. Combining these two steps yields the whole algorithm, ROCSVM; a high-level sketch of the flow is given below.
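To make the two-step structure concrete, here is a minimal sketch of the flow, assuming scikit-learn for the final retraining step. The first step is abstracted behind fit_core_surface, a placeholder for the Ramp Loss OCSVM developed in the following sections (the plain OCSVM in its body is a stand-in, not the robust method itself):

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_core_surface(X, nu=0.1):
    # Placeholder for the paper's Ramp Loss OCSVM (see the later sections).
    # A plain OCSVM is fit here only to mark where the robust step plugs in;
    # it is NOT itself robust to outliers.
    return OneClassSVM(kernel='rbf', nu=nu, gamma='scale').fit(X)

def rocsvm(X, nu=0.1):
    """Two-step ROCSVM flow: learn a core-enclosing surface, drop the
    samples it flags as outliers, then train the final OCSVM on the rest."""
    core = fit_core_surface(X, nu=nu)
    keep = core.decision_function(X) >= 0      # inside the core surface
    return OneClassSVM(kernel='rbf', nu=nu, gamma='scale').fit(X[keep])
```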

The remainder of this paper is organized as follows: the second section states the basic idea of the proposed method; the third section formulates the Ramp Loss OCSVM problem; the fourth section gives the algorithm of the proposed method; the fifth section compares the proposed method with other relevant methods on outlier-contaminated data sets; the sixth section concludes the paper.

The basic idea of the proposed method

We first review the conventional OCSVM briefly. The basic idea of OCSVM is to find a hyper-plane $\langle w, \varphi(x)\rangle - \rho = 0$ in the feature space that separates the sample images from the origin with maximum margin. The primal optimization problem is as follows [22]:

$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\|w\|^2 - \rho + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i \qquad \text{s.t.}\ \ \langle w, \varphi(x_i)\rangle \ge \rho - \xi_i,\ \ \xi_i \ge 0,$$

where $x_i$ denote the training samples, $n$ is the total number of training samples, $\nu$ is a trade-off parameter, and $\xi_i$ are slack variables. This problem is a convex optimization and can be solved via its dual…
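This is the same ν-parameterized primal that scikit-learn's OneClassSVM implements, so a baseline (non-robust) model can be fit in a few lines; the data and parameter values below are illustrative only:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 2))        # target samples only

# nu is the trade-off parameter above: it upper-bounds the fraction of
# training samples left outside the surface (those with xi_i > 0).
model = OneClassSVM(kernel='rbf', nu=0.05, gamma=0.5).fit(X)

scores = model.decision_function(X)   # signed distance to the surface
print((scores < 0).mean())            # roughly at most nu
```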

Ramp Loss OCSVM

Similar to OCSVM, binary-class and multi-class SVMs are also influenced by outliers in their training sets [21], [26]. To address this problem for SVM, researchers have proposed the Ramp Loss function [8], [10], [15], [27]. Inspired by this, we introduce the Ramp Loss function into OCSVM to reduce the outliers’ negative influence.

To facilitate the subsequent discussion, the conventional OCSVM is rewritten as follows:

$$\min_{w,\,\rho}\ J(w,\rho) = \frac{1}{2}\|w\|^2 - \rho + \frac{1}{n\nu}\sum_{i=1}^{n} H_\rho(z_i),$$

where $z_i = \langle w, \varphi(x_i)\rangle$. The slack…
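The snippet cuts off before the Hinge Loss is defined; under the standard formulation (stated here as an assumption, but consistent with the difference identity used in the next section), the slack variables are absorbed as

```latex
% Standard OCSVM Hinge Loss (our assumption; the snippet truncates here):
% the penalty grows without bound as z decreases, which is exactly what
% lets far-away outliers dominate the objective.
H_\rho(z) = \max(0,\ \rho - z)
```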

Solving the Ramp Loss OCSVM

Different from the Hinge Loss, the Ramp Loss is not a convex function, so introducing it into OCSVM means the problem is no longer a convex optimization and must be solved in a new way. The key observation is that the Ramp Loss is the difference of two Hinge Loss functions, i.e. $R_{\rho,r}(z_i) = H_\rho(z_i) - H_{r\rho}(z_i)$, as shown in Fig. 3. Based on this relation between the two loss functions, the optimization problem in Eq. (5) can be rewritten as follows:

$$\min_{w,\,\rho}\ J(w,\rho) = \frac{1}{2}\|w\|^2 - \rho + \frac{1}{n\nu}\sum_{i=1}^{n}\left[H_\rho(z_i) - H_{r\rho}(z_i)\right]\ \ldots$$
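The snippet states the difference-of-hinges identity but not the iterative algorithm itself. As an illustration of how such a decomposition is typically exploited, the following is a generic ConCave-Convex Procedure (CCCP) sketch for a linear-kernel ramp-loss OCSVM solved by subgradient descent; the learning rate, the value of r, and the choice to freeze the ramp's lower knee at the previous outer iterate are all our own assumptions, not the authors' exact algorithm:

```python
import numpy as np

def ramp_ocsvm_linear(X, nu=0.1, r=0.5, n_outer=10, n_inner=500, lr=0.01, seed=0):
    """CCCP-style sketch for a linear ramp-loss OCSVM.

    Objective: 0.5*||w||^2 - rho + (1/(n*nu)) * sum_i R(z_i), with
    z_i = <w, x_i> and R(z) = H_rho(z) - H_{r*rho}(z), H_t(z) = max(0, t - z).
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.1, d)
    rho = 0.0
    C = 1.0 / (n * nu)
    for _ in range(n_outer):
        # CCCP step: linearize the concave part -H_{r*rho}(z_i) at the
        # current iterate; s_i = 1 marks the saturated (flat) region of the
        # ramp, i.e. the suspected outliers. The knee r*rho is frozen here.
        s = (X @ w < r * rho).astype(float)
        for _ in range(n_inner):
            z = X @ w
            active = (z < rho).astype(float)   # samples with a live hinge term
            grad_w = w - C * X.T @ (active - s)
            grad_rho = -1.0 + C * active.sum()
            w -= lr * grad_w
            rho -= lr * grad_rho
    return w, rho

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)),    # target cluster
               rng.uniform(-6, 6, (10, 2))])  # injected outliers
w, rho = ramp_ocsvm_linear(X)
print((X @ w < rho).mean())                   # fraction outside the surface
```

Samples with s_i = 1 at the final outer iteration sit in the flat region of the ramp; these are the suspected outliers that the second stage of ROCSVM would remove before the final OCSVM fit. The paper's algorithm presumably operates in the kernelized setting; the sketch above only conveys the outer structure that the difference-of-hinges decomposition enables.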

Experiments

In this section, we compare the proposed ROCSVM with the conventional OCSVM, weight OCSVM [28], and eta OCSVM [1]. Through experiments on a 2D data set and UCI benchmark data sets, the methods are compared in terms of OCC performance.

Conclusion

In this paper, we have discussed the problem that the conventional OCSVM is negatively influenced by outliers. The reason lies in the fact that the penalty inflicted by the Hinge Loss function on samples located outside the classification surface is unbounded. To solve this problem, we propose the ROCSVM method, in which the Ramp Loss function is introduced to identify and then remove the outliers. As shown by the experiments, compared to the conventional OCSVM and other relevant methods, the proposed…

References (30)

  • C. Callegari et al., Improving PCA-based anomaly detection by using multiple time scale analysis and Kullback-Leibler divergence, Int. J. Commun. Syst. (2014)

  • L. Clifton et al., Probabilistic novelty detection with support vector machines, IEEE Trans. Reliab. (2014)

  • R. Collobert et al., Trading convexity for scalability, Proceedings of the 23rd International Conference on Machine Learning (2006)

  • J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. (2006)

  • S. Ertekin et al., Nonconvex online support vector machines, IEEE Trans. Pattern Anal. Mach. Intell. (2011)