Pattern Recognition Letters

Volume 85, 1 January 2017, Pages 15-20

Ramp Loss based robust one-class SVM

https://doi.org/10.1016/j.patrec.2016.11.016

Highlights

  • A new method is proposed to reduce outliers’ influence on the OCSVM model.

  • The Ramp Loss function is introduced into the OCSVM optimization.

  • An iterative algorithm is proposed to solve the new OCSVM optimization.

  • The outliers are first identified and then removed from the training set.

Abstract

One-class SVM (OCSVM) is widely adopted in the field of one-class classification (OCC). However, outliers in the training set distort the classification surface of OCSVM and degrade its performance. To solve this problem, a novel method is proposed in this paper. The proposed method introduces the Ramp Loss function into the OCSVM optimization so as to reduce the outliers’ influence. The outliers are then identified and removed from the training set, and the final classification surface is obtained on the remaining training samples. Experiments on various data sets verify the effectiveness of the proposed method.

Introduction

In many practical manufacturing processes, it is easy to collect plenty of target data under the normal operating condition, whereas non-target data from abnormal operating conditions are too difficult to characterize or too expensive to obtain. In this case, general classification methods are no longer suitable. To achieve better classification performance, one-class classification (OCC) methods can be adopted to describe the target data, as is common in anomaly detection [6], [18], [24] and novelty detection [5], [7].

Among OCC methods, one-class SVM (OCSVM) avoids estimating the distribution density of the target class, handles nonlinear data, and inherits sparseness from SVM. It is therefore widely used in OCC applications [12], [14], [19], [23]. However, OCSVM has a drawback: because some training samples are allowed to lie outside the classification surface, outliers are apt to become support vectors (SVs) and distort the surface.

To reduce the outliers’ influence on OCC methods, researchers have tried to remove outliers with data preprocessing. Tax and Duin [25] use the distances from a sample to its k nearest neighbors to detect and remove outliers. Breunig et al. [4] assign each sample a degree of being an outlier, called the local outlier factor (LOF), by estimating its local density. Zheng et al. [30] use the LOF to filter the raw samples and remove outliers. Khan et al. [16] and Andreou and Karathanassi [2] compute the interquartile range (IQR) of the training samples, which provides a boundary beyond which samples are labeled as outliers and removed. However, due to the diversity of sample distributions, it is not easy to remove outliers through preprocessing alone.
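As an illustration of this preprocessing style, the following minimal sketch filters a training set with an IQR rule and with scikit-learn's LocalOutlierFactor; the thresholds and helper names are our own choices, not the exact procedures of [4], [16], [30]:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def iqr_filter(X, k=1.5):
    """Keep samples whose every feature lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    mask = np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)
    return X[mask]

def lof_filter(X, n_neighbors=20):
    """Keep samples that LocalOutlierFactor labels as inliers (+1)."""
    labels = LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(X)
    return X[labels == 1]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),    # target cluster
               rng.uniform(-6, 6, (10, 2))])  # injected outliers
print(iqr_filter(X).shape, lof_filter(X).shape)
```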

Researchers have also modified OCSVM itself to improve its robustness to outliers. Yin et al. [28] propose to weight training samples according to their distances to the sample center in the feature space, alleviating the penalty on outliers so that they are more likely to be located outside the classification surface. This method is hereafter referred to as weight OCSVM. However, without prior knowledge of the outlier distribution, it is difficult to weight samples properly, and the results of this method are not satisfactory. Different from this two-step approach (weighting samples, then training OCSVM), Amer et al. [1] propose to identify outliers while optimizing the OCSVM hyper-plane. This method is hereafter referred to as eta OCSVM. They modify the OCSVM objective by introducing 0-1 variables η_i that indicate whether x_i is an outlier. These discrete variables make the optimization harder to solve, so they relax the primal problem and derive an iterative algorithm. In each iteration, the samples with η_i = 1 are constrained to lie above the hyper-plane, and those with η_i = 0 are excluded from that training stage. If the η_i fail to indicate the outliers correctly in some iteration, the result of that iteration is corrupted by the outliers, and the error propagates to subsequent iterations. This method therefore does not perform as expected.
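The snippet does not reproduce the eta OCSVM objective; an objective consistent with the description above (our reconstruction, not necessarily the exact formulation of [1]) would multiply each slack penalty by its indicator, so that η_i = 0 drops sample i from the loss:

```latex
% eta OCSVM objective as reconstructed from the description above
% (eta_i in {0,1}; eta_i = 0 removes sample i from the penalty term):
\min_{w,\,\xi,\,\rho,\,\eta}\ \frac{1}{2}\|w\|^2 - \rho
  + \frac{1}{\nu n}\sum_{i=1}^{n}\eta_i\,\xi_i
\quad \text{s.t.}\ \ \langle w,\varphi(x_i)\rangle \ge \rho - \xi_i,\ \ \xi_i \ge 0.
```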

In this paper, we propose a novel method to reduce the influence of outliers. First, a surface enclosing the cluster core [3] of the target sample distribution is learned from the outlier-contaminated training set. The cluster core refers to the samples located at the core of the target sample cluster, i.e. the more representative target samples. This step is carried out by the proposed method named “Ramp Loss OCSVM”, in which the Ramp Loss function replaces the Hinge Loss function to keep outliers from becoming support vectors, so that the surface enclosing the cluster core can be obtained. Next, this surface is used to identify the outliers and remove them from the training set, on which the final OCSVM is trained. Combining these two steps yields the whole algorithm, ROCSVM; a high-level sketch of the flow is given below.
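To make the two-step structure concrete, here is a minimal sketch of the flow, assuming scikit-learn for the final retraining step. The first step is abstracted behind fit_core_surface, a placeholder for the Ramp Loss OCSVM developed in the following sections (the plain OCSVM in its body is a stand-in, not the robust method itself):

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_core_surface(X, nu=0.1):
    # Placeholder for the paper's Ramp Loss OCSVM (see the later sections).
    # A plain OCSVM is fit here only to mark where the robust step plugs in;
    # it is NOT itself robust to outliers.
    return OneClassSVM(kernel='rbf', nu=nu, gamma='scale').fit(X)

def rocsvm(X, nu=0.1):
    """Two-step ROCSVM flow: learn a core-enclosing surface, drop the
    samples it flags as outliers, then train the final OCSVM on the rest."""
    core = fit_core_surface(X, nu=nu)
    keep = core.decision_function(X) >= 0      # inside the core surface
    return OneClassSVM(kernel='rbf', nu=nu, gamma='scale').fit(X[keep])
```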

The remainder of this paper is organized as follows: the second section states the basic idea of the proposed method; the third section formulates the Ramp Loss OCSVM problem; the fourth section gives the algorithm of the proposed method; the fifth section compares the proposed method with other relevant methods on outlier-contaminated data sets; the sixth section concludes the paper.

The basic idea of the proposed method

We first review the conventional OCSVM briefly. The basic idea of OCSVM is to find a hyper-plane $\langle w, \varphi(x)\rangle - \rho = 0$ in the feature space that separates the sample images from the origin with maximum margin. The primal optimization problem is as follows [22]:

$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\|w\|^2 - \rho + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i \qquad \text{s.t.}\ \ \langle w, \varphi(x_i)\rangle \ge \rho - \xi_i,\ \ \xi_i \ge 0,$$

where $x_i$ denote the training samples, $n$ is the total number of training samples, $\nu$ is a trade-off parameter, and $\xi_i$ are slack variables. This problem is a convex optimization and can be solved via its dual…
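This is the same ν-parameterized primal that scikit-learn's OneClassSVM implements, so a baseline (non-robust) model can be fit in a few lines; the data and parameter values below are illustrative only:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 2))        # target samples only

# nu is the trade-off parameter above: it upper-bounds the fraction of
# training samples left outside the surface (those with xi_i > 0).
model = OneClassSVM(kernel='rbf', nu=0.05, gamma=0.5).fit(X)

scores = model.decision_function(X)   # signed distance to the surface
print((scores < 0).mean())            # roughly at most nu
```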

Ramp Loss OCSVM

Similar to OCSVM, binary-class and multi-class SVMs are also influenced by outliers in their training sets [21], [26]. To address this problem for SVM, researchers have proposed the Ramp Loss function [8], [10], [15], [27]. Inspired by this, we introduce the Ramp Loss function into OCSVM to reduce the outliers’ negative influence.

To facilitate the subsequent discussion, the conventional OCSVM is rewritten as follows:

$$\min_{w,\,\rho}\ J(w,\rho) = \frac{1}{2}\|w\|^2 - \rho + \frac{1}{n\nu}\sum_{i=1}^{n} H_\rho(z_i),$$

where $z_i = \langle w, \varphi(x_i)\rangle$. The slack…
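The snippet cuts off before the Hinge Loss is defined; under the standard formulation (stated here as an assumption, but consistent with the difference identity used in the next section), the slack variables are absorbed as

```latex
% Standard OCSVM Hinge Loss (our assumption; the snippet truncates here):
% the penalty grows without bound as z decreases, which is exactly what
% lets far-away outliers dominate the objective.
H_\rho(z) = \max(0,\ \rho - z)
```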

Solving the Ramp Loss OCSVM

Different from the Hinge Loss, the Ramp Loss is not a convex function, so introducing it into OCSVM means the problem is no longer a convex optimization and must be solved in a new way. The key observation is that the Ramp Loss is the difference of two Hinge Loss functions, i.e. $R_{\rho,r}(z_i) = H_\rho(z_i) - H_{r\rho}(z_i)$, as shown in Fig. 3. Based on this relation between the two loss functions, the optimization problem in Eq. (5) can be rewritten as follows:

$$\min_{w,\,\rho}\ J(w,\rho) = \frac{1}{2}\|w\|^2 - \rho + \frac{1}{n\nu}\sum_{i=1}^{n}\left[H_\rho(z_i) - H_{r\rho}(z_i)\right]\ \ldots$$
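The snippet states the difference-of-hinges identity but not the iterative algorithm itself. As an illustration of how such a decomposition is typically exploited, the following is a generic ConCave-Convex Procedure (CCCP) sketch for a linear-kernel ramp-loss OCSVM solved by subgradient descent; the learning rate, the value of r, and the choice to freeze the ramp's lower knee at the previous outer iterate are all our own assumptions, not the authors' exact algorithm:

```python
import numpy as np

def ramp_ocsvm_linear(X, nu=0.1, r=0.5, n_outer=10, n_inner=500, lr=0.01, seed=0):
    """CCCP-style sketch for a linear ramp-loss OCSVM.

    Objective: 0.5*||w||^2 - rho + (1/(n*nu)) * sum_i R(z_i), with
    z_i = <w, x_i> and R(z) = H_rho(z) - H_{r*rho}(z), H_t(z) = max(0, t - z).
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.1, d)
    rho = 0.0
    C = 1.0 / (n * nu)
    for _ in range(n_outer):
        # CCCP step: linearize the concave part -H_{r*rho}(z_i) at the
        # current iterate; s_i = 1 marks the saturated (flat) region of the
        # ramp, i.e. the suspected outliers. The knee r*rho is frozen here.
        s = (X @ w < r * rho).astype(float)
        for _ in range(n_inner):
            z = X @ w
            active = (z < rho).astype(float)   # samples with a live hinge term
            grad_w = w - C * X.T @ (active - s)
            grad_rho = -1.0 + C * active.sum()
            w -= lr * grad_w
            rho -= lr * grad_rho
    return w, rho

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)),    # target cluster
               rng.uniform(-6, 6, (10, 2))])  # injected outliers
w, rho = ramp_ocsvm_linear(X)
print((X @ w < rho).mean())                   # fraction outside the surface
```

Samples with s_i = 1 at the final outer iteration sit in the flat region of the ramp; these are the suspected outliers that the second stage of ROCSVM would remove before the final OCSVM fit. The paper's algorithm presumably operates in the kernelized setting; the sketch above only conveys the outer structure that the difference-of-hinges decomposition enables.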

Experiments

In this section, we compare the proposed ROCSVM with the conventional OCSVM, weight OCSVM [28], and eta OCSVM [1]. Through experiments on a 2D data set and UCI benchmark data sets, the methods are compared in terms of OCC performance.

Conclusion

In this paper, we have discussed the problem that the conventional OCSVM is negatively influenced by outliers. The reason lies in the fact that the penalty inflicted by the Hinge Loss function on samples located outside the classification surface is unbounded. To solve this problem, we propose the ROCSVM method, in which the Ramp Loss function is introduced to identify and then remove the outliers. As shown by the experiments, compared to the conventional OCSVM and other relevant methods, the proposed…

References (30)

  • C. Callegari et al., Improving PCA-based anomaly detection by using multiple time scale analysis and Kullback-Leibler divergence, Int. J. Commun. Syst. (2014)

  • L. Clifton et al., Probabilistic novelty detection with support vector machines, IEEE Trans. Reliab. (2014)

  • R. Collobert et al., Trading convexity for scalability, Proceedings of the 23rd International Conference on Machine Learning (2006)

  • J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. (2006)

  • S. Ertekin et al., Nonconvex online support vector machines, IEEE Trans. Pattern Anal. Mach. Intell. (2011)