Information Sciences, Volume 545, 4 February 2021, Pages 82-101

Correntropy-based metric for robust twin support vector machine

https://doi.org/10.1016/j.ins.2020.07.068

Highlights

  • Propose a robust distance metric based on correntropy.

  • A robust twin SVM is built with the proposed metric.

  • The metric satisfies the conditions of a distance metric.

  • Demonstrate important properties of the metric.

  • Experiments show the robustness of the proposed method.

Abstract

This work proposes a robust distance metric induced by correntropy with the Laplacian kernel. The proposed metric satisfies the properties that a distance metric must have. Moreover, we demonstrate important properties of the proposed metric, such as robustness, boundedness, non-convexity and approximation behaviors. The proposed metric includes and extends traditional metrics such as the L0-norm and L1-norm metrics. We then apply the proposed metric to twin support vector machine (TSVM) classification, building a new robust TSVM algorithm (called RCTSVM) to reduce the influence of noise and outliers. The proposed RCTSVM inherits the advantages of TSVM and improves robustness. However, the non-convexity of the proposed model makes it difficult to optimize. A continuous optimization method is developed to solve the RCTSVM: the problem is converted into difference of convex (DC) programming, and the corresponding DC algorithm (DCA) converges linearly. Compared with traditional algorithms, numerical experiments under different noise settings and evaluation criteria show that the proposed RCTSVM is robust to noise and outliers in most cases, which demonstrates the feasibility and effectiveness of the proposed method.

Introduction

The traditional support vector machine (SVM) [1], [2], [3] has been applied successfully because of its solid theoretical foundation and good generalization. SVM is characterized by attractive properties such as the kernel trick, sparsity and global solutions. However, the original SVM faces two main challenges. First, the primal SVM requires solving a large-scale quadratic programming problem (QPP), which leads to a computational complexity of O(m^3), where m is the number of samples. This drawback hinders its application to large-scale classification tasks. Second, when the linear independence condition is not satisfied, it is difficult to achieve satisfactory performance with only a single linear classification hyperplane or two parallel classification hyperplanes [4].

To address these problems, Jayadeva et al. [5] proposed the twin support vector machine (TSVM), inspired by the generalized eigenvalue proximal support vector machine (GEPSVM) [2]. TSVM constructs two nonparallel hyperplanes such that each hyperplane is as close as possible to the points of one class and as far as possible from the points of the other class. With good generalization, TSVM only requires solving two small-scale QPPs, and it trains approximately four times faster than the standard SVM. Further, TSVM can handle complex classification problems such as cross-plane data (see Fig. 1) and multi-class problems [6], which are divided into several binary classification problems by different strategies. Don et al. [7] proposed an efficient multi-class support vector machine (DCSVM), a divide-and-conquer algorithm relying on data sparsity in high-dimensional space. Over the past decades, TSVM has been extensively developed by many researchers. For example, Kumar et al. [8] proposed the least squares twin support vector machine (LSTSVM), which uses equality constraints rather than inequality constraints, so the problems reduce to solving two systems of linear equations. Tian et al. [9] proposed a nonparallel SVM (NPSVM) by introducing the ε-insensitive loss function instead of the quadratic loss function. Because the hinge loss is prone to noise sensitivity and resampling instability, the pinball SVM [10] was proposed, using quantile distances that are less sensitive to noise. Combining the idea of intuitionistic fuzzy numbers with TSVM, an intuitionistic fuzzy twin support vector machine (IFTSVM) [11] was presented to reduce the noise created by polluted inputs. Liu et al. [12] designed a joint L2,1-norm minimizing regularization with a nonparallel twin support vector machine (TWSVML2,1) for feature selection. To overcome the disadvantage of a slow training process, Wu et al. [13] proposed a safe screening rule to eliminate redundant points and reduce the problem scale. Huang et al. [14] introduced a sparse and heuristic SVM (SH-SVM) that fuses feature mappings from different single-hidden-layer feedforward networks to improve generalization performance. Gu et al. [15] formulated an extreme vector machine (EVM) for fast training on large data. Based on the wavelet transform, Wang et al. [16] developed a weighted v-twin support vector regression (WTWTSVR), which avoids over-fitting and yields good generalization ability. In recent years, an effective methodology that uses uncertainty modeling to improve classifier generalization has been proposed [17]. Such techniques, supported theoretically by uncertainty information theory, have been further applied in active learning by considering diversity [18] and extended to pattern discovery by considering data complexity [19]. Besides the aforementioned algorithms, projection twin support vector machine learning algorithms [20], [21] have been proposed based on variance minimization.

Robustness is an important focus in the data mining and machine learning fields, since it guarantees model stability and reduces the impact of noise. In practical applications, data are inevitably polluted by various noise and outliers during acquisition and transmission. Therefore, robustness has gained substantial attention in both theory and application [22], [23]. Stability ensures that a trained classifier retains high accuracy at test time even when the training data are slightly perturbed. The L2-norm distance is a popular choice [5], [8], [10], but it is sensitive to outliers because the squaring operation, whose derivative is unbounded, easily exaggerates their effect. To alleviate the effect of noise and outliers, the L1-norm metric, whose derivative is bounded (except at the origin), has been proposed and widely used in robust learning [24], [25], [26]. The L1-norm is less sensitive to outliers than the L2-norm [27], and it has been applied successfully to mitigate the effect of noise and outliers. Based on the L1-norm metric and the twin support vector machine, Ye et al. [25] developed a new robust k-plane clustering method. Wang et al. [26] used a capped L1-norm to obtain robust classifiers. In [28], the L1-norm metric was introduced into twin support vector regression (L1-TSVR) to analyze data with outliers. However, because the L1-norm metric is unbounded, it may not be effective enough to handle a large number of outliers [29].
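To make this contrast concrete, the following minimal sketch (an illustration added here, not code from the paper) compares how a single gross outlier inflates the squared L2 distance, the L1 distance, and a capped L1 distance:

```python
import numpy as np

# Two points that agree everywhere except one outlying coordinate.
x = np.zeros(10)
y = np.zeros(10)
y[0] = 100.0  # a single gross outlier in one coordinate

d = np.abs(x - y)

l2_sq = np.sum(d ** 2)                   # squared L2: outlier contributes 10000
l1 = np.sum(d)                           # L1: outlier contributes 100
capped_l1 = np.sum(np.minimum(d, 1.0))   # capped L1: contribution saturates at 1

print(l2_sq, l1, capped_l1)  # 10000.0 100.0 1.0
```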

Correntropy is a generalized local similarity measure defined in kernel space, and its theoretical foundations are information theory and kernel methods [30]. Correntropy characterizes the local similarity of two random variables and can effectively attenuate the effect of large outliers, so it has been widely used in robust learning [31], [32], [33], [34]. Recently, correntropy has also been applied to metric learning [33], [35]. Chen et al. [33] introduced a correntropy-induced loss into deep learning for auto-encoders. He et al. [34] proposed a maximum correntropy adaptation approach for robust compressive sensing to reconstruct sparse signals from noisy measurements. Based on a correntropy-induced metric constraint, an adaptive filter [35] was developed to improve robustness. In correntropy learning, the Gaussian kernel is usually chosen as the kernel function because of its smoothness and strict positive definiteness. However, the Gaussian kernel is not always the most suitable choice for every problem.
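For reference, the standard definition of correntropy and its sample estimator follow the usual formulation in [30]; the symbols below are the conventional ones, not taken from this excerpt:

```latex
% Correntropy of random variables X, Y under a shift-invariant Mercer
% kernel \kappa_\sigma with bandwidth \sigma (standard form from [30]),
% estimated from N paired samples (x_i, y_i):
V_\sigma(X, Y) = \mathbb{E}\left[\kappa_\sigma(X - Y)\right]
               \approx \frac{1}{N} \sum_{i=1}^{N} \kappa_\sigma(x_i - y_i)
```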

Inspired by the above research, we develop a new robust correntropy-induced metric (CIM) based on the Laplacian kernel function κ_α(x, y) = e^{−α|x−y|}. We then propose a robust twin support vector machine classification algorithm (RCTSVM) based on the CIM. The main contributions of this paper are summarized as follows:

  • (1) A robust metric function is proposed based on correntropy and kernel learning. We then demonstrate that the proposed metric is a correntropy-induced metric (CIM) with important properties: boundedness, non-convexity and approximation behaviors. In particular, the proposed CIM approximates the L0-norm and the L1-norm in different parameter regimes, which indicates that the proposed metric includes and extends the traditional metrics (see the numerical sketch after this list).

  • (2) The robustness of the proposed CIM is analyzed from the viewpoint of M-estimation theory [36]. Moreover, the proposed CIM satisfies the properties of a distance metric, namely non-negativity, symmetry and the triangle inequality. We also discuss how the proposed CIM captures rich information from the first- and higher-order moments of the samples.

  • (3) With the proposed correntropy-induced metric (CIM), a novel robust TSVM algorithm (called RCTSVM) is proposed for classification. However, the non-convexity of the proposed CIM makes the model difficult to optimize. A continuous optimization method is developed to solve it: by properly decomposing the proposed CIM, the RCTSVM problem is transformed into difference of convex functions programming (DC programming) [37], [38], and the resulting DC algorithm (DCA) converges linearly.

  • (4) Numerical experiments on both artificial and UCI datasets, under different noise levels and different evaluation criteria, show that the proposed RCTSVM achieves better performance than classical and state-of-the-art methods in most cases, which validates the effectiveness and robustness of the proposed RCTSVM.
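The following minimal Python sketch illustrates the limiting behaviors claimed in contribution (1), assuming the standard correntropy-induced-metric construction CIM(x, y) = (1 − V_α(x, y))^{1/2} with the Laplacian kernel; the paper's exact definition may differ in normalization:

```python
import numpy as np

def cim(x, y, alpha):
    """Correntropy-induced metric with a Laplacian kernel.

    Standard construction: CIM^2 = mean of (1 - kappa_alpha(x_i - y_i)),
    with kappa_alpha(t) = exp(-alpha * |t|). Illustrative sketch only.
    """
    d = np.abs(np.asarray(x, float) - np.asarray(y, float))
    return np.sqrt(np.mean(1.0 - np.exp(-alpha * d)))

x = np.array([0.0, 0.0, 0.0, 0.0])
y = np.array([0.0, 0.3, 0.0, 2.0])  # two nonzero differences

# Large alpha: CIM^2 approaches (number of nonzeros in x - y) / n, the L0 behavior.
print(cim(x, y, alpha=1e4) ** 2)          # ~0.5  = 2 nonzeros / 4
# Small alpha: CIM^2 / alpha approaches (L1 norm of x - y) / n, the L1 behavior.
print(cim(x, y, alpha=1e-4) ** 2 / 1e-4)  # ~0.575 = (0.3 + 2.0) / 4
```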

The rest of this paper is organized as follows. The relevant background concerning TSVM, correntropy theory and DC programming is reviewed in Section 2. A detailed description and theoretical analysis of the proposed RCTSVM are given in Section 3. Experimental results on both artificial and UCI datasets are presented in Section 4, and conclusions are given in Section 5.

Section snippets

Twin support vector machine

Consider using TSVM for binary classification tasks. The training set T = {(x_i, y_i) | x_i ∈ R^n, y_i ∈ {+1, −1}, i = 1, …, m} is composed of m samples, containing m1 positive-class examples and m2 negative-class examples with m = m1 + m2. Moreover, the matrices A ∈ R^{m1×n} and B ∈ R^{m2×n} collect the input samples belonging to the positive class and the negative class, respectively. The goal of TSVM is to generate a pair of nonparallel hyperplanes, each of which is as close as possible to one of the
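The snippet above is truncated; for completeness, the standard pair of TSVM primal problems from [5] (with e_1, e_2 vectors of ones of appropriate dimension and c_1, c_2 > 0 trade-off parameters) can be written as:

```latex
% Standard TSVM primal problems (Jayadeva et al. [5]).
\min_{w^{(1)},\, b^{(1)},\, \xi} \ \tfrac{1}{2}\bigl\|A w^{(1)} + e_1 b^{(1)}\bigr\|^2 + c_1 e_2^{\top}\xi
\quad \text{s.t.} \quad -\bigl(B w^{(1)} + e_2 b^{(1)}\bigr) + \xi \ge e_2,\ \ \xi \ge 0,

\min_{w^{(2)},\, b^{(2)},\, \eta} \ \tfrac{1}{2}\bigl\|B w^{(2)} + e_2 b^{(2)}\bigr\|^2 + c_2 e_1^{\top}\eta
\quad \text{s.t.} \quad \bigl(A w^{(2)} + e_1 b^{(2)}\bigr) + \eta \ge e_1,\ \ \eta \ge 0.
```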

Robust twin support vector machine with correntropy-induced metric

For binary classification problems, classical TSVM seeks the classification hyperplanes based on the L2-norm distance to separate the positive class from the negative class. However, the squaring operation exaggerates the effect of noise and outliers, especially when outliers are present in the dataset. In this section, we develop a new robust correntropy-induced metric with the Laplacian kernel and propose a robust twin support vector machine (RCTSVM).
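As stated in the abstract, the non-convex RCTSVM model is solved via DC programming. The paper's exact decomposition is not reproduced in this excerpt; the generic sketch below only illustrates the DCA iteration for f = g − h (g, h convex), where h is linearized at the current iterate, and all names are illustrative:

```python
import numpy as np

def dca(grad_h, solve_convex, z0, max_iter=100, tol=1e-6):
    """Generic DC algorithm for min f(z) = g(z) - h(z), g and h convex.

    At each step h is replaced by its affine minorant at z_k, leaving the
    convex subproblem  z_{k+1} = argmin_z g(z) - <grad_h(z_k), z>,
    which 'solve_convex' is assumed to solve exactly.
    """
    z = z0
    for _ in range(max_iter):
        z_new = solve_convex(grad_h(z))  # convex subproblem with linearized h
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

# Toy instance: f(z) = z^2 - |z|  (g = z^2, h = |z|), minimized at |z| = 1/2.
# The subproblem argmin_z z^2 - s*z has the closed-form solution s/2.
z_star = dca(grad_h=np.sign, solve_convex=lambda s: s / 2.0, z0=np.array(3.0))
print(z_star)  # 0.5
```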

Experimental results and discussion

In this section, to evaluate the effectiveness and feasibility of the proposed algorithm, numerical experiments are performed on both artificial datasets and UCI datasets [43]. Classical methods, including TSVM [5], the least squares TSVM (LSTSVM) [8], the nonparallel SVM (NPSVM) [9], the generalized eigenvalue proximal support vector machine with L1-norm (L1-GEPSVM) [24] and the robust capped L1-norm twin support vector machine (CTSVM) [26], are compared. In order to ensure the objectivity of the experiments, all

Conclusions

We propose a novel robust distance metric based on correntropy and kernel learning. As a similarity measure, the proposed metric satisfies the properties that a metric must have. Some important properties of the proposed metric are demonstrated, such as robustness, boundedness, non-convexity and smoothness. Moreover, we analyze its asymptotic behaviors under different parameter limits. As described, the proposed metric includes and extends the traditional metrics such as the L0-norm and L1-norm metrics.

CRediT authorship contribution statement

Chao Yuan: Writing - original draft, Software. Liming Yang: Conceptualization, Methodology, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (11471010, 11271367) and the Chinese Universities Scientific Fund. The authors also thank the referees and the editors, whose suggestions improved the paper significantly.

References (46)

  • D.R. Don et al., DCSVM: fast multi-class classification using support vector machines, Int. J. Mach. Learn. Cybern. (2018)
  • M.A. Kumar et al., Least squares twin support vector machines for pattern classification, Expert Syst. Appl. (2009)
  • Y. Tian et al., Nonparallel support vector machines for pattern classification, IEEE Trans. Cybern. (2014)
  • Y. Xu et al., A novel twin support-vector machine with pinball loss, IEEE Trans. Neural Networks Learn. Syst. (2017)
  • S. Rezvani et al., Intuitionistic fuzzy twin support vector machines, IEEE Trans. Fuzzy Syst. (2019)
  • X. Liu et al., Mass classification of benign and malignant with a new twin support vector machine joint l2,1-norm, Int. J. Mach. Learn. Cybern. (2019)
  • W. Wu et al., Accelerating improved twin support vector machine with safe screening rule, Int. J. Mach. Learn. Cybern. (2019)
  • J. Huang et al., Sparse and heuristic support vector machine for binary classifier and regressor fusion, Int. J. Mach. Learn. Cybern. (2019)
  • X. Gu et al., Extreme vector machine for fast training on large data, Int. J. Mach. Learn. Cybern. (2020)
  • L. Wang et al., Wavelet transform-based weighted v-twin support vector regression, Int. J. Mach. Learn. Cybern. (2020)
  • X. Wang et al., A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning, IEEE Trans. Fuzzy Syst. (2015)
  • R. Wang et al., Incorporating diversity and informativeness in multiple-instance active learning, IEEE Trans. Fuzzy Syst. (2017)
  • X. Wang et al., Discovering the relationship between generalization and uncertainty by incorporating complexity of classification, IEEE Trans. Cybern. (2018)