Fuzzy support vector machine based on within-class scatter for classification problems with outliers or noises
Introduction
Support Vector Machine (SVM) [1], [2], developed by V. N. Vapnik, is an important pattern recognition technique based on structural risk minimization (SRM) [2], [3]. It first maps the sample points into a high-dimensional feature space and then seeks an optimal separating hyperplane that maximizes the margin between the two classes in that space, where the margin is defined as the sum of the distances from the hyperplane to the closest points of the two classes. Thanks to remarkable characteristics such as a globally optimal solution, good generalization performance, and the ability to learn from small training sets, SVM has been successfully applied in many areas, such as face recognition [4], [5], image classification [6], audio classification [7], and time-series prediction [8], to name a few.
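The margin notion above can be made concrete with a minimal numpy sketch (the toy data and the two candidate hyperplanes are ours, not from the paper): the geometric margin of a labeled point is y(w·x + b)/‖w‖, and the margin of a hyperplane is the smallest such value over the training set; SVM prefers the hyperplane with the largest margin.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Smallest signed distance from the hyperplane w.x + b = 0 to any training point."""
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

# Toy linearly separable data (hypothetical); labels are +1 / -1
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Two candidate separating hyperplanes; SVM would prefer the one with the larger margin.
m_diag = geometric_margin(np.array([1.0, 1.0]), 0.0, X, y)  # normal along (1, 1)
m_axis = geometric_margin(np.array([1.0, 0.0]), 0.0, X, y)  # normal along (1, 0)
```

Both hyperplanes separate the data (both margins are positive), but the diagonal one keeps every point farther away, which is exactly the criterion the SVM optimization formalizes.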
However, in many practical applications the training set is corrupted by outliers or noise, and treating every sample equally may cause overfitting. Fuzzy support vector machine (FSVM) [9], built on SVM, can effectively address this problem. In FSVM, each training sample is associated with a fuzzy membership, so that different samples contribute differently to learning the decision surface. This reduces, to some extent, the effect of outliers or noise on the separating hyperplane. By introducing two memberships for each training sample, Wang et al. [10] presented the bilateral-weighted FSVM, which was further extended in [11] based on vague sets. Abe and Inoue [12] extended FSVM from binary classification to the multi-class problem, and it was applied to multi-class text categorization [13]. Fuzzy support vector regression was proposed in [14].
Fisher Linear Discriminant Analysis (FLDA) [15], [16] is also prevalent in pattern recognition. Its central idea is to find a linear transformation that maximizes the between-class scatter while minimizing the within-class scatter, so as to separate one class from the others. However, being restricted to linear decision boundaries has greatly limited its application. Subsequently, Mika et al. proposed Kernel Fisher Discriminant Analysis (KFDA) [17], [18]. Like SVM, it first maps the samples into a high-dimensional feature space using a nonlinear mapping function and then performs FLDA in this feature space. No explicit knowledge of the mapping is needed: it is represented implicitly by a kernel function that computes the inner product between each pair of points in the feature space. As one of the standard nonlinear techniques in statistical analysis, KFDA exhibits eminent discriminant power.
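For the two-class case, the FLDA direction has the well-known closed form w = S_w⁻¹(m₁ − m₂), where S_w is the within-class scatter matrix and m₁, m₂ the class means. A minimal numpy sketch (the synthetic Gaussian data is ours, for illustration only):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class FLDA projection direction w = Sw^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of centered outer products over both classes
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return np.linalg.solve(Sw, m1 - m2)

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(50, 2))   # synthetic class 1
X2 = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(50, 2))  # synthetic class 2
w = fisher_direction(X1, X2)
```

Projecting onto w pulls the two class means apart while keeping each class's projected spread small, which is the trade-off the Fisher criterion encodes.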
FSVM first assigns a fuzzy membership to each sample and then reformulates SVM accordingly. From the perspective of Fisher Discriminant Analysis (FDA), the normal vector of the optimal hyperplane in FSVM is a projection direction with strong discriminant ability. FSVM emphasizes maximizing the margin between two classes, which corresponds to maximizing the between-class scatter in FDA. However, FSVM ignores an important piece of prior knowledge: the within-class structure. No classifier combining FSVM with minimum within-class scatter has been proposed so far. The work in [19] proposed the Fisher Large Margin Classifier (FLMC), which embeds a within-class structure term into traditional SVM. It has two disadvantages, however: on the one hand, it is confined to linear classification, which greatly limits its application; on the other hand, neglecting fuzziness leads to overfitting when outliers or noise exist in the training set. Laplacian Support Vector Machine (LapSVM) [20] adds a manifold regularization term to traditional SVM; by means of the Representer Theorem, it ultimately reduces to a quadratic programming (QP) problem.
In this paper, a new algorithm is proposed to improve FSVM. We call it FSVM with minimum within-class scatter (WCS-FSVM); it incorporates the minimum within-class scatter idea of FDA into FSVM. WCS-FSVM not only accounts for the fuzziness of each training sample but also maximizes the margin and minimizes the within-class scatter. Using the Representer Theorem of [20], it too reduces to a QP problem. In particular, dropping the fuzziness yields WCS-SVM. We systematically evaluate WCS-SVM and WCS-FSVM on 10 datasets against SVM and FSVM. The results show that the proposed WCS-FSVM algorithm not only improves generalization ability and classification accuracy but also handles classification problems with outliers or noise more effectively.
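The paper's exact formulation appears in Sections 3–4; a plausible form of the linear WCS-FSVM primal, combining the FSVM objective with a within-class scatter penalty (the trade-off parameter λ is our notation, not the paper's), would be:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;
  \frac{1}{2}\,\mathbf{w}^{\top}\mathbf{w}
  + \frac{\lambda}{2}\,\mathbf{w}^{\top} S_w \mathbf{w}
  + C \sum_{i=1}^{l} s_i \xi_i
\qquad \text{s.t.} \quad
  y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1 - \xi_i,\;
  \xi_i \ge 0,\; i = 1,\dots,l,
```

where $S_w$ is the within-class scatter matrix and $s_i \in (0,1]$ the fuzzy membership of $\mathbf{x}_i$. Setting $s_i \equiv 1$ recovers WCS-SVM, and $\lambda = 0$ recovers FSVM, which is consistent with the special cases the paper mentions.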
In addition, choosing a reasonable fuzzy membership for a given problem is of utmost importance. How to define an appropriate fuzzy membership function is still an open problem, and much work has been devoted to it. The work in [9] defined the membership function based on the Euclidean distance between each sample point and its class center in the original space, and [21] defined it in a high-dimensional feature space. However, these two fuzzy membership functions consider only the distance between each sample point and its class center. The membership functions in [22], [23] use both this distance and the affinity among sample points. Several other membership functions [24], [25] are built from the decision values generated by SVM. Following the idea of [22], [23], this paper proposes a new fuzzy membership function for WCS-FSVM based on both the distance between each sample and its class center and the affinity among sample points. The difference is that we introduce two distinct parameters, one for the positive class and one for the negative class, to measure within-class affinity; these parameters must be set beforehand, and we use Support Vector Data Description (SVDD) [26] to determine them. Experimental results show that WCS-FSVM with this new fuzzy membership function reduces the effect of outliers or noise more efficiently.
This paper presents the new WCS-FSVM algorithm in both the linear and nonlinear cases, together with a new fuzzy membership function. The remainder of the paper is organized as follows: Section 2 gives a brief overview of FSVM. Section 3 presents WCS-FSVM in the linear case. Section 4 derives WCS-FSVM in kernel space in detail. Section 5 introduces the new fuzzy membership function for WCS-FSVM. Section 6 reports experimental results. Finally, Section 7 concludes the paper.
Fuzzy support vector machine
In traditional SVM, each training point is treated equally; that is, each sample point is assumed to belong fully to one and only one class. However, in many real-world classification problems some training points are more important than others. To address this, Lin [9] originally proposed FSVM on the basis of traditional SVM. A fuzzy membership is associated with each training point, so that different training points make different contributions to the decision surface.
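For reference, the FSVM primal of [9] weights each slack variable by the corresponding membership $s_i \in (0,1]$:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;
  \frac{1}{2}\,\|\mathbf{w}\|^{2} + C \sum_{i=1}^{l} s_i \xi_i
\qquad \text{s.t.} \quad
  y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1 - \xi_i,\;
  \xi_i \ge 0,\; i = 1,\dots,l.
```

A small $s_i$ makes the penalty $C s_i \xi_i$ for misclassifying $\mathbf{x}_i$ cheap, so a point suspected to be an outlier exerts little influence on the separating hyperplane.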
FSVM based on within-class scatter in linear space
As stated earlier, consider the binary classification problem (1). One class consists of the sample points x_i with y_i = +1 and is denoted C1; the other consists of the sample points x_i with y_i = −1 and is denoted C2. Writing l1 and l2 for the sizes of C1 and C2 respectively, it is clear that l1 + l2 = l and that C1 and C2 together make up the whole training set.
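Given this split into C1 and C2, the within-class scatter matrix S_w, and the projected within-class spread wᵀS_w w that WCS-FSVM penalizes, can be sketched in numpy (a hypothetical illustration with toy data, not the paper's code):

```python
import numpy as np

def within_class_scatter(X, y):
    """S_w = sum over classes c of sum_i (x_i - m_c)(x_i - m_c)^T."""
    Sw = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        Xc = X[y == c]
        d = Xc - Xc.mean(axis=0)   # center each class at its own mean
        Sw += d.T @ d
    return Sw

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
Sw = within_class_scatter(X, y)

# Projected within-class spread along a candidate normal vector w
w = np.array([1.0, 1.0])
spread = w @ Sw @ w
```

On this toy data both classes happen to vary only along the (1, −1) direction, so the spread along w = (1, 1) is exactly zero: a direction along which each class collapses to a point, which is what minimizing wᵀS_w w favors.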
The FSVM algorithm assigns a fuzzy membership to each training point, and it also emphasizes maximizing the margin between the two classes.
FSVM based on within-class scatter in feature space
In many real-world applications a linear classifier is inadequate. In this section we extend the FSVM algorithm to feature space. We first map the training points into a high-dimensional feature space H using a nonlinear mapping function φ(·); linear WCS-FSVM is then performed in H. This can be achieved by solving the following quadratic problem:
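In the kernelized QP, φ never appears explicitly: every coefficient involves only inner products φ(x_i)ᵀφ(x_j), which a kernel function supplies directly. A minimal sketch of the Gram matrix for the common RBF kernel K(x, z) = exp(−γ‖x − z‖²) (the toy points and γ are our choices):

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2); the map phi stays implicit."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))      # clamp tiny negative round-off

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_gram(X, gamma=0.5)
```

K is symmetric positive semidefinite with unit diagonal, exactly the properties the QP solver relies on; any valid kernel could be substituted without touching the rest of the algorithm.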
As is well known, the algorithm does not need the explicit form of the mapping function; it suffices to specify a kernel function that returns the inner product of the mapped points.
A new fuzzy membership function for WCS-FSVM
Different fuzzy membership functions influence the FSVM or WCS-FSVM algorithm differently, so choosing an appropriate membership function is very important for a fuzzy algorithm. Many methods exist for defining a membership function, but so far there is no universal way to determine one. In many cases, researchers build the fuzzy membership from the Euclidean distance between each sample point and its class center; for instance, the work in [9] defined the membership in exactly this way in the original input space.
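The distance-to-class-center scheme of [9] can be sketched as follows (the decay rule s_i = 1 − d_i/(r + δ) and the toy data are our illustrative choices; the paper's own function additionally uses within-class affinity with SVDD-derived parameters):

```python
import numpy as np

def distance_based_memberships(Xc, delta=1e-3):
    """s_i = 1 - d_i / (r + delta), where d_i = ||x_i - class mean|| and r = max_i d_i."""
    d = np.linalg.norm(Xc - Xc.mean(axis=0), axis=1)
    return 1.0 - d / (d.max() + delta)   # farthest point gets membership near 0

# One class: a tight cluster plus a single far-away outlier (hypothetical data)
Xc = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [5.0, 5.0]])
s = distance_based_memberships(Xc)
```

The outlier receives a membership close to zero, so its slack penalty in the FSVM objective nearly vanishes, which is precisely how the fuzzy weighting suppresses outliers and noise.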
Experiments
To verify the performance of the proposed WCS-FSVM, a series of experiments is conducted on six benchmark datasets and four artificial datasets: the Ripley dataset [32], the Diabetes dataset [33], the Australian dataset [33], the German dataset [33], the MONK dataset [34] and the MONK dataset without noise, the XOR dataset and the XOR dataset with noise, and the Ring-shaped dataset [35] and the Ring-shaped dataset with noise. All experiments are performed in Matlab (R2010b) on a personal computer.
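The paper's noisy variants pair each clean dataset with a label-corrupted copy. A minimal sketch of how such a pair might be built for an XOR-style problem (the 10% flip rate and the construction are our assumptions; the paper's exact setup is described in Section 6):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical XOR-style dataset: label +1 when x1 and x2 share a sign, else -1
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)

# Noisy variant: flip 10% of the labels at randomly chosen positions
y_noisy = y.copy()
flip = rng.choice(len(y), size=20, replace=False)
y_noisy[flip] *= -1.0
```

Training on (X, y_noisy) while testing on clean labels is the standard way to expose how strongly each classifier overfits the injected noise, which is the comparison the experiments in this section report.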
Conclusions
In this paper we first consider the within-class structure of the training set and propose an improved FSVM algorithm that learns better from datasets containing outliers or noise. Building on the advantages of FDA and FSVM, we incorporate the within-class scatter of FDA into traditional FSVM and name the new classifier WCS-FSVM; from it, WCS-SVM is easily obtained. It is not difficult to conclude that SVM, FSVM and WCS-SVM are all special instances of the proposed framework.
Acknowledgments
We would like to thank the anonymous reviewers for their comments and suggestions. This work was supported by the Ministry of Science and Technology of China ("863 program") under contract No. 2007AA01Z203, the National Basic Research Program of China ("973 program") under contract No. 2007CB307101, the Fund of Beijing Jiaotong University under contract No. 2006XZ002, and the Fundamental Research Funds for the Central Universities under Grant No. 2009JBM021.
Wenjuan An was born in China in 1985. She received the B.S. degree in Mathematics from Hebei Normal University, Shijiazhuang, China, in 2007, and the M.S. degree from Liaoning Normal University, Dalian, China, in 2010. She is pursuing the Ph.D. degree at the Institute of Information Science, Department of Computer Science, Beijing Jiaotong University. Her research interests include pattern recognition and network security.
References (35)
- et al., Face recognition using independent component analysis and support vector machines, Pattern Recognition Lett. (2003)
- et al., Support vector domain description, Pattern Recognition Lett. (1999)
- et al., Support vector networks, Mach. Learn. (1995)
- The Nature of Statistical Learning Theory (1995)
- An overview of statistical learning theory, IEEE Trans. Neural Networks (1999)
- et al., Training support vector machines: an application to face detection, Proc. Comp. Vision Pattern Recognition (1997)
- et al., Support vector machines for histogram-based image classification, IEEE Trans. Neural Networks (1999)
- et al., Content-based audio classification and retrieval by support vector machines, IEEE Trans. Neural Networks (2003)
- S. Mukherjee, E. Osuna, F. Girosi, Nonlinear prediction of chaotic time series using support vector machines, in: ...
- et al., Fuzzy support vector machines, IEEE Trans. Neural Networks (2002)
- A new fuzzy support vector machine to evaluate credit risk, IEEE Trans. Fuzzy Syst.
- Pattern Recognition and Machine Learning
- The use of multiple measurements in taxonomic problems, Ann. Eugen.
Mangui Liang is a Professor and Ph.D. supervisor at the Institute of Information Science, Department of Computer Science, Beijing Jiaotong University, Beijing, China. He has published many papers. His research interests include pattern recognition, speech processing, communication technology, and next-generation network technology.