Fuzzy support vector machine based on within-class scatter for classification problems with outliers or noises
Introduction
Support Vector Machine (SVM) [1], [2], developed by V. N. Vapnik, is an important pattern recognition technique based on structural risk minimization (SRM) [2], [3]. It first maps the sample points into a high-dimensional feature space and then seeks an optimal separating hyperplane that maximizes the margin between the two classes in that space, where the margin is defined as the sum of the distances from the hyperplane to the closest points of the two classes. Thanks to remarkable characteristics such as a globally optimal solution, good generalization performance, and the ability to learn from small training sets, SVM has been successfully applied in many areas, such as face recognition [4], [5], image classification [6], audio classification [7], and time-series prediction [8], to name a few.
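The margin notion above can be made concrete with a minimal numpy sketch (the toy data and the two candidate hyperplanes are ours, not from the paper): the geometric margin of a labeled point is y(w·x + b)/‖w‖, and the margin of a hyperplane is the smallest such value over the training set; SVM prefers the hyperplane with the largest margin.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Smallest signed distance from the hyperplane w.x + b = 0 to any training point."""
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

# Toy linearly separable data (hypothetical); labels are +1 / -1
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Two candidate separating hyperplanes; SVM would prefer the one with the larger margin.
m_diag = geometric_margin(np.array([1.0, 1.0]), 0.0, X, y)  # normal along (1, 1)
m_axis = geometric_margin(np.array([1.0, 0.0]), 0.0, X, y)  # normal along (1, 0)
```

Both hyperplanes separate the data (both margins are positive), but the diagonal one keeps every point farther away, which is exactly the criterion the SVM optimization formalizes.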
However, in many practical applications the training set is corrupted by outliers or noise, and treating every sample equally may cause overfitting. Fuzzy support vector machine (FSVM) [9], built on SVM, can effectively address this problem. In FSVM, each training sample is associated with a fuzzy membership, so that different samples contribute differently to learning the decision surface. This reduces, to some extent, the effect of outliers or noise on the separating hyperplane. By introducing two memberships for each training sample, Wang et al. [10] presented the bilateral-weighted FSVM, which was further extended in [11] based on vague sets. Abe and Inoue [12] extended FSVM from binary classification to the multi-class problem, and it was applied to multi-class text categorization [13]. Fuzzy support vector regression was proposed in [14].
Fisher Linear Discriminant Analysis (FLDA) [15], [16] is also prevalent in pattern recognition. Its central idea is to find a linear transformation that maximizes the between-class scatter while minimizing the within-class scatter, so as to separate one class from the others. However, being restricted to linear decision boundaries has greatly limited its application. Subsequently, Mika et al. proposed Kernel Fisher Discriminant Analysis (KFDA) [17], [18]. Like SVM, it first maps the samples into a high-dimensional feature space using a nonlinear mapping function and then performs FLDA in this feature space. No explicit knowledge of the mapping is needed: it is represented implicitly by a kernel function that computes the inner product between each pair of points in the feature space. As one of the standard nonlinear techniques in statistical analysis, KFDA exhibits eminent discriminant power.
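For the two-class case, the FLDA direction has the well-known closed form w = S_w⁻¹(m₁ − m₂), where S_w is the within-class scatter matrix and m₁, m₂ the class means. A minimal numpy sketch (the synthetic Gaussian data is ours, for illustration only):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class FLDA projection direction w = Sw^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of centered outer products over both classes
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return np.linalg.solve(Sw, m1 - m2)

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(50, 2))   # synthetic class 1
X2 = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(50, 2))  # synthetic class 2
w = fisher_direction(X1, X2)
```

Projecting onto w pulls the two class means apart while keeping each class's projected spread small, which is the trade-off the Fisher criterion encodes.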
FSVM first assigns a fuzzy membership to each sample and then reformulates SVM accordingly. From the perspective of Fisher Discriminant Analysis (FDA), the normal vector of the optimal hyperplane in FSVM is a projection direction with strong discriminant ability. FSVM emphasizes maximizing the margin between two classes, which corresponds to maximizing the between-class scatter in FDA. However, FSVM ignores an important piece of prior knowledge: the within-class structure. No classifier combining FSVM with minimum within-class scatter has been proposed so far. The work in [19] proposed the Fisher Large Margin Classifier (FLMC), which embeds a within-class structure term into traditional SVM. It has two disadvantages, however: on the one hand, it is confined to linear classification, which greatly limits its application; on the other hand, neglecting fuzziness leads to overfitting when outliers or noise exist in the training set. Laplacian Support Vector Machine (LapSVM) [20] adds a manifold regularization term to traditional SVM; by means of the Representer Theorem, it ultimately reduces to a quadratic programming (QP) problem.
In this paper, a new algorithm is proposed to improve FSVM. We call it FSVM with minimum within-class scatter (WCS-FSVM); it incorporates the minimum within-class scatter idea of FDA into FSVM. WCS-FSVM not only accounts for the fuzziness of each training sample but also maximizes the margin and minimizes the within-class scatter. Using the Representer Theorem of [20], it too reduces to a QP problem. In particular, dropping the fuzziness yields WCS-SVM. We systematically evaluate WCS-SVM and WCS-FSVM on 10 datasets against SVM and FSVM. The results show that the proposed WCS-FSVM algorithm not only improves generalization ability and classification accuracy but also handles classification problems with outliers or noise more effectively.
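The paper's exact formulation appears in Sections 3–4; a plausible form of the linear WCS-FSVM primal, combining the FSVM objective with a within-class scatter penalty (the trade-off parameter λ is our notation, not the paper's), would be:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;
  \frac{1}{2}\,\mathbf{w}^{\top}\mathbf{w}
  + \frac{\lambda}{2}\,\mathbf{w}^{\top} S_w \mathbf{w}
  + C \sum_{i=1}^{l} s_i \xi_i
\qquad \text{s.t.} \quad
  y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1 - \xi_i,\;
  \xi_i \ge 0,\; i = 1,\dots,l,
```

where $S_w$ is the within-class scatter matrix and $s_i \in (0,1]$ the fuzzy membership of $\mathbf{x}_i$. Setting $s_i \equiv 1$ recovers WCS-SVM, and $\lambda = 0$ recovers FSVM, which is consistent with the special cases the paper mentions.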
In addition, choosing a reasonable fuzzy membership for a given problem is of utmost importance. How to define an appropriate fuzzy membership function is still an open problem, and much work has been devoted to it. The work in [9] defined the membership function based on the Euclidean distance between each sample point and its class center in the original space, and [21] defined it in a high-dimensional feature space. However, these two fuzzy membership functions consider only the distance between each sample point and its class center. The membership functions in [22], [23] use both this distance and the affinity among sample points. Several other membership functions [24], [25] are built from the decision values generated by SVM. Following the idea of [22], [23], this paper proposes a new fuzzy membership function for WCS-FSVM based on both the distance between each sample and its class center and the affinity among sample points. The difference is that we introduce two distinct parameters, one for the positive class and one for the negative class, to measure within-class affinity; these parameters must be set beforehand, and we use Support Vector Data Description (SVDD) [26] to determine them. Experimental results show that WCS-FSVM with this new fuzzy membership function reduces the effect of outliers or noise more efficiently.
This paper presents the new WCS-FSVM algorithm in both the linear and nonlinear cases, together with a new fuzzy membership function. The remainder of the paper is organized as follows: Section 2 gives a brief overview of FSVM. Section 3 presents WCS-FSVM in the linear case. Section 4 derives WCS-FSVM in kernel space in detail. Section 5 introduces the new fuzzy membership function for WCS-FSVM. Section 6 reports experimental results. Finally, Section 7 concludes the paper.
Fuzzy support vector machine
In traditional SVM, each training point is treated equally; that is, each sample point is assumed to belong fully to one and only one class. However, in many real-world classification problems some training points are more important than others. To address this, Lin [9] originally proposed FSVM on the basis of traditional SVM. A fuzzy membership is associated with each training point, so that different training points make different contributions to the decision surface.
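For reference, the FSVM primal of [9] weights each slack variable by the corresponding membership $s_i \in (0,1]$:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;
  \frac{1}{2}\,\|\mathbf{w}\|^{2} + C \sum_{i=1}^{l} s_i \xi_i
\qquad \text{s.t.} \quad
  y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1 - \xi_i,\;
  \xi_i \ge 0,\; i = 1,\dots,l.
```

A small $s_i$ makes the penalty $C s_i \xi_i$ for misclassifying $\mathbf{x}_i$ cheap, so a point suspected to be an outlier exerts little influence on the separating hyperplane.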
FSVM based on within-class scatter in linear space
As stated earlier, consider the binary classification problem (1). One class consists of the sample points x_i with y_i = +1 and is denoted C1; the other consists of the sample points x_i with y_i = −1 and is denoted C2. Writing l1 and l2 for the sizes of C1 and C2 respectively, it is clear that l1 + l2 = l and that C1 and C2 together make up the whole training set.
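Given this split into C1 and C2, the within-class scatter matrix S_w, and the projected within-class spread wᵀS_w w that WCS-FSVM penalizes, can be sketched in numpy (a hypothetical illustration with toy data, not the paper's code):

```python
import numpy as np

def within_class_scatter(X, y):
    """S_w = sum over classes c of sum_i (x_i - m_c)(x_i - m_c)^T."""
    Sw = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        Xc = X[y == c]
        d = Xc - Xc.mean(axis=0)   # center each class at its own mean
        Sw += d.T @ d
    return Sw

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
Sw = within_class_scatter(X, y)

# Projected within-class spread along a candidate normal vector w
w = np.array([1.0, 1.0])
spread = w @ Sw @ w
```

On this toy data both classes happen to vary only along the (1, −1) direction, so the spread along w = (1, 1) is exactly zero: a direction along which each class collapses to a point, which is what minimizing wᵀS_w w favors.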
The FSVM algorithm assigns a fuzzy membership to each training point, and it also emphasizes maximizing the margin between the two classes.
FSVM based on within-class scatter in feature space
In many real-world applications a linear classifier is inadequate. In this section we extend the FSVM algorithm to feature space. We first map the training points into a high-dimensional feature space H using a nonlinear mapping function φ(·); linear WCS-FSVM is then performed in H. This can be achieved by solving the following quadratic problem:
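In the kernelized QP, φ never appears explicitly: every coefficient involves only inner products φ(x_i)ᵀφ(x_j), which a kernel function supplies directly. A minimal sketch of the Gram matrix for the common RBF kernel K(x, z) = exp(−γ‖x − z‖²) (the toy points and γ are our choices):

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2); the map phi stays implicit."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))      # clamp tiny negative round-off

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_gram(X, gamma=0.5)
```

K is symmetric positive semidefinite with unit diagonal, exactly the properties the QP solver relies on; any valid kernel could be substituted without touching the rest of the algorithm.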
As is well known, the algorithm does not need the explicit form of the mapping function; it suffices to specify a kernel function that returns the inner product of the mapped points.
A new fuzzy membership function for WCS-FSVM
Different fuzzy membership functions influence the FSVM or WCS-FSVM algorithm differently, so choosing an appropriate membership function is very important for a fuzzy algorithm. Many methods exist for defining a membership function, but so far there is no universal way to determine one. In many cases, researchers build the fuzzy membership from the Euclidean distance between each sample point and its class center; for instance, the work in [9] defined the membership in exactly this way in the original input space.
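The distance-to-class-center scheme of [9] can be sketched as follows (the decay rule s_i = 1 − d_i/(r + δ) and the toy data are our illustrative choices; the paper's own function additionally uses within-class affinity with SVDD-derived parameters):

```python
import numpy as np

def distance_based_memberships(Xc, delta=1e-3):
    """s_i = 1 - d_i / (r + delta), where d_i = ||x_i - class mean|| and r = max_i d_i."""
    d = np.linalg.norm(Xc - Xc.mean(axis=0), axis=1)
    return 1.0 - d / (d.max() + delta)   # farthest point gets membership near 0

# One class: a tight cluster plus a single far-away outlier (hypothetical data)
Xc = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [5.0, 5.0]])
s = distance_based_memberships(Xc)
```

The outlier receives a membership close to zero, so its slack penalty in the FSVM objective nearly vanishes, which is precisely how the fuzzy weighting suppresses outliers and noise.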
Experiments
To verify the performance of the proposed WCS-FSVM, a series of experiments is conducted on six benchmark datasets and four artificial datasets: the Ripley dataset [32], the Diabetes dataset [33], the Australian dataset [33], the German dataset [33], the MONK dataset [34] and the MONK dataset without noise, the XOR dataset and the XOR dataset with noise, and the Ring-shaped dataset [35] and the Ring-shaped dataset with noise. All experiments are performed in Matlab (R2010b) on a personal computer.
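The paper's noisy variants pair each clean dataset with a label-corrupted copy. A minimal sketch of how such a pair might be built for an XOR-style problem (the 10% flip rate and the construction are our assumptions; the paper's exact setup is described in Section 6):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical XOR-style dataset: label +1 when x1 and x2 share a sign, else -1
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)

# Noisy variant: flip 10% of the labels at randomly chosen positions
y_noisy = y.copy()
flip = rng.choice(len(y), size=20, replace=False)
y_noisy[flip] *= -1.0
```

Training on (X, y_noisy) while testing on clean labels is the standard way to expose how strongly each classifier overfits the injected noise, which is the comparison the experiments in this section report.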
Conclusions
In this paper we first consider the within-class structure of the training set and propose an improved FSVM algorithm that learns better from datasets containing outliers or noise. Building on the advantages of FDA and FSVM, we incorporate the within-class scatter of FDA into traditional FSVM and name the new classifier WCS-FSVM; from it, WCS-SVM is easily obtained. It is not difficult to conclude that SVM, FSVM and WCS-SVM are all special instances of the proposed framework.
Acknowledgments
We would like to thank the anonymous reviewers for their comments and suggestions. This work was supported by the Ministry of Science and Technology of China ("863 program") under contract No. 2007AA01Z203, the National Basic Research Program of China ("973 program") under contract No. 2007CB307101, the Fund of Beijing Jiaotong University under contract No. 2006XZ002, and the Fundamental Research Funds for the Central Universities under Grant No. 2009JBM021.
Wenjuan An was born in China in 1985. She received the B.S. degree in Mathematics from Hebei Normal University, Shijiazhuang, China, in 2007, and the M.S. degree from Liaoning Normal University, Dalian, China, in 2010. She is pursuing the Ph.D. degree at the Institute of Information Science, Department of Computer Science, Beijing Jiaotong University. Her research interests include pattern recognition and network security.
References (35)
- et al., Face recognition using independent component analysis and support vector machines, Pattern Recognition Lett. (2003)
- et al., Support vector domain description, Pattern Recognition Lett. (1999)
- et al., Support vector networks, Mach. Learn. (1995)
- The Nature of Statistical Learning Theory (1995)
- An overview of statistical learning theory, IEEE Trans. Neural Networks (1999)
- et al., Training support vector machines: an application to face detection, Proc. Comp. Vision Pattern Recognition (1997)
- et al., Support vector machines for histogram-based image classification, IEEE Trans. Neural Networks (1999)
- et al., Content-based audio classification and retrieval by support vector machines, IEEE Trans. Neural Networks (2003)
- S. Mukherjee, E. Osuna, F. Girosi, Nonlinear prediction of chaotic time series using support vector machines, in: ...
- et al., Fuzzy support vector machines, IEEE Trans. Neural Networks (2002)
- A new fuzzy support vector machine to evaluate credit risk, IEEE Trans. Fuzzy Syst.
- Pattern Recognition and Machine Learning
- The use of multiple measurements in taxonomic problems, Ann. Eugen.
Mangui Liang is a Professor and Ph.D. supervisor at the Institute of Information Science, Department of Computer Science, Beijing Jiaotong University, Beijing, China. He has published many papers. His research interests include pattern recognition, speech processing, communication technology, and next-generation network technology.