Neurocomputing

Volume 86, 1 June 2012, Pages 52-58

Feature extraction using fuzzy maximum margin criterion

https://doi.org/10.1016/j.neucom.2011.12.031

Abstract

In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data. In this paper, a novel feature extraction criterion, the fuzzy maximum margin criterion (FMMC), is proposed by combining the maximum margin criterion (MMC) with fuzzy set theory. More specifically, the between-class and within-class fuzzy scatter matrices are redefined by incorporating the membership degrees of samples, which reflect the sample distribution information; then a feature extraction criterion that maximizes the average margin between classes after dimensionality reduction is applied. Furthermore, we apply the generalized singular value decomposition (GSVD) to the criterion, which makes the algorithm more effective; for nonlinearly separable problems, we derive a kernel extension of FMMC with positive definite kernels. The effectiveness of the novel criterion for linearly and nonlinearly separable problems is illustrated by experiments.

Introduction

In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data. Principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] are the two most popular linear subspace learning methods for dimensionality reduction. PCA is an unsupervised learning algorithm, which performs dimensionality reduction by projecting the original $m$-dimensional data onto the $l$-dimensional ($l \ll m$) linear subspace spanned by the leading eigenvectors of the data's covariance matrix. LDA, by contrast, is a supervised learning algorithm, which searches for projection axes on which the data points of different classes are far from each other while data points of the same class are close to each other. Therefore, LDA encodes discriminating information in a linearly separable space. However, a drawback of LDA is that it cannot be applied directly when the scatter matrix $S_w$ or $S_t$ is singular due to the small sample size problem. Many LDA extensions have been developed to deal with this problem, such as pseudo-inverse LDA (PLDA) [3], regularized LDA (RLDA) [4], penalized discriminant analysis (PDA) [5], LDA/GSVD [6], LDA/QR [7], orthogonal LDA (OLDA) [8], null space LDA (NLDA) [9], direct LDA (D-LDA) [10], CLDA [11] and two-stage LDA [12].
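To make the PCA step concrete, the following minimal NumPy sketch (our own illustration, not the authors' code; function and variable names are ours) projects $m$-dimensional data onto the $l$ leading eigenvectors of its covariance matrix:

```python
import numpy as np

def pca_project(X, l):
    """Project the m-dimensional rows of X onto the l leading
    eigenvectors of the data covariance matrix (l << m)."""
    Xc = X - X.mean(axis=0)               # center the data
    C = Xc.T @ Xc / (X.shape[0] - 1)      # m x m covariance matrix
    _, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :l]           # l leading eigenvectors
    return Xc @ W                         # n x l reduced representation

# toy usage: 100 samples in R^10 reduced to R^2
Z = pca_project(np.random.randn(100, 10), 2)
print(Z.shape)  # (100, 2)
```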

However, a drawback of these methods is that optimality criteria defined on the scatter matrices do not relate directly to the sample distribution information. In particular, for multi-class classification problems involving more than two classes, such criteria overemphasize classes that are far apart in the input space and cause large overlaps between neighboring classes in the output space. To tackle this problem, a weighting function can be incorporated into the Fisher criterion, giving higher weights to classes that are closer together in the input space, as they are more likely to lead to misclassification; the sample distribution information is then represented by the weighting function. Examples include the fractional-step LDA (F-LDA) in [13], which finds a subspace in which the classification accuracy is higher than that obtained using LDA; the weighted pairwise Fisher criteria for multi-class problems in [14], which incorporate a weighting function so that the modified Fisher criterion is more closely related to the sample distribution information; the uncorrelated weighted linear discriminant analysis (UWLDA) in [15], which introduces a weighting function to restrain the dominant role of the classes with larger distances; and the direct weighted LDA (DW-LDA) in [16], which combines the advantages of D-LDA and the weighted pairwise Fisher criteria. However, the weighting functions in [13], [14], [15], [16] are hard to determine. Therefore, fuzzy pattern recognition methods based on fuzzy set theory have been developed, in which the sample distribution information is represented by membership degrees with respect to every class: Kwak et al. [17] proposed fuzzy Fisherfaces for face recognition via fuzzy sets, and Yang et al. [18] presented a fuzzy inverse Fisher discriminant analysis (FIFDA) by combining the inverse Fisher discriminant criterion with fuzzy set theory. In the methods of [17], [18], a membership degree matrix is first computed using the fuzzy k-nearest neighbor (FKNN) algorithm [19], and the membership degrees are then incorporated to redefine fuzzy scatter matrices, as sketched below. However, a drawback of these methods is that they still cannot be applied directly when the redefined fuzzy scatter matrix $S_{Fw}$ or $S_{Ft}$ is singular due to the small sample size problem.
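The FKNN step of [17], [18] can be illustrated as follows. This is a minimal NumPy sketch of the membership assignment of Keller et al. [19] (the 0.51/0.49 constants come from that scheme); it is our own illustration, not the authors' code:

```python
import numpy as np

def fknn_memberships(X, y, k=5):
    """Membership degree U[i, j] of sample i in class j, following the
    FKNN scheme of Keller et al. [19]: each sample counts the class
    labels among its k nearest neighbours (excluding itself)."""
    n = X.shape[0]
    classes = np.unique(y)
    U = np.zeros((n, classes.size))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)            # a sample is not its own neighbour
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]        # indices of the k nearest neighbours
        for j, c in enumerate(classes):
            frac = np.mean(y[nbrs] == c)   # fraction n_ij / k
            U[i, j] = 0.51 + 0.49 * frac if y[i] == c else 0.49 * frac
    return U
```

The resulting matrix U is then used to weight the samples when the fuzzy scatter matrices are redefined.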

Inspired by the ideas in [17], [18], [20], [21], in this paper we propose a novel feature extraction criterion, FMMC, by combining MMC with fuzzy set theory; based on the kernel tricks in [22], [23], [24], we extend FMMC to a kernel fuzzy maximum margin criterion (KFMMC) with positive definite kernels for nonlinearly separable problems. Moreover, we apply the GSVD algorithm to the classical MMC criterion and evaluate its effectiveness by experiments. In contrast to the conventional optimality criteria, FMMC can be applied directly to small sample size problems and relates directly to the sample distribution information through the membership degrees of samples. The rest of the paper is organized as follows: we review the classical MMC in Section 2; we apply the GSVD algorithm to the classical MMC in Section 3; in Section 4, we propose FMMC for linearly separable problems and KFMMC for nonlinearly separable problems; in Section 5, experiments are presented to demonstrate the effectiveness of FMMC and KFMMC; conclusions are summarized in Section 6.

Maximum margin criterion

In this section, we give an overview of the classical MMC. Consider a pattern sample data set $T=\{(x_1,y_1),\ldots,(x_n,y_n)\}\subseteq X\times Y$, where $X=\{x_1,\ldots,x_n\}\subseteq \mathbb{R}^m$ is the input data set and $Y=\{C_1,\ldots,C_c\}$ is the class label set. In order to reduce the dimensionality of a new sample $x\in\mathbb{R}^m$, some measure needs to be employed to assess similarity or dissimilarity. We want to find a linear transformation $W^T:\mathbb{R}^m\to\mathbb{R}^l$ such that the similarity/dissimilarity is preserved in the reduced-dimensional space.
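The snippet below stops before the criterion itself, so for orientation: in its standard formulation, MMC maximizes $\mathrm{tr}(W^T(S_b-S_w)W)$ subject to $W^TW=I$, and the optimal $W$ stacks the $l$ leading eigenvectors of $S_b-S_w$, so no inversion of a possibly singular $S_w$ is needed. A minimal sketch under this standard formulation (our own implementation and scaling conventions, not the authors' code):

```python
import numpy as np

def mmc_transform(X, y, l):
    """Classical MMC: maximize tr(W^T (S_b - S_w) W) with W^T W = I.
    The optimum stacks the l leading eigenvectors of S_b - S_w."""
    n, m = X.shape
    mean = X.mean(axis=0)
    Sb = np.zeros((m, m))
    Sw = np.zeros((m, m))
    for c in np.unique(y):
        Xi = X[y == c]
        mi = Xi.mean(axis=0)
        Sb += (Xi.shape[0] / n) * np.outer(mi - mean, mi - mean)
        Sw += (Xi - mi).T @ (Xi - mi) / n
    _, eigvecs = np.linalg.eigh(Sb - Sw)   # symmetric matrix, ascending order
    return eigvecs[:, ::-1][:, :l]         # m x l projection matrix W
```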

Let the input data set X be

Generalized singular value decomposition

Inspired by the ideas in [3], [11], [12], in this section we apply GSVD to the MMC criterion, which makes the algorithm more effective. Let
$$H_b=\left[\sqrt{\tfrac{n_1}{n}}(m_1-m),\ldots,\sqrt{\tfrac{n_c}{n}}(m_c-m)\right]\in\mathbb{R}^{m\times c},\qquad H_w=[X_1-m_1e_1^T,\ldots,X_c-m_ce_c^T]\in\mathbb{R}^{m\times n},$$
where $X_i\in\mathbb{R}^{m\times n_i}$ is the input data matrix of class $C_i$, $m_i=(1/n_i)\sum_{k\in N_i}x_k$ is the mean vector of class $C_i$, $m=(1/n)\sum_{x\in X}x$ is the total mean vector of the samples, and $e_i=[1,\ldots,1]^T\in\mathbb{R}^{n_i}$. Then the scatter matrices $S_b$ and $S_w$ can be expressed as $S_b=H_bH_b^T$ and $S_w=H_wH_w^T$. If GSVD is applied to the matrix pair $(H_b^T,H_w^T)$
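The factors $H_b$ and $H_w$ can be assembled directly from the definitions above; the sketch below (our own code) builds them so that $S_b=H_bH_b^T$ and $S_w=H_wH_w^T$ hold, while the full Paige-Saunders GSVD of the pair $(H_b^T,H_w^T)$ is left to a library or to the paper's Section 3:

```python
import numpy as np

def scatter_factors(X, y):
    """Build H_b (m x c) and H_w (m x n) as defined above, so that
    S_b = H_b H_b^T and S_w = H_w H_w^T.  Rows of X are samples."""
    n = X.shape[0]
    mean = X.mean(axis=0)
    Hb_cols, Hw_blocks = [], []
    for c in np.unique(y):
        Xi = X[y == c]                       # the n_i samples of class C_i
        ni, mi = Xi.shape[0], Xi.mean(axis=0)
        Hb_cols.append(np.sqrt(ni / n) * (mi - mean))
        Hw_blocks.append((Xi - mi).T)        # X_i - m_i e_i^T as m x n_i block
    return np.column_stack(Hb_cols), np.hstack(Hw_blocks)

# sanity check of the shapes on random data
X = np.random.randn(40, 6); y = np.repeat([0, 1, 2, 3], 10)
Hb, Hw = scatter_factors(X, y)
print(Hb.shape, Hw.shape)  # (6, 4) (6, 40)
```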

Fuzzy maximum margin criterion

In this section, we generalize the MMC introduced in Section 2 to FMMC. Moreover, by means of the kernel trick, we extend the model to KFMMC with positive definite kernels for nonlinearly separable problems.
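As background for the kernel extension, KFMMC replaces raw features by a positive definite kernel Gram matrix; a minimal sketch of two standard ingredients, an RBF Gram matrix and its centering, follows (our own helper names, not the paper's derivation):

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Positive definite RBF kernel: K[i, j] = exp(-gamma ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared pairwise distances
    return np.exp(-gamma * np.maximum(D2, 0.0))

def center_gram(K):
    """Center the implicit feature map: K <- (I - 1/n) K (I - 1/n)."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J
```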

We first propose FMMC for linearly separable problems. As noted above, a drawback of the existing methods is that the conventional optimality criteria based on the scatter matrices do not directly reflect the sample distribution information. An immediate implication is that
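Since the snippet cuts off before the fuzzy scatter matrices are defined, the following is only one plausible construction in the style of [17], [18]: fuzzy class means and membership-weighted scatters built from the FKNN membership matrix, followed by the MMC-style eigenproblem. The paper's exact definitions are given in its Section 4; treat this as a hypothetical sketch:

```python
import numpy as np

def fmmc_transform(X, U, l):
    """Hypothetical FMMC sketch: membership-weighted fuzzy scatter
    matrices S_Fb and S_Fw built from U (n x c, e.g. from FKNN),
    then the l leading eigenvectors of S_Fb - S_Fw, by analogy
    with classical MMC.  Not the paper's exact definitions."""
    n, m = X.shape
    mean = X.mean(axis=0)
    SFb = np.zeros((m, m))
    SFw = np.zeros((m, m))
    for j in range(U.shape[1]):
        u = U[:, j]                                    # memberships in class j
        mj = (u[:, None] * X).sum(axis=0) / u.sum()    # fuzzy class mean
        SFb += (u.sum() / n) * np.outer(mj - mean, mj - mean)
        d = X - mj
        SFw += (u[:, None] * d).T @ d / n              # weighted within scatter
    _, eigvecs = np.linalg.eigh(SFb - SFw)
    return eigvecs[:, ::-1][:, :l]
```

Because the criterion is a difference rather than a ratio of scatters, it remains well defined even when $S_{Fw}$ is singular, which is the property that lets FMMC handle small sample size problems directly.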

Experiments

In this section, in order to demonstrate the effectiveness of the FMMC and KFMMC presented in this paper, we compare them with LDA, PCA, MSD, MMC, Complete-LDA, kernel PCA and kernel LDA on different data sets. All the experiments are performed on a Pentium PC (2.52 GHz, 2 GB RAM) and programmed in MATLAB (version 7.0).

First, we use subsets of low- and high-dimensional linearly separable data sets to evaluate the performance of FMMC. The database descriptions are as follows:

Conclusion

In this paper, we have presented FMMC and KFMMC, based on MMC and fuzzy set theory, for linearly and nonlinearly separable problems, respectively. Moreover, we have applied GSVD to the MMC criterion, which makes the algorithm more effective. By using the membership degrees of samples, we redefine the fuzzy scatter matrices, which relate directly to the sample distribution information. For the high-dimensional linearly separable problems mentioned above, FMMC outperforms the other feature extraction

Acknowledgments

The authors thank the anonymous reviewers for their constructive comments and suggestions. This work is supported by the National Natural Science Foundation of China (Grant no. 10871226), the Natural Science Foundation of Shandong Province (Grant no. ZR2009AL006), and the Young and Middle-Aged Scientists Research Foundation of Shandong Province (Grant no. BS2010SF004).

References (24)

• Sh.W. Ji et al., Generalized linear discriminant analysis: a unified framework and efficient model selection, IEEE Trans. Neural Networks (2008)
• T. Hastie et al., Penalized discriminant analysis, Ann. Stat. (1995)

Yan Cui was born in Shandong, China, in 1985. She received her BS and MS degrees from Liaocheng University, Shandong, China, in 2008 and 2011, respectively. She is now studying for her PhD degree in Pattern Recognition and Intelligent Systems at the School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China.

Liya Fan received her BS degree from Northeast Normal University, Changchun, China, in 1984, her MS degree from Inner Mongolia University, Hohhot, China, in 2000, and her PhD degree from Xidian University, Xi'an, China, in 2003. She is now a professor in the School of Mathematical Sciences, Liaocheng University, Liaocheng, China. Her research interests include optimization theory and applications, machine learning theory and pattern recognition.
