Neurocomputing

Volume 86, 1 June 2012, Pages 52-58

Feature extraction using fuzzy maximum margin criterion

https://doi.org/10.1016/j.neucom.2011.12.031

Abstract

In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data. In this paper, a novel feature extraction criterion, the fuzzy maximum margin criterion (FMMC), is proposed by combining the maximum margin criterion (MMC) with fuzzy set theory. More specifically, the between-class and within-class fuzzy scatter matrices are redefined by incorporating the membership degrees of samples, which reflect the sample distribution information; then a feature extraction criterion that maximizes the average margin between classes after dimensionality reduction is applied. Furthermore, we apply the generalized singular value decomposition (GSVD) to the criterion, which makes the algorithm more effective; for nonlinearly separable problems, we derive a kernel extension of FMMC with positive definite kernels. The effectiveness of the novel criterion for linearly and nonlinearly separable problems is illustrated by experiments.

Introduction

In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data. Principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] are the two most popular linear subspace learning methods for dimensionality reduction. PCA is an unsupervised learning algorithm, which performs dimensionality reduction by projecting the original $m$-dimensional data onto the $l$-dimensional ($l \ll m$) linear subspace spanned by the leading eigenvectors of the data's covariance matrix. LDA, by contrast, is a supervised learning algorithm, which searches for projection axes on which the data points of different classes are far from each other while data points of the same class are close to each other. Therefore, LDA encodes discriminating information in a linearly separable space. However, a drawback of LDA is that it cannot be applied directly when the scatter matrix $S_w$ or $S_t$ is singular due to the small sample size problem. Many LDA extensions have been developed to deal with this problem, such as pseudo-inverse LDA (PLDA) [3], regularized LDA (RLDA) [4], penalized discriminant analysis (PDA) [5], LDA/GSVD [6], LDA/QR [7], orthogonal LDA (OLDA) [8], null space LDA (NLDA) [9], direct LDA (D-LDA) [10], CLDA [11] and two-stage LDA [12].
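To make the PCA step concrete, the following minimal NumPy sketch (our own illustration, not the authors' code; function and variable names are ours) projects $m$-dimensional data onto the $l$ leading eigenvectors of its covariance matrix:

```python
import numpy as np

def pca_project(X, l):
    """Project the m-dimensional rows of X onto the l leading
    eigenvectors of the data covariance matrix (l << m)."""
    Xc = X - X.mean(axis=0)               # center the data
    C = Xc.T @ Xc / (X.shape[0] - 1)      # m x m covariance matrix
    _, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :l]           # l leading eigenvectors
    return Xc @ W                         # n x l reduced representation

# toy usage: 100 samples in R^10 reduced to R^2
Z = pca_project(np.random.randn(100, 10), 2)
print(Z.shape)  # (100, 2)
```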

However, a drawback of these methods is that optimality criteria defined on the scatter matrices do not relate directly to the sample distribution information. In particular, for multi-class classification problems involving more than two classes, such criteria overemphasize classes that are far apart in the input space and cause large overlaps between neighboring classes in the output space. To tackle this problem, a weighting function can be incorporated into the Fisher criterion, giving higher weights to classes that are closer together in the input space, as they are more likely to lead to misclassification; the sample distribution information is then represented by the weighting function. Examples include the fractional-step LDA (F-LDA) in [13], which finds a subspace in which the classification accuracy is higher than that obtained using LDA; the weighted pairwise Fisher criteria for multi-class problems in [14], which incorporate a weighting function so that the modified Fisher criterion is more closely related to the sample distribution information; the uncorrelated weighted linear discriminant analysis (UWLDA) in [15], which introduces a weighting function to restrain the dominant role of the classes with larger distances; and the direct weighted LDA (DW-LDA) in [16], which combines the advantages of D-LDA and the weighted pairwise Fisher criteria. However, the weighting functions in [13], [14], [15], [16] are hard to determine. Therefore, fuzzy pattern recognition methods based on fuzzy set theory have been developed, in which the sample distribution information is represented by membership degrees with respect to every class: Kwak et al. [17] proposed fuzzy Fisherfaces for face recognition via fuzzy sets, and Yang et al. [18] presented a fuzzy inverse Fisher discriminant analysis (FIFDA) by combining the inverse Fisher discriminant criterion with fuzzy set theory. In the methods of [17], [18], a membership degree matrix is first computed using the fuzzy k-nearest neighbor (FKNN) algorithm [19], and the membership degrees are then incorporated to redefine fuzzy scatter matrices, as sketched below. However, a drawback of these methods is that they still cannot be applied directly when the redefined fuzzy scatter matrix $S_{Fw}$ or $S_{Ft}$ is singular due to the small sample size problem.
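The FKNN step of [17], [18] can be illustrated as follows. This is a minimal NumPy sketch of the membership assignment of Keller et al. [19] (the 0.51/0.49 constants come from that scheme); it is our own illustration, not the authors' code:

```python
import numpy as np

def fknn_memberships(X, y, k=5):
    """Membership degree U[i, j] of sample i in class j, following the
    FKNN scheme of Keller et al. [19]: each sample counts the class
    labels among its k nearest neighbours (excluding itself)."""
    n = X.shape[0]
    classes = np.unique(y)
    U = np.zeros((n, classes.size))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)            # a sample is not its own neighbour
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]        # indices of the k nearest neighbours
        for j, c in enumerate(classes):
            frac = np.mean(y[nbrs] == c)   # fraction n_ij / k
            U[i, j] = 0.51 + 0.49 * frac if y[i] == c else 0.49 * frac
    return U
```

The resulting matrix U is then used to weight the samples when the fuzzy scatter matrices are redefined.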

Inspired by the ideas in [17], [18], [20], [21], in this paper we propose a novel feature extraction criterion, FMMC, by combining MMC with fuzzy set theory; based on the kernel tricks in [22], [23], [24], we extend FMMC to a kernel fuzzy maximum margin criterion (KFMMC) with positive definite kernels for nonlinearly separable problems. Moreover, we apply the GSVD algorithm to the classical MMC criterion and evaluate its effectiveness by experiments. In contrast to the conventional optimality criteria, FMMC can be applied directly to small sample size problems and relates directly to the sample distribution information through the membership degrees of samples. The rest of the paper is organized as follows: we review the classical MMC in Section 2; we apply the GSVD algorithm to the classical MMC in Section 3; in Section 4, we propose FMMC for linearly separable problems and KFMMC for nonlinearly separable problems; in Section 5, experiments are presented to demonstrate the effectiveness of FMMC and KFMMC; conclusions are summarized in Section 6.

Maximum margin criterion

In this section, we give an overview of the classical MMC. Consider a pattern sample data set $T=\{(x_1,y_1),\ldots,(x_n,y_n)\}\subseteq X\times Y$, where $X=\{x_1,\ldots,x_n\}\subseteq \mathbb{R}^m$ is the input data set and $Y=\{C_1,\ldots,C_c\}$ is the class label set. In order to reduce the dimensionality of a new sample $x\in\mathbb{R}^m$, some measure needs to be employed to assess similarity or dissimilarity. We want to find a linear transformation $W^T:\mathbb{R}^m\to\mathbb{R}^l$ such that the similarity/dissimilarity is preserved in the reduced-dimensional space.
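The snippet below stops before the criterion itself, so for orientation: in its standard formulation, MMC maximizes $\mathrm{tr}(W^T(S_b-S_w)W)$ subject to $W^TW=I$, and the optimal $W$ stacks the $l$ leading eigenvectors of $S_b-S_w$, so no inversion of a possibly singular $S_w$ is needed. A minimal sketch under this standard formulation (our own implementation and scaling conventions, not the authors' code):

```python
import numpy as np

def mmc_transform(X, y, l):
    """Classical MMC: maximize tr(W^T (S_b - S_w) W) with W^T W = I.
    The optimum stacks the l leading eigenvectors of S_b - S_w."""
    n, m = X.shape
    mean = X.mean(axis=0)
    Sb = np.zeros((m, m))
    Sw = np.zeros((m, m))
    for c in np.unique(y):
        Xi = X[y == c]
        mi = Xi.mean(axis=0)
        Sb += (Xi.shape[0] / n) * np.outer(mi - mean, mi - mean)
        Sw += (Xi - mi).T @ (Xi - mi) / n
    _, eigvecs = np.linalg.eigh(Sb - Sw)   # symmetric matrix, ascending order
    return eigvecs[:, ::-1][:, :l]         # m x l projection matrix W
```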

Let the input data set X be

Generalized singular value decomposition

Inspired by the ideas in [3], [11], [12], in this section we apply GSVD to the MMC criterion, which makes the algorithm more effective. Let
$$H_b=\left[\sqrt{\tfrac{n_1}{n}}(m_1-m),\ldots,\sqrt{\tfrac{n_c}{n}}(m_c-m)\right]\in\mathbb{R}^{m\times c},\qquad H_w=[X_1-m_1e_1^T,\ldots,X_c-m_ce_c^T]\in\mathbb{R}^{m\times n},$$
where $X_i\in\mathbb{R}^{m\times n_i}$ is the input data matrix of class $C_i$, $m_i=(1/n_i)\sum_{k\in N_i}x_k$ is the mean vector of class $C_i$, $m=(1/n)\sum_{x\in X}x$ is the total mean vector of the samples, and $e_i=[1,\ldots,1]^T\in\mathbb{R}^{n_i}$. Then the scatter matrices $S_b$ and $S_w$ can be expressed as $S_b=H_bH_b^T$ and $S_w=H_wH_w^T$. If GSVD is applied to the matrix pair $(H_b^T,H_w^T)$
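The factors $H_b$ and $H_w$ can be assembled directly from the definitions above; the sketch below (our own code) builds them so that $S_b=H_bH_b^T$ and $S_w=H_wH_w^T$ hold, while the full Paige-Saunders GSVD of the pair $(H_b^T,H_w^T)$ is left to a library or to the paper's Section 3:

```python
import numpy as np

def scatter_factors(X, y):
    """Build H_b (m x c) and H_w (m x n) as defined above, so that
    S_b = H_b H_b^T and S_w = H_w H_w^T.  Rows of X are samples."""
    n = X.shape[0]
    mean = X.mean(axis=0)
    Hb_cols, Hw_blocks = [], []
    for c in np.unique(y):
        Xi = X[y == c]                       # the n_i samples of class C_i
        ni, mi = Xi.shape[0], Xi.mean(axis=0)
        Hb_cols.append(np.sqrt(ni / n) * (mi - mean))
        Hw_blocks.append((Xi - mi).T)        # X_i - m_i e_i^T as m x n_i block
    return np.column_stack(Hb_cols), np.hstack(Hw_blocks)

# sanity check of the shapes on random data
X = np.random.randn(40, 6); y = np.repeat([0, 1, 2, 3], 10)
Hb, Hw = scatter_factors(X, y)
print(Hb.shape, Hw.shape)  # (6, 4) (6, 40)
```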

Fuzzy maximum margin criterion

In this section, we generalize the MMC introduced in Section 2 to FMMC. Moreover, by means of the kernel trick, we extend the model to KFMMC with positive definite kernels for nonlinearly separable problems.
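As background for the kernel extension, KFMMC replaces raw features by a positive definite kernel Gram matrix; a minimal sketch of two standard ingredients, an RBF Gram matrix and its centering, follows (our own helper names, not the paper's derivation):

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Positive definite RBF kernel: K[i, j] = exp(-gamma ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared pairwise distances
    return np.exp(-gamma * np.maximum(D2, 0.0))

def center_gram(K):
    """Center the implicit feature map: K <- (I - 1/n) K (I - 1/n)."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J
```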

We first propose FMMC for linearly separable problems. As noted above, a drawback of the existing methods is that the conventional optimality criteria based on the scatter matrices do not directly reflect the sample distribution information. An immediate implication is that
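Since the snippet cuts off before the fuzzy scatter matrices are defined, the following is only one plausible construction in the style of [17], [18]: fuzzy class means and membership-weighted scatters built from the FKNN membership matrix, followed by the MMC-style eigenproblem. The paper's exact definitions are given in its Section 4; treat this as a hypothetical sketch:

```python
import numpy as np

def fmmc_transform(X, U, l):
    """Hypothetical FMMC sketch: membership-weighted fuzzy scatter
    matrices S_Fb and S_Fw built from U (n x c, e.g. from FKNN),
    then the l leading eigenvectors of S_Fb - S_Fw, by analogy
    with classical MMC.  Not the paper's exact definitions."""
    n, m = X.shape
    mean = X.mean(axis=0)
    SFb = np.zeros((m, m))
    SFw = np.zeros((m, m))
    for j in range(U.shape[1]):
        u = U[:, j]                                    # memberships in class j
        mj = (u[:, None] * X).sum(axis=0) / u.sum()    # fuzzy class mean
        SFb += (u.sum() / n) * np.outer(mj - mean, mj - mean)
        d = X - mj
        SFw += (u[:, None] * d).T @ d / n              # weighted within scatter
    _, eigvecs = np.linalg.eigh(SFb - SFw)
    return eigvecs[:, ::-1][:, :l]
```

Because the criterion is a difference rather than a ratio of scatters, it remains well defined even when $S_{Fw}$ is singular, which is the property that lets FMMC handle small sample size problems directly.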

Experiments

In this section, in order to demonstrate the effectiveness of the FMMC and KFMMC presented in this paper, we compare them with LDA, PCA, MSD, MMC, Complete-LDA, kernel PCA and kernel LDA on different data sets. All the experiments are performed on a Pentium PC (2.52 GHz, 2 GB RAM) and programmed in MATLAB (version 7.0).

First, we use subsets of low- and high-dimensional linearly separable data sets to evaluate the performance of FMMC. The database descriptions are as follows:

Conclusion

In this paper, we have presented FMMC and KFMMC, based on MMC and fuzzy set theory, for linearly and nonlinearly separable problems, respectively. Moreover, we have applied GSVD to the MMC criterion, which makes the algorithm more effective. By using the membership degrees of samples, we redefine the fuzzy scatter matrices, which relate directly to the sample distribution information. For the high-dimensional linearly separable problems mentioned above, FMMC outperforms the other feature extraction

Acknowledgments

The authors thank the anonymous reviewers for their constructive comments and suggestions. This work is supported by the National Natural Science Foundation of China (Grant no. 10871226), the Natural Science Foundation of Shandong Province (Grant no. ZR2009AL006), and the Young and Middle-Aged Scientists Research Foundation of Shandong Province (Grant no. BS2010SF004).

References (24)

• Sh.W. Ji et al., Generalized linear discriminant analysis: a unified framework and efficient model selection, IEEE Trans. Neural Networks (2008)
• T. Hastie et al., Penalized discriminant analysis, Ann. Stat. (1995)

Yan Cui was born in Shandong, China, in 1985. She received her BS and MS degrees from Liaocheng University, Shandong, China, in 2008 and 2011, respectively. She is now studying for her PhD degree in Pattern Recognition and Intelligent Systems at the School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China.

Liya Fan received her BS degree from Northeast Normal University, Changchun, China, in 1984, her MS degree from Inner Mongolia University, Hohhot, China, in 2000, and her PhD degree from Xidian University, Xi'an, China, in 2003. She is now a professor in the School of Mathematical Sciences, Liaocheng University, Liaocheng, China. Her research interests include optimization theory and applications, machine learning theory and pattern recognition.
