Feature extraction for classification problems and its application to face recognition
Introduction
Many subspace methods have been successfully applied to construct features of an image [1], [2], [3], [4], [5], [6]. Among these, the Eigenface [1] (based on principal component analysis, PCA) and Fisherface [2] (based on Fisher's linear discriminant, FLD) methods are popular, because they allow the efficient characterization of a low-dimensional subspace whilst preserving the perceptual quality of a very high-dimensional raw image.
Although it is the most popular, the Eigenface method [1] is by nature not suitable for classification problems, since it makes no use of output class information in computing the principal components (PCs). Moreover, the features it extracts are not invariant under transformations of the inputs: merely rescaling the attributes changes the resulting features. In addition, it does not use higher order statistics, and it has been reported that the performance of the Eigenface method is severely affected by the level of illumination [2].
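This scale sensitivity is easy to demonstrate numerically. In the sketch below (data sizes and the scaling factor are our own illustrative choices, not from the paper), the leading principal component rotates toward an attribute after that attribute is merely rescaled:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 attributes

def top_pc(data):
    """Leading principal component of mean-centered data."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

pc_before = top_pc(X)
X_scaled = X.copy()
X_scaled[:, 1] *= 100.0                  # change the units of one attribute
pc_after = top_pc(X_scaled)

# After rescaling, the leading PC aligns almost entirely with attribute 1:
# the extracted feature direction is not invariant to a change of units.
print(np.round(np.abs(pc_before), 3), np.round(np.abs(pc_after), 3))
```

The same effect is why PCA-based features are usually computed on data with a fixed, agreed-upon normalization.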
Unlike the Eigenface method, the Fisherface method [2] focuses on classification, determining optimal linear discriminant functions for data whose classes are Gaussian with well-separated centers. Although it is quite simple and powerful for classification problems, it cannot produce more than c−1 features, where c is the number of classes. As in the Eigenface method, it only uses second-order statistics in representing the images. On the other hand, some researchers have proposed subspace methods using higher order statistics, such as evolutionary pursuit and kernel methods for face recognition [3], [4], [5].
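The c−1 limit comes from the rank of the between-class scatter matrix, which is built from only c class-mean deviations that sum to zero. A short numerical check (class counts, dimensions and sample sizes are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
c, d, n_per = 4, 10, 30                   # 4 classes, 10-dim data
means = rng.normal(size=(c, d))           # one true mean vector per class
X = [m + rng.normal(size=(n_per, d)) for m in means]

m_all = np.vstack(X).mean(axis=0)         # global mean
# Between-class scatter: S_b = sum_k n_k (m_k - m)(m_k - m)^T
S_b = sum(n_per * np.outer(Xk.mean(axis=0) - m_all,
                           Xk.mean(axis=0) - m_all) for Xk in X)

# The c deviation vectors sum to zero, so rank(S_b) <= c - 1,
# and FLD can yield at most c - 1 discriminant directions.
print(np.linalg.matrix_rank(S_b))        # 3
```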
Recently, independent component analysis (ICA), which was originally devised for blind source separation, has received a great deal of attention in the neural networks and signal processing communities because of its potential applications in various areas. Bell and Sejnowski [7] developed an unsupervised learning algorithm for performing ICA based on entropy maximization in a single-layer feedforward neural network, and other researchers have shown that ICA is more powerful for face recognition than PCA [6], [8], [9]. Unlike PCA and FLD, ICA uses higher order statistics; it has been applied successfully to recognizing faces across changes in pose [8] and to classifying facial actions [9]. Like PCA, however, it does not utilize output class information, which leaves plenty of room for improvement.
In our previous works [10], [11], [12], we have proposed a feature extraction method called ICA-FX which utilizes the standard ICA algorithm for binary classification problems. In this method, the binary class label is treated as one of the hidden sources whose linear combinations are considered to constitute the observations. Then, feature extraction problems can be solved by standard ICA algorithms. By maximizing the joint mutual information between the class labels and the new features, we could find a number of features that carry as much information on the class labels as possible.
However, the application of ICA-FX is limited to two-class problems, so it cannot be applied directly to multi-class problems such as face recognition. In this paper, ICA-FX is therefore extended to multi-class problems. Several studies have focused on extending binary classification methods to multi-class problems [13], [14], [15], [16]. Most of them deal with extending binary classifiers such as support vector machines (SVMs) to multi-class classification, and the most popular solution is to decompose a multi-class problem into multiple binary classification problems and then combine the resulting classifiers [13], [14], [15], [16]. Among feature extraction schemes, FLD, which was originally designed for two-class problems, has been easily extended to multi-class problems by changing the form of the within-class and between-class scatter matrices.
In this paper, instead of adding only one class node as an input to the ICA structure, we add c class nodes as inputs, where c denotes the number of classes. In doing so, a 1-out-of-c coding scheme is used to encode the class label.
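A 1-out-of-c code maps class k to the k-th unit vector of length c. The sketch below illustrates the coding step; the concatenation of the code with the feature vector as the augmented ICA input is our reading of the text, and all variable names are ours:

```python
import numpy as np

def one_hot(labels, c):
    """1-out-of-c coding: class k -> unit vector e_k of length c."""
    codes = np.zeros((len(labels), c))
    codes[np.arange(len(labels)), labels] = 1.0
    return codes

x = np.array([0.3, -1.2, 0.8])           # an original feature vector
code = one_hot([2], c=4)[0]              # class 2 of 4 -> [0, 0, 1, 0]
augmented = np.concatenate([x, code])    # class nodes appended as ICA inputs
print(augmented)
```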
The proposed method is applied to face recognition and facial expression recognition problems. The experimental results show that it greatly reduces the dimension of the feature space while improving classification performance.
This paper is organized as follows. A brief review of ICA is given in Section 2, and a new feature extraction algorithm is proposed in Section 3. Experimental results on face recognition problems showing the advantages of the proposed algorithm are given in Section 4. Finally, conclusions follow in Section 5.
Review of ICA
The problem of linear ICA for blind source separation has been studied extensively in the literature [17], [18], [19]. In parallel, Bell and Sejnowski [7] developed an unsupervised learning algorithm based on entropy maximization of a feedforward neural network's output layer, which is referred to as the Infomax algorithm. The Infomax approach, the maximum likelihood estimation (MLE) approach and the negentropy maximization approach have been reported to yield identical results [20], [21], [22].
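As a rough illustration of the Infomax idea, the batch sketch below applies the natural-gradient update ΔW = η(I − tanh(Y)Yᵀ/n)W, a common form for super-Gaussian sources; the mixing matrix, learning rate, nonlinearity and iteration count are arbitrary choices of ours, not taken from the cited references:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
S = rng.laplace(size=(2, n))              # two super-Gaussian sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # "unknown" mixing matrix
X = A @ S                                 # observed mixtures

W = np.eye(2)                             # unmixing matrix to learn
eta = 0.05
for _ in range(500):
    Y = W @ X
    # Natural-gradient Infomax step with a tanh nonlinearity
    W += eta * (np.eye(2) - np.tanh(Y) @ Y.T / n) @ W

# If separation succeeds, W A approaches a scaled permutation matrix
P = W @ A
print(np.round(P, 2))
```

Each row of W A having a single dominant entry indicates that each output recovers one source up to scale and permutation, the usual ICA indeterminacies.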
The problem setting
Feature extraction based on ICA for multi-class problems
ICA outputs a set of maximally independent vectors that are linear combinations of the observed data. Although these vectors have applications in areas such as blind source separation [7] and data visualization [25], for classification problems ICA does not perform as well as supervised methods such as FLD, because it does not make use of class information. Efforts to combine standard ICA with supervised learning were made in our previous works [11], [12]. In those…
Experimental results
In this section, ICA-FX is applied to face recognition problems and its performance is compared with that of other methods such as PCA, ICA and FLD. This extends Ref. [29], where face recognition was viewed as a set of binary classification problems and the binary version of ICA-FX [11], [12] was used to tackle the multi-class classification problem.
To apply the ICA-FX to face recognition problems, we first need to determine the original input features of
Conclusions
In this paper, the feature extraction algorithm, ICA-FX, has been extended to multi-class problems and it has been applied to face recognition problems. The proposed algorithm is based on the standard ICA and can generate very useful features for classification problems.
Although ICA can be used directly for feature extraction, it does not necessarily generate features useful for classification because of its unsupervised learning nature. In the proposed algorithm, class information is added in the learning stage of ICA.
About the Author—NOJUN KWAK received the B.S., M.S. and Ph.D. degrees from the School of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea, in 1997, 1999 and 2003, respectively. From 2003 to 2006, he worked for Samsung Electronics. In 2006, he joined Seoul National University as a BK21 assistant professor. Currently, he is an assistant professor at Ajou University, Korea. His research interests include pattern recognition, neural networks, machine learning, data mining, image processing and their applications.
References (40)
- Independent component analysis, a new concept?, Signal Process. (1994)
- A unifying information-theoretic framework for independent component analysis, Comput. Math. Appl. (2000)
- Learned parametric mixture based ICA algorithm, Neurocomputing (1998)
- Regularization studies of linear discriminant analysis in small sample size scenarios with applications to face recognition, Pattern Recognition Lett. (2005)
- Image covariance-based subspace method for face recognition, Pattern Recognition (2007)
- M. Turk, A. Pentland, Face recognition using eigenfaces, in: Proceedings of the IEEE Conference on Computer Vision and…
- Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)
- Evolutionary pursuit and its application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
- M.-H. Yang, Face recognition using kernel methods, Adv. Neural Inf. Process. Syst. 14…
- Face recognition using kernel direct discriminant analysis algorithms, IEEE Trans. Neural Networks (2003)
- Face recognition by independent component analysis, IEEE Trans. Neural Networks
- An information-maximization approach to blind separation and blind deconvolution, Neural Comput.
- Viewpoint invariant face recognition using independent component analysis and attractor networks, Neural Inf. Process. Syst. Nat. Synth.
- Classifying facial actions, IEEE Trans. Pattern Anal. Mach. Intell.
- Feature extraction using ICA
- A new method of feature extraction and its stability
- Feature extraction based on ICA for binary classification problems, IEEE Trans. Knowl. Data Eng.
- A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Networks