Feature extraction for classification problems and its application to face recognition
Introduction
Many subspace methods have been successfully applied to construct features of an image [1], [2], [3], [4], [5], [6]. Among these, the Eigenface [1] (based on principal component analysis, PCA) and Fisherface [2] (based on Fisher's linear discriminant, FLD) methods are popular, because they allow the efficient characterization of a low-dimensional subspace whilst preserving the perceptual quality of a very high-dimensional raw image.
Although it is the most popular, the Eigenface method [1] is by nature not suitable for classification problems, since it makes no use of output class information in computing the principal components (PCs). Moreover, the features it extracts are not invariant under transformations of the inputs: merely rescaling the attributes changes the resulting features. In addition, it does not use higher order statistics, and it has been reported that the performance of the Eigenface method is severely affected by the level of illumination [2].
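This scale sensitivity is easy to demonstrate numerically. In the sketch below (data sizes and the scaling factor are our own illustrative choices, not from the paper), the leading principal component rotates toward an attribute after that attribute is merely rescaled:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 attributes

def top_pc(data):
    """Leading principal component of mean-centered data."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

pc_before = top_pc(X)
X_scaled = X.copy()
X_scaled[:, 1] *= 100.0                  # change the units of one attribute
pc_after = top_pc(X_scaled)

# After rescaling, the leading PC aligns almost entirely with attribute 1:
# the extracted feature direction is not invariant to a change of units.
print(np.round(np.abs(pc_before), 3), np.round(np.abs(pc_after), 3))
```

The same effect is why PCA-based features are usually computed on data with a fixed, agreed-upon normalization.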
Unlike the Eigenface method, the Fisherface method [2] focuses on classification, determining optimal linear discriminant functions for data whose classes are Gaussian with well-separated centers. Although it is quite simple and powerful for classification problems, it cannot produce more than c−1 features, where c is the number of classes. As in the Eigenface method, it only uses second-order statistics in representing the images. On the other hand, some researchers have proposed subspace methods using higher order statistics, such as evolutionary pursuit and kernel methods for face recognition [3], [4], [5].
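The c−1 limit comes from the rank of the between-class scatter matrix, which is built from only c class-mean deviations that sum to zero. A short numerical check (class counts, dimensions and sample sizes are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
c, d, n_per = 4, 10, 30                   # 4 classes, 10-dim data
means = rng.normal(size=(c, d))           # one true mean vector per class
X = [m + rng.normal(size=(n_per, d)) for m in means]

m_all = np.vstack(X).mean(axis=0)         # global mean
# Between-class scatter: S_b = sum_k n_k (m_k - m)(m_k - m)^T
S_b = sum(n_per * np.outer(Xk.mean(axis=0) - m_all,
                           Xk.mean(axis=0) - m_all) for Xk in X)

# The c deviation vectors sum to zero, so rank(S_b) <= c - 1,
# and FLD can yield at most c - 1 discriminant directions.
print(np.linalg.matrix_rank(S_b))        # 3
```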
Recently, independent component analysis (ICA), which was originally devised for blind source separation, has received a great deal of attention in the neural networks and signal processing communities because of its potential applications in various areas. Bell and Sejnowski [7] developed an unsupervised learning algorithm for performing ICA based on entropy maximization in a single-layer feedforward neural network, and other researchers have shown that ICA is more powerful for face recognition than PCA [6], [8], [9]. Unlike PCA and FLD, ICA uses higher order statistics; it has been applied successfully to recognizing faces across changes in pose [8] and to classifying facial actions [9]. Like PCA, however, it does not utilize output class information, which leaves plenty of room for improvement.
In our previous works [10], [11], [12], we have proposed a feature extraction method called ICA-FX which utilizes the standard ICA algorithm for binary classification problems. In this method, the binary class label is treated as one of the hidden sources whose linear combinations are considered to constitute the observations. Then, feature extraction problems can be solved by standard ICA algorithms. By maximizing the joint mutual information between the class labels and the new features, we could find a number of features that carry as much information on the class labels as possible.
However, the application of ICA-FX is limited to two-class problems, so it cannot be applied directly to multi-class problems such as face recognition. In this paper, ICA-FX is therefore extended to multi-class problems. Several studies have focused on extending binary classification methods to multi-class problems [13], [14], [15], [16]. Most of them deal with extending binary classifiers such as support vector machines (SVMs) to multi-class classification, and the most popular solution is to decompose a multi-class problem into multiple binary classification problems and then combine the resulting classifiers [13], [14], [15], [16]. Among feature extraction schemes, FLD, which was originally designed for two-class problems, has been easily extended to multi-class problems by changing the form of the within-class and between-class scatter matrices.
In this paper, instead of adding only one class node as an input to the ICA structure, we add c class nodes as inputs, where c denotes the number of classes. In doing so, a 1-out-of-c coding scheme is used to encode the class label.
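A 1-out-of-c code maps class k to the k-th unit vector of length c. The sketch below illustrates the coding step; the concatenation of the code with the feature vector as the augmented ICA input is our reading of the text, and all variable names are ours:

```python
import numpy as np

def one_hot(labels, c):
    """1-out-of-c coding: class k -> unit vector e_k of length c."""
    codes = np.zeros((len(labels), c))
    codes[np.arange(len(labels)), labels] = 1.0
    return codes

x = np.array([0.3, -1.2, 0.8])           # an original feature vector
code = one_hot([2], c=4)[0]              # class 2 of 4 -> [0, 0, 1, 0]
augmented = np.concatenate([x, code])    # class nodes appended as ICA inputs
print(augmented)
```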
The proposed method is applied to face recognition and facial expression recognition problems. The experimental results show that it greatly reduces the dimension of the feature space while improving classification performance.
This paper is organized as follows. A brief review of ICA is given in Section 2, and a new feature extraction algorithm is proposed in Section 3. Experimental results on face recognition problems showing the advantages of the proposed algorithm are given in Section 4. Finally, conclusions follow in Section 5.
Review of ICA
The problem of linear ICA for blind source separation has been studied extensively in the literature [17], [18], [19]. In parallel, Bell and Sejnowski [7] developed an unsupervised learning algorithm based on entropy maximization of a feedforward neural network's output layer, which is referred to as the Infomax algorithm. The Infomax approach, the maximum likelihood estimation (MLE) approach and the negentropy maximization approach have been reported to yield identical results [20], [21], [22].
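As a rough illustration of the Infomax idea, the batch sketch below applies the natural-gradient update ΔW = η(I − tanh(Y)Yᵀ/n)W, a common form for super-Gaussian sources; the mixing matrix, learning rate, nonlinearity and iteration count are arbitrary choices of ours, not taken from the cited references:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
S = rng.laplace(size=(2, n))              # two super-Gaussian sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # "unknown" mixing matrix
X = A @ S                                 # observed mixtures

W = np.eye(2)                             # unmixing matrix to learn
eta = 0.05
for _ in range(500):
    Y = W @ X
    # Natural-gradient Infomax step with a tanh nonlinearity
    W += eta * (np.eye(2) - np.tanh(Y) @ Y.T / n) @ W

# If separation succeeds, W A approaches a scaled permutation matrix
P = W @ A
print(np.round(P, 2))
```

Each row of W A having a single dominant entry indicates that each output recovers one source up to scale and permutation, the usual ICA indeterminacies.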
The problem setting
Feature extraction based on ICA for multi-class problems
ICA outputs a set of maximally independent vectors that are linear combinations of the observed data. Although these vectors have applications in areas such as blind source separation [7] and data visualization [25], for classification problems ICA does not perform as well as supervised methods such as FLD, because it does not make use of class information. Efforts to combine standard ICA with supervised learning were made in our previous works [11], [12]. In those…
Experimental results
In this section, ICA-FX is applied to face recognition problems and its performance is compared with that of other methods such as PCA, ICA and FLD. This extends Ref. [29], where face recognition was viewed as a set of binary classification problems and the binary version of ICA-FX [11], [12] was used to tackle the multi-class classification problem.
To apply the ICA-FX to face recognition problems, we first need to determine the original input features of
Conclusions
In this paper, the feature extraction algorithm, ICA-FX, has been extended to multi-class problems and it has been applied to face recognition problems. The proposed algorithm is based on the standard ICA and can generate very useful features for classification problems.
Although ICA can be used directly for feature extraction, it does not necessarily generate features useful for classification because of its unsupervised learning nature. In the proposed algorithm, class information is added in the learning stage of ICA.
About the Author—NOJUN KWAK received the B.S., M.S. and Ph.D. degrees from the School of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea, in 1997, 1999 and 2003, respectively. From 2003 to 2006, he worked for Samsung Electronics. In 2006, he joined Seoul National University as a BK21 assistant professor. Currently, he is an assistant professor at Ajou University, Korea. His research interests include pattern recognition, neural networks, machine learning, data mining, image processing and their applications.
References (40)
- Independent component analysis, a new concept?, Signal Process. (1994)
- A unifying information-theoretic framework for independent component analysis, Comput. Math. Appl. (2000)
- Learned parametric mixture based ICA algorithm, Neurocomputing (1998)
- Regularization studies of linear discriminant analysis in small sample size scenarios with applications to face recognition, Pattern Recognition Lett. (2005)
- Image covariance-based subspace method for face recognition, Pattern Recognition (2007)
- M. Turk, A. Pentland, Face recognition using eigenfaces, in: Proceedings of the IEEE Conference on Computer Vision and…
- Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)
- Evolutionary pursuit and its application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
- M.-H. Yang, Face recognition using kernel methods, Adv. Neural Inf. Process. Syst. 14…
- Face recognition using kernel direct discriminant analysis algorithms, IEEE Trans. Neural Networks (2003)
- Face recognition by independent component analysis, IEEE Trans. Neural Networks
- An information-maximization approach to blind separation and blind deconvolution, Neural Comput.
- Viewpoint invariant face recognition using independent component analysis and attractor networks, Neural Inf. Process. Syst. Nat. Synth.
- Classifying facial actions, IEEE Trans. Pattern Anal. Mach. Intell.
- Feature extraction using ICA
- A new method of feature extraction and its stability
- Feature extraction based on ICA for binary classification problems, IEEE Trans. Knowl. Data Eng.
- A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Networks