Feature learning via partial differential equation with applications to face recognition
Introduction
Nowadays, many well-known methods for image classification tasks (e.g., face recognition) involve two steps: feature extraction and classification. As the performance of the classifier depends heavily on the quality of the features (or data representation), much of the effort in image classification goes into the design of features and data transformations [1]. Approaches to feature extraction fall into two categories: manually designing features and automatically learning features.
Manual feature design incorporates human ingenuity and prior knowledge into the data representation. Features extracted by existing popular methods, such as Scale-Invariant Feature Transform (SIFT) [2], Histograms of Oriented Gradients (HOG) [3], and Invariant Scattering Convolution Networks [4], usually satisfy invariance properties, e.g., translational and rotational invariance, that benefit image classification. They are intuitive and fit various image classification tasks relatively well. However, inventing such methods is extremely labor-intensive, and existing methods may not extract discriminative information from the data well. Researchers have therefore gradually turned to learning representations from data.
Linear representation based feature learning methods have attracted much attention recently. This is because images of convex and Lambertian objects taken under distant illumination lie near an approximately nine-dimensional linear subspace, known as the harmonic plane [5]. By utilizing this subspace property, Low Rank Representation [6] based methods extract features that capture the global structure of the whole data and are robust to noise. Chen et al. [7] extract the low-rank matrix as the feature and then apply Sparse Representation Classification (SRC) [8]. Li et al. [9] propose a semi-supervised framework with a classwise block-diagonal structure to learn low-rank representations. Zhang et al. [10] extend the low-rank model into a dictionary learning method. Wu et al. [11] apply a low-rank dictionary model to multi-view tasks. Dictionary learning methods, which learn a set of representation atoms and the weighting coefficients (features) simultaneously, have also achieved great success. Zhang et al. [12] propose a discriminative K-SVD method (D-KSVD) that combines the dictionary reconstruction error with the classification error and solves the model by a single K-SVD. Mairal et al. [13] model supervised dictionary learning as a bilevel optimization problem. To build the relationship between dictionary atoms and class labels, Jiang et al. [14] associate label information with each dictionary item and propose the Label Consistent K-SVD method (LC-KSVD). Liu et al. [15] propose a class-relatedness oriented discriminative dictionary for the same purpose. Other works construct several different dictionaries for classification: Ou et al. [16] use an occlusion dictionary for face recognition with occlusion, and Liu et al. [17] apply a bilinear dictionary for face recognition. However, these linear representation based feature learning methods ignore the invariance of the features.
For example, in face recognition these methods can only treat changes of illumination or pose as noise. Moreover, since even a slight misalignment among faces can significantly degrade classification performance, much effort is spent on aligning the faces before classification [18].
Deep neural networks, which compose multiple nonlinear transformations, have shown their superiority during the past few years [19], [20], [21]. Their hierarchical structure is effective in extracting discriminative information. Convolutional Neural Networks (CNNs) [22] reduce the connections between successive layers by sharing weights (the same filters) and apply pooling to extract useful local features, and have achieved remarkable performance [21] in image classification. However, deep neural networks usually need a huge number of samples for training. Unfortunately, in many problems, such as bioinformatics tasks and face recognition, each class has only a few samples for training.
Recently, Liu et al. [23], [24] have proposed a framework that learns partial differential equations (PDEs) from training image pairs, which has been successfully applied to several computer vision and image processing problems. In [24], they apply learning-based PDEs to object detection, color2gray, and demosaicking. In [25], they model the saliency detection task as learning a boundary condition of a PDE system. Zhao et al. [26] extend this model to text detection.
The incapability of existing methods to incorporate both discrimination and invariance into features motivates us to seek a new approach to feature learning, especially when training samples are limited. Since symmetry methods for differential equations can construct invariances rigorously, in this paper we propose a novel PDE model for feature learning. An illustration of the proposed approach is shown in Fig. 1. The PDE is formulated as a linear combination of fundamental differential invariants, and its evolution process acts as a mapping from a raw image to a feature of the same dimension. Unlike traditional PDE methods, our PDE is data-driven, which enhances the discriminative information in the learned feature, and its evolution process is strictly translationally and rotationally invariant. The feature is then fed to a simple linear classifier. We also provide an algorithm that updates the parameters alternately to optimize our discretized model. By exploiting the invariance property, our method performs well even when training samples are few. We summarize the contributions of this paper as follows:
- We propose a novel PDE based method to extract image features for classification. We model the feature extraction process as an evolutionary PDE. The learned feature is both discriminative and invariant under translation, rotation, and gray-level scaling. To the best of our knowledge, this is the first work that applies PDEs to feature learning and image recognition.
- We provide a simple yet effective algorithm to optimize our discretized PDE model. The whole training time in each experiment is less than five minutes.
- Face recognition is a paradigm where training samples are few. Our experimental results on four well-known public face recognition datasets show that our method outperforms the state-of-the-art methods in this setting. For example, we obtain a recognition accuracy of 96% on Extended Yale B with only 10 samples per person, which is about 9% higher than sparse coding and dictionary learning methods.
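The rotational invariance claimed above comes from building the PDE out of differential invariants. As a quick numerical illustration (using two classical invariants, not necessarily the paper's exact invariant set), the quantities Δu and |∇u|² commute with 90° rotations, which can be checked directly with finite differences:

```python
import numpy as np

def rot_invariants(u):
    """Two classical rotationally invariant differential quantities,
    the Laplacian Δu and the squared gradient magnitude |∇u|²,
    computed by central differences with replicate borders."""
    p = np.pad(u, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
    ux = 0.5 * (p[1:-1, 2:] - p[1:-1, :-2])
    uy = 0.5 * (p[2:, 1:-1] - p[:-2, 1:-1])
    return lap, ux**2 + uy**2

u = np.random.rand(8, 8)
for q, qr in zip(rot_invariants(np.rot90(u)), rot_invariants(u)):
    # rotating the image and then computing the invariant equals
    # computing the invariant and then rotating the result
    assert np.allclose(q, np.rot90(qr))
```

For 90° rotations the identity holds exactly on the discrete grid; for arbitrary angles it holds only up to interpolation and discretization error.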
The rest of the paper is structured as follows. We introduce our PDE model in Section 2 and provide the algorithm to optimize it in Section 3. We discuss other related works in Section 4. In Section 5, we evaluate our PDE model on face recognition tasks and show its superiority. Finally, we conclude the paper in Section 6.
PDE based feature learning model
In this section, we present our PDE model for discriminative feature learning. We first propose the general framework and then crystallize the model via invariance properties. To begin with, Table 1 gives a brief summary of the notations used throughout the paper. For a vector x, x_i denotes its ith component.
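Since only a snippet of the model section is available here, the following is a minimal sketch of the kind of evolution the framework describes: an image evolved under a PDE whose right-hand side is a linear combination of translation- and rotation-invariant differential terms. The small invariant set {u, |∇u|², Δu}, the coefficients, and the function names are illustrative assumptions; the paper's model uses a larger family of fundamental differential invariants with learned coefficients.

```python
import numpy as np

def laplacian(u):
    """5-point finite-difference Laplacian with replicate (Neumann) borders."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u

def grad_sq(u):
    """Squared gradient magnitude |∇u|² via central differences."""
    p = np.pad(u, 1, mode="edge")
    ux = 0.5 * (p[1:-1, 2:] - p[1:-1, :-2])
    uy = 0.5 * (p[2:, 1:-1] - p[:-2, 1:-1])
    return ux**2 + uy**2

def evolve(u0, a, n_steps=10, dt=0.1):
    """Evolve u_t = a0*u + a1*|∇u|² + a2*Δu by explicit Euler steps.

    The evolved image after n_steps plays the role of the feature map:
    it has the same dimension as the input, as described in the paper.
    """
    u = u0.astype(float).copy()
    for _ in range(n_steps):
        u = u + dt * (a[0] * u + a[1] * grad_sq(u) + a[2] * laplacian(u))
    return u

img = np.random.rand(16, 16)
feat = evolve(img, a=[0.0, 0.01, 0.2])  # diffusion-dominated toy evolution
```

With a = [0, 0, c] this reduces to discrete heat flow, so the evolution smooths the image; a learned coefficient vector would instead be chosen to make the output discriminative.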
Algorithm for solving (7)
In this section, we propose an algorithm to solve our feature learning model (7). The main strategy is to update the parameters A and W alternately, where the discretization of a_i forms the ith column of A. We first discretize the PDE and then give the details of optimizing A and W. A is updated by gradient descent, while W admits a closed-form solution. The whole algorithm is summarized in Algorithm 1, including some fixed hyper-parameters.
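Model (7) itself is not reproduced in this snippet, so the sketch below assumes a generic least-squares objective over features F(A) and a linear classifier W. It only illustrates the alternating strategy described above: W gets a closed-form ridge solution, and A takes gradient steps (with a finite-difference gradient standing in for the paper's analytic derivative of the discretized PDE). The function names, the toy objective, and the hyper-parameters are all assumptions for illustration.

```python
import numpy as np

def ridge_W(F, Y, lam=1e-3):
    """Closed-form classifier update: W = (F Fᵀ + λI)⁻¹ F Yᵀ,
    the minimizer of ||Wᵀ F − Y||² + λ||W||² for a fixed feature matrix F."""
    d = F.shape[0]
    return np.linalg.solve(F @ F.T + lam * np.eye(d), F @ Y.T)

def alternate(features_of, A0, Y, n_outer=5, lr=1e-2, eps=1e-4):
    """Alternate a closed-form W update with numeric gradient steps on A.

    features_of(A) -> d×n feature matrix for PDE parameters A.
    """
    A = np.asarray(A0, float).copy()
    for _ in range(n_outer):
        F = features_of(A)
        W = ridge_W(F, Y)           # exact minimizer for the current A
        def obj(Av):
            return np.sum((W.T @ features_of(Av) - Y) ** 2)
        g = np.zeros_like(A)        # forward-difference gradient in A
        base = obj(A)
        for i in range(A.size):
            Ap = A.copy()
            Ap.flat[i] += eps
            g.flat[i] = (obj(Ap) - base) / eps
        A = A - lr * g              # one descent step on A
    return A, W
```

In the paper the gradient of A is derived analytically from the discretized PDE; the finite-difference stand-in here just keeps the alternating structure visible.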
Distinction from other PDE based methods
Some other PDE based works devise particular PDEs for classification [34], [35]. In [34], Yin et al. apply total variation regularization to decompose the image and use the decomposed part as the feature for classification. In [35], Shan et al. devise a simple PDE to normalize illumination and then use the normalized image as the feature. These works effectively use the PDE as a pre-processing step for classification. The classification and PDE …
Experiments
In this section, we present experiments to validate the proposed method. Classification with few training samples is a major challenge in image classification; it is often encountered in practice and can be as difficult as the large-sample case. Many sparse coding and dictionary learning methods [7], [8], [10], [12], [14] target classification in this setting and have shown their strengths. Face recognition is a paradigm which has few training samples but a lot of …
Conclusions
In this paper, we propose a novel PDE method for feature learning. We model the feature extraction process as an evolution process governed by a PDE. The PDE is assumed to be a linear combination of fundamental differential invariants under translation and rotation, which is transformed by a nonlinear mapping to achieve invariance with respect to gray-level scaling. The experiments with few training samples show that our approach achieves the best performance in various settings. It should …
Acknowledgments
Zhouchen Lin is supported by the National Basic Research Program of China (973 Program) (grant no. 2015CB352502), the National Natural Science Foundation (NSF) of China (grant nos. 61625301 and 61231002), and the Okawa Foundation. Zhenyu Zhao is supported by the NSF of China (grant no. 61473302).
Cong Fang received the bachelor's degree in electronic science and technology (optoelectronic technology) from Tianjin University in 2014. He is currently pursuing the Ph.D. degree with the School of Electronics Engineering and Computer Science, Peking University. His research interests include computer vision, pattern recognition, machine learning, and optimization.
References (51)
- Robust face recognition via occlusion dictionary learning, Pattern Recognit., 2014.
- Bilinear discriminative dictionary learning for face recognition, Pattern Recognit., 2014.
- Toward designing intelligent PDEs for computer vision: an optimal control approach, Image Vis. Comput., 2013.
- A robust hybrid method for text detection in natural scenes by learning-based partial differential equations, Neurocomputing, 2015.
- Max-margin multiple-instance dictionary learning, Proc. International Conference on Machine Learning, 2013.
- Feature extraction by learning Lorentzian metric tensor and its extensions, Pattern Recognit., 2010.
- Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell., 2013.
- Object recognition from local scale-invariant features, Proc. IEEE International Conference on Computer Vision, 1999.
- Histograms of oriented gradients for human detection, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005.
- Invariant scattering convolution networks, IEEE Trans. Pattern Anal. Mach. Intell., 2013.
- Lambertian reflectance and linear subspaces, IEEE Trans. Pattern Anal. Mach. Intell., 2003.
- Robust recovery of subspace structures by low-rank representation, IEEE Trans. Pattern Anal. Mach. Intell., 2013.
- Low-rank matrix recovery with structural incoherence for robust face recognition, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2012.
- Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., 2009.
- Learning low-rank representations with classwise block-diagonal structure for robust face recognition, Proc. AAAI Conference on Artificial Intelligence, 2014.
- Learning structured low-rank representations for image classification, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2013.
- Multi-view low-rank dictionary learning for image classification, Pattern Recognit.
- Discriminative K-SVD for dictionary learning in face recognition, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2010.
- Task-driven dictionary learning, IEEE Trans. Pattern Anal. Mach. Intell., 2012.
- Label consistent K-SVD: learning a discriminative dictionary for recognition, IEEE Trans. Pattern Anal. Mach. Intell., 2013.
- Class relatedness oriented discriminative dictionary learning, Pattern Recognit.
- Towards a practical face recognition system: robust registration and illumination by sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., 2012.
- Reducing the dimensionality of data with neural networks, Science, 2006.
- Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Process. Mag., 2012.
- ImageNet classification with deep convolutional neural networks, Proc. Conference on Neural Information Processing Systems, 2012.
Zhenyu Zhao received the B.S. degree in mathematics from the University of Science and Technology in 2009, and the M.S. degree in system science from the National University of Defense Technology in 2011. He received the Ph.D. degree in applied mathematics from the National University of Defense Technology in 2016. His research interests include computer vision, pattern recognition, and machine learning.
Pan Zhou received the Master's degree in computer science from Peking University in 2016. He is now a Ph.D. candidate at the Vision and Machine Learning Lab, Department of Electrical and Computer Engineering (ECE), National University of Singapore, Singapore. His research interests include computer vision, machine learning, and pattern recognition.
Zhouchen Lin received the Ph.D. degree in applied mathematics from Peking University in 2000. Currently, he is a professor at the Key Laboratory of Machine Perception (MOE), School of Electronics Engineering and Computer Science, Peking University. He is also a chair professor at Northeast Normal University. His research interests include computer vision, image processing, machine learning, pattern recognition, and numerical optimization. He is an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence and the International Journal of Computer Vision, a senior member of the IEEE, and an IAPR Fellow.