Feature learning via partial differential equation with applications to face recognition
Introduction
Nowadays, many well-known methods for image classification tasks (e.g., face recognition) involve two steps: feature extraction and classification. As the performance of the classifier depends heavily on the quality of the features (or data representation), much of the effort in image classification goes into the design of features and data transformations [1]. Approaches to feature extraction fall into two categories: manually designing features and automatically learning features.
Manual feature design incorporates human ingenuity and prior knowledge into the data representation. Features extracted by existing popular methods, such as Scale-Invariant Feature Transform (SIFT) [2], Histograms of Oriented Gradients (HOG) [3], and Invariant Scattering Convolution Networks [4], usually satisfy invariance properties, e.g., translational and rotational invariance, that benefit image classification. They are intuitive and fit various image classification tasks relatively well. However, inventing such methods is extremely labor-intensive, and existing methods may not extract discriminative information from the data well. Researchers have therefore gradually turned to learning representations from data.
Linear representation based feature learning methods have attracted much attention recently. This is because images of convex and Lambertian objects taken under distant illumination lie near an approximately nine-dimensional linear subspace, known as the harmonic plane [5]. By utilizing this subspace property, Low Rank Representation [6] based methods extract features that capture the global structure of the whole data and are robust to noise. Chen et al. [7] extract the low-rank matrix as the feature and then apply Sparse Representation Classification (SRC) [8]. Li et al. [9] propose a semi-supervised framework with a classwise block-diagonal structure to learn low-rank representations. Zhang et al. [10] extend the low-rank model into a dictionary learning method. Wu et al. [11] apply a low-rank dictionary model to multi-view tasks. Dictionary learning methods, which learn a set of representation atoms and the weighting coefficients (features) simultaneously, have also achieved great success. Zhang et al. [12] propose a discriminative K-SVD method (D-KSVD) that combines the dictionary reconstruction error with the classification error and solves the model by a single K-SVD. Mairal et al. [13] model supervised dictionary learning as a bilevel optimization problem. To build the relationship between dictionary atoms and class labels, Jiang et al. [14] associate label information with each dictionary item and propose the Label Consistent K-SVD method (LC-KSVD). Liu et al. [15] propose a class-relatedness oriented discriminative dictionary for the same purpose. Other works construct several different dictionaries for classification: Ou et al. [16] use an occlusion dictionary for face recognition with occlusion, and Liu et al. [17] apply a bilinear dictionary for face recognition. However, these linear representation based feature learning methods ignore the invariance of the features.
For example, in face recognition these methods can only treat changes of illumination or pose as noise. Moreover, since even a slight misalignment among faces can significantly degrade classification performance, much effort is spent on aligning the faces before classification [18].
Deep neural networks, which compose multiple nonlinear transformations, have shown their superiority during the past few years [19], [20], [21]. Their hierarchical structure is effective in extracting discriminative information. Convolutional Neural Networks (CNNs) [22] reduce the connections between successive layers by sharing weights (the same filters) and apply pooling to extract useful local features, and have achieved remarkable performance [21] in image classification. However, deep neural networks usually need a huge number of samples for training. Unfortunately, in many problems, such as bioinformatics tasks and face recognition, each class has only a few samples for training.
Recently, Liu et al. [23], [24] have proposed a framework that learns partial differential equations (PDEs) from training image pairs, which has been successfully applied to several computer vision and image processing problems. In [24], they apply learning-based PDEs to object detection, color2gray, and demosaicking. In [25], they model the saliency detection task as learning a boundary condition of a PDE system. Zhao et al. [26] extend this model to text detection.
The incapability of existing methods to incorporate both discrimination and invariance into features motivates us to seek a new approach to feature learning, especially when training samples are limited. Since symmetry methods for differential equations can construct invariances rigorously, in this paper we propose a novel PDE model for feature learning. An illustration of the proposed approach is shown in Fig. 1. The PDE is formulated as a linear combination of fundamental differential invariants, and its evolution process acts as a mapping from a raw image to a feature of the same dimension. Unlike traditional PDE methods, our PDE is data-driven, which enhances the discriminative information in the learned feature, and its evolution process is strictly translationally and rotationally invariant. The feature is then fed to a simple linear classifier. We also provide an algorithm that updates the parameters alternately to optimize our discretized model. By exploiting the invariance property, our method performs well even when training samples are few. We summarize the contributions of this paper as follows:
- We propose a novel PDE based method to extract image features for classification. We model the feature extraction process as an evolutionary PDE. The learned feature is both discriminative and invariant under translation, rotation, and gray-level scaling. To the best of our knowledge, this is the first work that applies PDEs to feature learning and image recognition.
- We provide a simple yet effective algorithm to optimize our discretized PDE model. The whole training time in each experiment is less than five minutes.
- Face recognition is a paradigm where training samples are few. Our experimental results on four well-known public face recognition datasets show that our method outperforms the state-of-the-art methods in this setting. For example, we obtain a recognition accuracy of 96% on Extended Yale B with only 10 samples per person, which is about 9% higher than sparse coding and dictionary learning methods.
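The rotational invariance claimed above comes from building the PDE out of differential invariants. As a quick numerical illustration (using two classical invariants, not necessarily the paper's exact invariant set), the quantities Δu and |∇u|² commute with 90° rotations, which can be checked directly with finite differences:

```python
import numpy as np

def rot_invariants(u):
    """Two classical rotationally invariant differential quantities,
    the Laplacian Δu and the squared gradient magnitude |∇u|²,
    computed by central differences with replicate borders."""
    p = np.pad(u, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
    ux = 0.5 * (p[1:-1, 2:] - p[1:-1, :-2])
    uy = 0.5 * (p[2:, 1:-1] - p[:-2, 1:-1])
    return lap, ux**2 + uy**2

u = np.random.rand(8, 8)
for q, qr in zip(rot_invariants(np.rot90(u)), rot_invariants(u)):
    # rotating the image and then computing the invariant equals
    # computing the invariant and then rotating the result
    assert np.allclose(q, np.rot90(qr))
```

For 90° rotations the identity holds exactly on the discrete grid; for arbitrary angles it holds only up to interpolation and discretization error.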
The rest of the paper is structured as follows. We introduce our PDE model in Section 2 and provide the algorithm to optimize it in Section 3. We discuss other related works in Section 4. In Section 5, we evaluate our PDE model on face recognition tasks and show its superiority. Finally, we conclude the paper in Section 6.
PDE based feature learning model
In this section, we present our PDE model for discriminative feature learning. We first propose the general framework and then crystallize the model via invariance properties. To begin with, Table 1 gives a brief summary of the notations used throughout the paper. For a vector x, x_i denotes its ith component.
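Since only a snippet of the model section is available here, the following is a minimal sketch of the kind of evolution the framework describes: an image evolved under a PDE whose right-hand side is a linear combination of translation- and rotation-invariant differential terms. The small invariant set {u, |∇u|², Δu}, the coefficients, and the function names are illustrative assumptions; the paper's model uses a larger family of fundamental differential invariants with learned coefficients.

```python
import numpy as np

def laplacian(u):
    """5-point finite-difference Laplacian with replicate (Neumann) borders."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u

def grad_sq(u):
    """Squared gradient magnitude |∇u|² via central differences."""
    p = np.pad(u, 1, mode="edge")
    ux = 0.5 * (p[1:-1, 2:] - p[1:-1, :-2])
    uy = 0.5 * (p[2:, 1:-1] - p[:-2, 1:-1])
    return ux**2 + uy**2

def evolve(u0, a, n_steps=10, dt=0.1):
    """Evolve u_t = a0*u + a1*|∇u|² + a2*Δu by explicit Euler steps.

    The evolved image after n_steps plays the role of the feature map:
    it has the same dimension as the input, as described in the paper.
    """
    u = u0.astype(float).copy()
    for _ in range(n_steps):
        u = u + dt * (a[0] * u + a[1] * grad_sq(u) + a[2] * laplacian(u))
    return u

img = np.random.rand(16, 16)
feat = evolve(img, a=[0.0, 0.01, 0.2])  # diffusion-dominated toy evolution
```

With a = [0, 0, c] this reduces to discrete heat flow, so the evolution smooths the image; a learned coefficient vector would instead be chosen to make the output discriminative.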
Algorithm for solving (7)
In this section, we propose an algorithm to solve our feature learning model (7). The main strategy is to update the parameters A and W alternately, where the discretization of a_i forms the ith column of A. We first discretize the PDE and then give the details of optimizing A and W. A is updated by gradient descent, while W admits a closed-form solution. The whole algorithm is summarized in Algorithm 1, including some fixed hyper-parameters.
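Model (7) itself is not reproduced in this snippet, so the sketch below assumes a generic least-squares objective over features F(A) and a linear classifier W. It only illustrates the alternating strategy described above: W gets a closed-form ridge solution, and A takes gradient steps (with a finite-difference gradient standing in for the paper's analytic derivative of the discretized PDE). The function names, the toy objective, and the hyper-parameters are all assumptions for illustration.

```python
import numpy as np

def ridge_W(F, Y, lam=1e-3):
    """Closed-form classifier update: W = (F Fᵀ + λI)⁻¹ F Yᵀ,
    the minimizer of ||Wᵀ F − Y||² + λ||W||² for a fixed feature matrix F."""
    d = F.shape[0]
    return np.linalg.solve(F @ F.T + lam * np.eye(d), F @ Y.T)

def alternate(features_of, A0, Y, n_outer=5, lr=1e-2, eps=1e-4):
    """Alternate a closed-form W update with numeric gradient steps on A.

    features_of(A) -> d×n feature matrix for PDE parameters A.
    """
    A = np.asarray(A0, float).copy()
    for _ in range(n_outer):
        F = features_of(A)
        W = ridge_W(F, Y)           # exact minimizer for the current A
        def obj(Av):
            return np.sum((W.T @ features_of(Av) - Y) ** 2)
        g = np.zeros_like(A)        # forward-difference gradient in A
        base = obj(A)
        for i in range(A.size):
            Ap = A.copy()
            Ap.flat[i] += eps
            g.flat[i] = (obj(Ap) - base) / eps
        A = A - lr * g              # one descent step on A
    return A, W
```

In the paper the gradient of A is derived analytically from the discretized PDE; the finite-difference stand-in here just keeps the alternating structure visible.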
Distinction from other PDE based methods
Some other PDE based works devise particular PDEs for classification [34], [35]. In [34], Yin et al. apply total variation regularization to decompose the image and use the decomposed part as the feature for classification. In [35], Shan et al. devise a simple PDE to normalize illumination and then use the normalized image as the feature. These works effectively use the PDE as a pre-processing step for classification. The classification and PDE …
Experiments
In this section, we present experiments to validate the proposed method. Classification with few training samples is a major challenge in image classification; it is often encountered in practice and can be as difficult as the large-sample case. Many sparse coding and dictionary learning methods [7], [8], [10], [12], [14] target classification in this setting and have shown their strengths. Face recognition is a paradigm which has few training samples but a lot of …
Conclusions
In this paper, we propose a novel PDE method for feature learning. We model the feature extraction process as an evolution process governed by a PDE. The PDE is assumed to be a linear combination of fundamental differential invariants under translation and rotation, which is transformed by a nonlinear mapping to achieve invariance with respect to gray-level scaling. The experiments with few training samples show that our approach achieves the best performance in various settings. It should …
Acknowledgments
Zhouchen Lin is supported by the National Basic Research Program of China (973 Program) (grant no. 2015CB352502), the National Natural Science Foundation (NSF) of China (grant nos. 61625301 and 61231002), and the Okawa Foundation. Zhenyu Zhao is supported by the NSF of China (grant no. 61473302).
Cong Fang received the bachelor's degree in electronic science and technology (optoelectronic technology) from Tianjin University in 2014. He is currently pursuing the Ph.D. degree with the School of Electronics Engineering and Computer Science, Peking University. His research interests include computer vision, pattern recognition, machine learning, and optimization.
References (51)
- Robust face recognition via occlusion dictionary learning, Pattern Recognit., 2014.
- Bilinear discriminative dictionary learning for face recognition, Pattern Recognit., 2014.
- Toward designing intelligent PDEs for computer vision: an optimal control approach, Image Vis. Comput., 2013.
- A robust hybrid method for text detection in natural scenes by learning-based partial differential equations, Neurocomputing, 2015.
- Max-margin multiple-instance dictionary learning, Proc. International Conference on Machine Learning, 2013.
- Feature extraction by learning Lorentzian metric tensor and its extensions, Pattern Recognit., 2010.
- Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell., 2013.
- Object recognition from local scale-invariant features, Proc. IEEE International Conference on Computer Vision, 1999.
- Histograms of oriented gradients for human detection, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005.
- Invariant scattering convolution networks, IEEE Trans. Pattern Anal. Mach. Intell., 2013.
- Lambertian reflectance and linear subspaces, IEEE Trans. Pattern Anal. Mach. Intell., 2003.
- Robust recovery of subspace structures by low-rank representation, IEEE Trans. Pattern Anal. Mach. Intell., 2013.
- Low-rank matrix recovery with structural incoherence for robust face recognition, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2012.
- Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., 2009.
- Learning low-rank representations with classwise block-diagonal structure for robust face recognition, Proc. AAAI Conference on Artificial Intelligence, 2014.
- Learning structured low-rank representations for image classification, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2013.
- Multi-view low-rank dictionary learning for image classification, Pattern Recognit.
- Discriminative K-SVD for dictionary learning in face recognition, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2010.
- Task-driven dictionary learning, IEEE Trans. Pattern Anal. Mach. Intell., 2012.
- Label consistent K-SVD: learning a discriminative dictionary for recognition, IEEE Trans. Pattern Anal. Mach. Intell., 2013.
- Class relatedness oriented discriminative dictionary learning, Pattern Recognit.
- Towards a practical face recognition system: robust registration and illumination by sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., 2012.
- Reducing the dimensionality of data with neural networks, Science, 2006.
- Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Process. Mag., 2012.
- ImageNet classification with deep convolutional neural networks, Proc. Conference on Neural Information Processing Systems, 2012.
Zhenyu Zhao received the B.S. degree in mathematics from the University of Science and Technology in 2009, and the M.S. degree in system science from the National University of Defense Technology in 2011. He received the Ph.D. degree in applied mathematics from the National University of Defense Technology in 2016. His research interests include computer vision, pattern recognition, and machine learning.
Pan Zhou received the Master's degree in computer science from Peking University in 2016. He is now a Ph.D. candidate at the Vision and Machine Learning Lab, Department of Electrical and Computer Engineering (ECE), National University of Singapore, Singapore. His research interests include computer vision, machine learning, and pattern recognition.
Zhouchen Lin received the Ph.D. degree in applied mathematics from Peking University in 2000. Currently, he is a professor at the Key Laboratory of Machine Perception (MOE), School of Electronics Engineering and Computer Science, Peking University. He is also a chair professor at Northeast Normal University. His research interests include computer vision, image processing, machine learning, pattern recognition, and numerical optimization. He is an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence and the International Journal of Computer Vision, a senior member of the IEEE, and an IAPR Fellow.