Elsevier

Neurocomputing

Volume 173, Part 3, 15 January 2016, Pages 1554-1564
Spatial regularization in subspace learning for face recognition: implicit vs. explicit

https://doi.org/10.1016/j.neucom.2015.09.028

Abstract

When traditional statistical methods are applied to face recognition, each face image is usually converted into a vector. Such vectorization not only leads to high dimensionality, and thus to the small sample size (SSS) problem, but also loses the original spatial relationship between image pixels. Spatial regularization (SR) has been shown to be an effective means of compensating for this loss while mitigating the SSS problem by explicitly imposing spatial constraints. However, SR still suffers from two main problems: one is the high computational cost due to high dimensionality, and the other is the selection of the key regularization factors that control the spatial regularization and thus the learning performance. Accordingly, in this paper we propose a new idea, termed implicit spatial regularization (ISR), to avoid losing the spatial relationship between image pixels and to deal with the SSS problem simultaneously for face recognition. Unlike explicit spatial regularization (ESR), which directly introduces a spatial regularization term and is based on the vector representation, the proposed ISR constrains spatial smoothness within each small image region by reshaping the image and then applying 2D-based feature extraction methods. Specifically, we follow the same assumption as SSSL (a typical ESR method), namely that a small image region around a pixel is smooth, and reshape each original image into a new matrix in which each column corresponds to a vectorized small image region. We then extract features from the newly formed matrix using any off-the-shelf 2D-based method that takes the relationship between pixels in the same row or column into account, so that the original spatial relationship within the neighboring region is largely retained.
Since ISR does not impose constraint terms, it not only avoids the selection of the troublesome regularization parameter but also greatly reduces the computational cost compared with ESR. Experimental results on four face databases show that the proposed ISR achieves performance competitive with SSSL at a lower computational cost.

Introduction

Face recognition [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], one of the most important problems in computer vision and pattern recognition, has been widely studied and has advanced considerably over the past few decades because of its wide applications in security, human-machine communication, etc. Unlike conventional image retrieval [11], [12] or recognition [13] tasks, face recognition has its own challenges and has attracted extensive research effort. Among existing face recognition methods, subspace learning [1], [2], [3], [4], [5], [6], [7], [8], [9], [10] is one of the most successful and well-studied techniques. To apply subspace learning methods, one first needs to convert a two-dimensional (2D) face image of size m×n into a one-dimensional (1D) vector of length mn, i.e., to represent each face image as a point in a high-dimensional vector space, and then perform feature extraction for face recognition. However, such a vector conversion often suffers from two main problems: 1) the small sample size (SSS) problem, which leads to over-fitting in classification and also makes it difficult for subspace learning methods (e.g., LDA, LPP, NPE, etc.) to discover the real intrinsic discriminant or geometrical structures [14]; and 2) the loss of the natural spatial structure of images, so that the concatenated vector no longer carries the spatial relationship between pixels. In this paper, we attempt to address these two problems.
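
The SSS problem can be seen numerically. The following is a minimal sketch, assuming random arrays as stand-ins for face images and hypothetical sizes (10 images of 8×8 pixels, chosen only for illustration): after vectorization, the mn×mn scatter matrix built from N < mn samples has rank at most N-1 and is therefore singular.

```python
import numpy as np

def scatter_rank(images):
    """Vectorize m-by-n images; return (vector length, rank of the total scatter matrix)."""
    N = images.shape[0]
    X = images.reshape(N, -1)        # each row is one vectorized image of length m*n
    Xc = X - X.mean(axis=0)          # center the samples
    S = Xc.T @ Xc                    # (mn x mn) total scatter matrix
    return X.shape[1], np.linalg.matrix_rank(S)

rng = np.random.default_rng(0)
images = rng.random((10, 8, 8))      # hypothetical data: N=10 images of size 8x8
dim, rank = scatter_rank(images)
print(dim, rank)                     # the rank cannot exceed N-1, far below dim
```

Because the rank is bounded by the number of samples rather than by the number of pixels, inverting such a scatter matrix (as vector-based LDA requires) is ill-posed unless the dimensionality is reduced or a regularizer is added.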

To retain the spatial relationship between image pixels as much as possible while avoiding the SSS problem, researchers have paid considerable attention to the original matrix (2D) representation of face images and have developed 2D counterparts of vector-based subspace dimensionality reduction methods, especially PCA and LDA, that operate directly on face matrices. The 2D versions of PCA include two-dimensional principal component analysis (2DPCA) [15], generalized low rank approximations of matrices (GLRAM) [16], etc., while the 2D versions of LDA include single-side 2D-LDA [17] and bi-side 2DLDA [18], etc. Among these methods, 2DPCA [15] and 2D-LDA [17] extract features only along the row (or column) direction of image matrices, whereas GLRAM and 2DLDA extract features along both the row and column directions. Subsequently, some researchers further extended subspace dimensionality reduction methods to higher-order (HO) tensor data [19], [20], [21]. By now, almost all existing vector-based subspace methods have been extended to corresponding 2D or HO counterparts. Compared with vector-based approaches, 2D-based approaches have three main advantages: 1) they naturally and effectively avoid the SSS problem because the scatter matrix defined directly on the images has far lower dimensionality, which avoids the singularity of the original within-class scatter matrix; 2) they partially utilize the spatial relationship among pixels in the same row (column); and 3) they dramatically reduce the computational complexity of feature extraction because only a small scatter matrix is involved in dimensionality reduction. Extensive experimental results have shown the superiority of these 2D-based approaches over their vector-based counterparts.
However, a recent study [14] pointed out that 2D-based approaches consider the relationship between pixels only within a whole row (column) and fail to exploit the spatial information of the whole image, so that their embedding functions are still spatially rough, i.e., not smooth enough.
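
To make the size advantage of 2D-based methods concrete, the following is a minimal sketch of the standard row-direction 2DPCA formulation (it is not code from [15]). The function names, the random input data and the image size are assumptions made for illustration. The key point is that the image covariance matrix is only n×n, rather than mn×mn as in the vector-based case.

```python
import numpy as np

def fit_2dpca(images, d):
    """Return the n-by-d projection matrix of 2DPCA for a stack of m-by-n images."""
    centered = images - images.mean(axis=0)
    # Image covariance matrix G = (1/N) * sum_i (A_i - mean)^T (A_i - mean), size n x n
    G = np.einsum('imj,imk->jk', centered, centered) / images.shape[0]
    eigvals, eigvecs = np.linalg.eigh(G)     # eigenvalues are returned in ascending order
    return eigvecs[:, ::-1][:, :d]           # keep the d leading eigenvectors as columns

def project_2dpca(image, P):
    """Project one m-by-n image onto the learned directions, giving an m-by-d feature matrix."""
    return image @ P

rng = np.random.default_rng(0)
images = rng.random((20, 32, 24))            # hypothetical data: N=20 images of size 32x24
P = fit_2dpca(images, d=5)
features = project_2dpca(images[0], P)
print(P.shape, features.shape)
```

Each projection direction acts on whole rows of the image, which is why such methods use the relationship among pixels in the same row but not the two-dimensional neighborhood of a pixel.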

To learn a spatially smooth subspace, Cai et al. recently returned to the original vector representation of face images and proposed a spatial regularization method called Spatially Smooth Subspace Learning (SSSL) [14], whose idea is quite general and thus applicable to almost all existing vector-based subspace methods. As expected, SSSL achieves better recognition performance on benchmarks than the corresponding vector-based and 2D-based counterparts. In its implementation, the projection vectors are forced to be spatially smooth by explicitly adding a regularization term that reflects the spatial relationship between pixels to the discriminant objective of a vector-based method such as LDA. Following SSSL, several variants [22], [23] have been developed. In [22], Hou et al. proposed an orthogonal smooth subspace learning method (OSSL) by constraining the transformation vectors to be orthogonal and spatially smooth simultaneously. In [23], Zuo et al. improved SSSL by replacing the Laplacian penalty with LoG and DoG penalties as the spatial regularization. Since these methods take the spatial relationship between image pixels into account by explicitly smoothing the projection vectors of the face space, they likewise significantly outperform the corresponding 2D-based versions and the vector-based subspace learning methods without such regularization.
However, these explicit spatial regularization (ESR) methods still have some disadvantages. First, like traditional vector-based subspace methods, they have a higher computational cost than 2D-based methods. Second, as mentioned above, they require the relatively troublesome selection of a key regularization factor that strongly influences recognition performance; determining this factor optimally is still an open problem in machine learning, especially when the factor takes continuous values. Third, the size of the local image region or window involved in spatial smoothness must be set to an odd value such as 3×3, 5×5, or 7×7, and each change of the spatial window size may require the regularization parameter to be searched again.
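
The kind of penalty that an explicit method attaches to a projection vector can be sketched as follows. This is only an illustration under stated assumptions: it uses a 5-point discrete Laplacian with replicated borders, and the function name and sizes are hypothetical. The exact discretization and objective used in SSSL [14] may differ.

```python
import numpy as np

def laplacian_penalty(a, m, n):
    """Sum of squared discrete-Laplacian responses of vector a viewed as an m-by-n image."""
    img = a.reshape(m, n)
    padded = np.pad(img, 1, mode='edge')     # replicate border pixels (assumed boundary handling)
    lap = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
           padded[1:-1, :-2] + padded[1:-1, 2:] - 4.0 * img)
    return float(np.sum(lap ** 2))

m, n = 16, 16
rng = np.random.default_rng(0)
rough = rng.standard_normal(m * n)           # spatially rough projection vector
smooth = np.ones(m * n)                      # constant, i.e. perfectly smooth, vector
print(laplacian_penalty(rough, m, n), laplacian_penalty(smooth, m, n))
```

In an explicit scheme, a term of this kind is weighted by the regularization factor and combined with the discriminant objective, which is where the parameter selection problem described above comes from.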

In this paper, we propose a new idea, termed implicit spatial regularization (ISR), to retain the spatial relationship between image pixels and to deal with the SSS problem simultaneously for face recognition. Unlike explicit spatial regularization (ESR), ISR captures the spatial relationship directly by reshaping an original image matrix into another matrix, without introducing any explicit regularization term. Specifically, we use the prior knowledge that a small image region around a pixel (hereafter called a spatial window) is generally smooth, and reshape an original face image (denoted as a matrix) into a new matrix by vectorizing each small image region into a column vector. We then perform feature extraction on the newly formed matrix with any off-the-shelf 2D-based method. Since each spatial window is reshaped into a single column, the spatial structure within a spatial window can be expected to be captured by 2D-based methods. Compared with SSSL, our method avoids the selection of the troublesome regularization parameters, greatly reduces the computational cost (a benefit inherited from the 2D-based methods), and allows the size of the spatial window to be set arbitrarily according to the size of the whole face image. Compared with 2D-based methods, our method considers the spatial relationship of pixels within a small spatial region rather than within a global row (or column). As a result, more of the spatial relationship can be used.
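
The reshaping step can be sketched as follows. This is a minimal illustration under assumptions that the text above does not fix: the spatial windows are taken to be non-overlapping p×q blocks that tile the image exactly, each window is vectorized in row-major order, and the function name is hypothetical.

```python
import numpy as np

def reshape_by_windows(image, p, q):
    """Rearrange an m-by-n image into a (p*q)-by-(m*n/(p*q)) matrix, one column per window."""
    m, n = image.shape
    if m % p or n % q:
        raise ValueError("window size must divide the image size in this sketch")
    # Split into a grid of blocks: axes are (block_row, row_in_block, block_col, col_in_block)
    blocks = image.reshape(m // p, p, n // q, q)
    # Bring the two block indices first, then flatten each block into one vector
    blocks = blocks.transpose(0, 2, 1, 3).reshape(-1, p * q)
    return blocks.T                          # each column is one vectorized window

image = np.arange(16).reshape(4, 4)          # toy 4x4 "image" with pixel values 0..15
B = reshape_by_windows(image, 2, 2)
print(B)
```

The resulting matrix can then be passed to any 2D-based feature extractor in place of the original image, so that pixels sharing a column of the new matrix are exactly the pixels of one spatial window.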

Note that, although our method does not explicitly introduce a spatial regularization term to constrain spatial smoothness between neighboring pixels, such smoothness is implicitly achieved by reshaping the image and then applying any off-the-shelf 2D-based method. Hence, in a loose sense, our method plays the role of spatial regularization, which is why we call it implicit spatial regularization (ISR).

To evaluate the efficacy of ISR, we compare it with SSSL on four face databases (Yale, ORL, Extended Yale B and CMU PIE). The results show that the proposed ISR-motivated methods achieve performance competitive with SSSL at a lower computational cost. In addition, we analyze the influence of the spatial window size on the performance of ISR.

The remainder of this paper is organized as follows. Section 2 gives a brief review of SSSL and 2D-based subspace feature extraction methods. Section 3 formulates implicit spatial regularization (ISR) in detail. Section 4 reports experimental comparisons on four face databases. Finally, Section 5 draws conclusions.

Section snippets

Brief review of SSSL and 2D-based feature extraction methods

Let $A=[A_1,A_2,\ldots,A_N]$ be the training face image set and let $X=[X_1,X_2,\ldots,X_N]$ be the corresponding vectorized training set, where $N$ is the number of training samples, $A_i$ ($i=1,2,\ldots,N$) is the $i$-th face image of size $m\times n$, and $X_i$ is the vector representation of $A_i$. Also, let $G$ be a graph constructed from $A$ (or $X$), and let $W$ and $L$ be the edge weight matrix and the graph Laplacian matrix associated with $G$, respectively. With these definitions, we below briefly review Spatially Smooth Subspace Learning (SSSL)

Implicit spatial regularization (ISR)

In this section, we provide a nominal yet simple spatial regularization method to avoid the loss of spatial information between image pixels and to deal with the SSS problem simultaneously. A brief description of the proposed ISR algorithm is given in Fig. 1.

Experimental settings

To evaluate the recognition performance of ISR, we carry out experiments on four benchmark face databases: the Yale database, the Olivetti Research Laboratory (ORL) database, the Extended Yale B database and the CMU PIE database. Given the specific characteristics of these four databases, the Yale database is used to test the performance of ISR under various facial expressions and slight lighting conditions. The ORL database is used to test the robustness of ISR to

Conclusions

In this paper, a nominal yet simple implicit spatial regularization (ISR) method was proposed for face recognition that retains as much spatial information between image pixels as possible. In contrast to existing explicit spatial regularization (ESR) for the vector representation, the proposed ISR is based on a second-order tensor representation and retains spatial information by reshaping the face image rather than constraining the projection vectors to be spatially smooth by introducing

Acknowledgment

This work was supported by the Natural Science Foundation of Jiangsu Province under Grant no. BK20130813, the National Natural Science Foundation of China under Grant nos. 61035003 and 61170151, the Fundamental Research Funds for the Central Universities under Grant no. NS2014100 and Jiangsu Qinglan project.

Yulian Zhu received her B.S. and M.S. degrees in computer application from Nanjing University of Aeronautics & Astronautics (NUAA) in 2001 and 2004, respectively. She has worked at NUAA since April 2004, where she received a Ph.D. degree in computer application in 2010. Her main research interests include machine learning, pattern recognition, and image processing.

References (31)

  • X. He, P. Niyogi. Locality preserving projections, in: Advances in Neural Information Processing Systems,...
  • D. Xu. Marginal Fisher analysis and its variants for human gait recognition and content-based image retrieval. IEEE Trans. Image Process. (2007)
  • X. He, et al. Neighborhood preserving embedding, in: ICCV,...
  • H.T. Chen, H.W. Chang, T.L. Liu. Local discriminant embedding and its variants. Computer Vision and Pattern...
  • Y. Gao. 3-D object retrieval with Hausdorff distance learning. IEEE Trans. Ind. Electron. (2014)

Songcan Chen received the B.S. degree from Hangzhou University (now merged into Zhejiang University), the M.S. degree from Shanghai Jiaotong University and the Ph.D. degree from Nanjing University of Aeronautics and Astronautics (NUAA) in 1983, 1985, and 1997, respectively. He joined NUAA in 1986, and since 1998 he has been a full-time Professor with the Department of Computer Science and Engineering. He has authored/co-authored over 170 peer-reviewed scientific papers and received Honorable Mentions for the 2006, 2007 and 2010 Best Paper Awards of the journal Pattern Recognition. His current research interests include pattern recognition, machine learning, and neural computing.

Qing Tian received the B.S. degree in computer science from Southwest University for Nationalities, China, in 2008 and the M.S. degree in computer science from Zhejiang University of Technology, China, in 2011, with the honors of Sichuan province-level excellent graduate and Zhejiang province-level excellent graduate, respectively. From Feb 2011 to Feb 2012, he worked at Arcsoft, U.S., as a researcher in the field of gender/age recognition. He is now a Ph.D. candidate in computer science at Nanjing University of Aeronautics and Astronautics, and his current research interests include machine learning and pattern recognition.
