Pattern Recognition, Volume 93, September 2019, Pages 164-178

Robust Jointly Sparse Regression with Generalized Orthogonal Learning for Image Feature Selection

https://doi.org/10.1016/j.patcog.2019.04.011

Abstract

Ridge regression (RR) and its variants are fundamental methods for multivariate data analysis and have been widely used to deal with different problems in pattern recognition and classification. However, these methods share a common drawback: the number of learned projections is limited by the number of classes. Moreover, most of these methods do not consider the local structure of the data, which makes them less competitive when the data lie on a low-dimensional manifold. Therefore, in this paper, we propose a robust jointly sparse regression method that integrates the local geometric structure, a generalized orthogonality constraint and joint sparsity into a regression model to address these problems. The optimization model can be solved by an alternating iterative algorithm using orthogonal matching pursuit (OMP) and singular value decomposition. Experimental results on face and non-face image databases demonstrate the superiority of the proposed method. The Matlab code can be found at http://www.scholat.com/laizhihui.

Introduction

Since the data in image/video processing, bioinformatics and web data mining are often high dimensional, the computational and memory costs can be very high. Therefore, powerful tools are needed to deal with such massive data sets. Feature selection and extraction are among the most effective tools for selecting or compressing the important information into a reduced low-dimensional space [1], [2], [3], and many algorithms have been developed for this purpose [4], [5]. The most widely used multivariate analysis methods for dimensionality reduction are principal component analysis (PCA) [6], linear discriminant analysis (LDA) [7], ridge regression (RR) and their variants.
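
For concreteness, the following is a minimal sketch of multivariate ridge regression, the basic model underlying the methods discussed below (an illustrative Matlab sketch of our own; the variable names and the value of lambda are arbitrary choices, not taken from the paper):

    % Minimal multivariate ridge regression sketch (illustrative only).
    % X: n-by-d data matrix; Y: n-by-c class indicator matrix;
    % lambda: regularization parameter (chosen arbitrarily here).
    lambda = 0.1;
    W = (X' * X + lambda * eye(size(X, 2))) \ (X' * Y);  % d-by-c projections
    Yhat = X * W;                                        % fitted responses

Note that W has only c columns, one per class, which foreshadows the small-class problem discussed below.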

However, in many practical applications such as face recognition, the data are usually sampled from a nonlinear low-dimensional manifold embedded in the high-dimensional ambient space, and PCA, LDA and RR are not well suited to these cases. Thus, many subspace learning algorithms based on manifold learning have been proposed [8], [9], [10], [11], [12]. Motivated by the manifold learning methods, RR was extended to have locality-preserving ability [13], [14], [15].

Although all the subspace learning methods mentioned above have their suitable application cases, they still have a major disadvantage: since their learned projections are linear combinations of all the original features, the results are hard to interpret. For example, RR uses the L2-norm on the regularization term and therefore lacks sparsity. In contrast, many regression methods using the L1-norm on the regularization term can obtain sparse projections, and thus they have attracted much attention in the fields of machine learning and pattern recognition [16]. The most representative sparse regression methods are sparse RR [17] and the Elastic Net [18]. Motivated by these, many subspace learning methods were extended to sparse cases in regression form [19], including sparse PCA (SPCA) [6], sparse LDA (SLDA) [20], sparse locality preserving embedding (SLPE) [21] and sparse locality preserving projections (SpLPP) [22]. All these methods learn sparse projections by incorporating L1-norm regularized regression into the process of learning projections. One of their main disadvantages is that they are usually time-consuming, because the L1-norm based methods conduct feature selection on high-dimensional image vectors. In addition, the learned projections are not jointly sparse. That is, L1-norm based sparse learning cannot obtain the joint sparsity that is considered much more effective for feature selection and classification in computer vision and biometrics.
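
To make the distinction concrete, the following toy sketch (our own illustration, not code from the paper) contrasts the scattered elementwise zeros typical of L1 regularization with the whole-zero rows that joint sparsity produces:

    % Toy illustration (ours): elementwise vs. joint (row) sparsity.
    W_l1  = [0 0.7; 0.3 0; 0 0.2];  % L1-style: zeros scattered over entries
    W_l21 = [0.4 0.7; 0 0; 0 0];    % jointly sparse: entire rows are zero
    % A zero row discards the corresponding feature in ALL projection
    % directions at once, which is what enables feature selection.
    row_norms = sqrt(sum(W_l21 .^ 2, 2));  % per-feature L2 norms
    selected  = find(row_norms > 0)        % -> only feature 1 is kept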

Motivated by the suitability of the L2,1-norm as a regularizer for jointly sparse learning, regression methods have been further developed into jointly sparse regression [23], [24], [25]. Nie et al. proposed efficient and robust feature selection (RFS) [26] via joint L2,1-norm minimization. This method uses the L2,1-norm on both the loss function and the regularization term of the regression model, which not only enhances robustness to outliers but also guarantees jointly sparse projections for effective feature selection. Other related L2,1-norm based regression methods [27], [28], [29], [30], [31] were also proposed for jointly sparse subspace learning.
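
For intuition, objectives of this form are often minimized by iteratively reweighted least squares. The following is a generic IRLS-style sketch of min_W ||XW - Y||_{2,1} + lambda*||W||_{2,1} (a standard reweighting scheme, not the authors' exact RFS algorithm; names and constants are ours):

    % Generic IRLS sketch for min_W ||X*W - Y||_{2,1} + lambda*||W||_{2,1}.
    % X: n-by-d; Y: n-by-c; eps0 guards against division by zero.
    lambda = 0.1;  eps0 = 1e-8;
    W = zeros(size(X, 2), size(Y, 2));
    for iter = 1:50
        E  = X * W - Y;
        d1 = 1 ./ (2 * sqrt(sum(E .^ 2, 2)) + eps0);  % per-sample weights
        d2 = 1 ./ (2 * sqrt(sum(W .^ 2, 2)) + eps0);  % per-feature weights
        W  = (X' * diag(d1) * X + lambda * diag(d2)) \ (X' * diag(d1) * Y);
    end

The down-weighting by 1/(2||e_i||) is what gives the L2,1 loss its robustness: samples with large residuals (outliers) receive small weights in the next least-squares fit.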

Whatever the variation, the basic model of the above methods is still ridge regression, merely with different norms as the measurement on the loss term or on the regularization term. Therefore, in this paper, we focus on the basic model and develop a generalized ridge regression method to overcome the potential drawbacks of the previous methods. First, the number of learned projections is limited by the number of classes (i.e., the small-class problem), which means these methods cannot obtain enough projections for more effective feature selection. Second, the correlation among the learned sparse projection directions is not taken into consideration; since the projection directions are not mutually orthogonal, the effectiveness of each projection direction is not guaranteed. Third, the robustness and flexibility of the previous L2,1-norm based methods are unclear, since no specific technique is incorporated into their objective functions to address this problem. To address these problems, we conducted preliminary research that was published as a conference paper [32]. However, that work still did not consider the orthogonality of the projection directions, and the local geometric structure of the data was ignored. In this paper, we extend the method proposed in the conference paper [32] into a more general form; that is, one more constraint characterizing the manifold structure of the data is appended to the model in [32]. We call the proposed method Robust Jointly Sparse Regression (RJSR); it aims to solve the problems mentioned above so as to not only improve the performance of feature selection and extraction but also enhance robustness.

The main contributions of this paper are three-fold:

  • 1)

    The optimal projections are made mutually orthogonal by adding a generalized orthogonality constraint, and the optimal solution is learned iteratively via Orthogonal Matching Pursuit (OMP). OMP helps the model obtain more discriminative information for effective feature selection (a generic sketch of OMP is given after this list).

  • 2)

    The robustness is enhanced not only by using the L2,1-norm instead of the L2-norm on the loss function to reduce sensitivity to outliers, but also by incorporating an elastic factor into the regression model to avoid the overfitting that often arises in regression-based methods.

  • 3)

    The proposed method can break through the small-class problem that exists in regression-based and LDA-based methods, and thus obtain more projections to improve the performance of pattern recognition or classification. In addition, the convergence of the proposed algorithm is proved.
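
For readers unfamiliar with OMP, here is a textbook single-vector sketch (generic OMP for y ≈ Ax with at most k nonzeros; this is not the authors' generalized orthogonal variant, and all variable names are ours):

    % Textbook OMP sketch (generic; not the paper's exact procedure).
    % A: n-by-d dictionary with unit-norm columns; y: n-by-1 signal;
    % k: maximum number of nonzero coefficients.
    k = 5;  r = y;  S = [];
    for t = 1:k
        [~, j] = max(abs(A' * r));   % column most correlated with residual
        S  = union(S, j);            % grow the support set
        xS = A(:, S) \ y;            % least squares fit on current support
        r  = y - A(:, S) * xS;       % update residual
    end
    x = zeros(size(A, 2), 1);  x(S) = xS;  % assemble the sparse solution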

The rest of this paper is organized as follows. The related works are presented in Section 2, while Section 3 gives the objective function and the optimal solution of the proposed method. Section 4 presents the theoretical analysis, including the convergence of the proposed algorithm and its computational complexity. Experiments are presented and analyzed in Section 5, and the conclusion is drawn in Section 6.

Related works

In this section, we first present the notations used in this paper and then briefly review some works related to the proposed method.

Robust jointly sparse regression

In this section, we first present the motivations and the key definitions of the paper, and then review our previous work presented in [32]. Building on this groundwork, we propose the objective function of the model and its corresponding optimal solution. Comparisons and discussion with other relevant methods are also provided to demonstrate the novelty and advantages of the proposed method.

Theoretical analysis

In this section, theoretical analysis including the convergence of the proposed algorithm and the corresponding computational complexity is presented.

Experiments

In this section, the COIL100 dataset (http://www.cs.columbia.edu/CAVE/software/softlib/coil-100.php) is first used to evaluate the performance of the proposed RJSR when images contain rotational variations. Then experiments on five other databases are conducted to evaluate the performance of the proposed method on face databases (i.e., the Yale [54], ORL [55] and AR [56] datasets) and non-face databases (i.e., hyperspectral images from the University of Pavia data set (PaviaU) [57], Binary alpha…

Conclusion

In this paper, we propose a robust jointly sparse regression method for effective feature selection. By combining the locality of the manifold structure of the original data with the orthogonality and joint sparsity of the projections, RJSR is able to obtain more discriminative information for image recognition and classification tasks. In addition, RJSR can alleviate the small-class problem and obtain more projections via the designed loss function. The proposed optimization problem can be solved by…

Acknowledgments

This work was supported in part by the Natural Science Foundation of China (Grant 61573248, Grant 61732011 and Grant 61802267), Research Grant of The Hong Kong Polytechnic University (Project Code: G-UA2B), the Guangdong Natural Science Foundation (Project 2017A030313367 and Project 2017A030310067), and Shenzhen Municipal Science and Technology Innovation Council (No. JCYJ20170302153434048, No. JCYJ20180305124834854 and No. JCYJ20160429182058044).

References (61)

  • Y. Pang et al., Outlier-resisting graph embedding, Neurocomputing (2010).
  • J. Xie et al., Robust nuclear norm-based matrix regression with applications to robust face recognition, IEEE Trans. Image Process. (2017).
  • Y. Xu et al., A new discriminative sparse representation method for robust face recognition via L2-norm regularization, IEEE Trans. Neural Netw. Learn. Syst. (2017).
  • H. Zhao, W.K. Wong, Regularized discriminant entropy analysis, 47 (2014).
  • H. Zou et al., Sparse principal component analysis, J. Comput. Graph. Stat. (2004).
  • P.N. Belhumeur et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997).
  • X. He et al., Locality preserving projections, Neural Inf. Process. Syst. (2004).
  • D. Cai et al., Isometric projection.
  • X. He et al., Neighborhood preserving embedding.
  • S. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000).
  • S. Yan et al., Graph embedding: a general framework for dimensionality reduction, Int. Conf. Comput. Vis. Pattern Recognit. (2005).
  • N. Nguyen et al., Ridge regression for two dimensional locality preserving projection, Int. Conf. Pattern Recognit. (2008).
  • D. Cai et al., Spectral regression for efficient regularized subspace learning, Int. Conf. Comput. Vis. (2007).
  • D. Brown, Locality-regularized linear regression for face recognition, Int. Conf. Pattern Recognit. (2012).
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2009).
  • R. Tibshirani, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B Stat. Methodol. (1996).
  • H. Zou et al., Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B (2005).
  • Z. Qiao et al., Sparse linear discriminant analysis with applications to high dimensional low sample size data, IAENG Int. J. Appl. Math. (2009).
  • Z. Zheng, Sparse locality preserving embedding, Int. Congr. Image Signal Process. (2009).
  • K. Wang et al., Joint feature selection and subspace learning for cross-modal retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (2016).

Dongmei Mo received the B.S. and M.S. degrees from Zhaoqing University and Shenzhen University, respectively. She is now pursuing the Ph.D. degree at The Hong Kong Polytechnic University (e-mail: [email protected]).

Zhihui Lai received the B.S. degree in mathematics from South China Normal University, the M.S. degree from Jinan University, and the Ph.D. degree in pattern recognition and intelligent systems from Nanjing University of Science and Technology (NUST), China, in 2002, 2007 and 2011, respectively. He has been a Research Associate, Postdoctoral Fellow and Research Fellow at The Hong Kong Polytechnic University. His research interests include face recognition, image processing and content-based image retrieval, pattern recognition, compressed sensing, human vision modeling and applications in the field of intelligent robot research. He has published over 100 scientific articles, including more than 30 papers in top-tier IEEE Transactions. He is currently an associate editor of the International Journal of Machine Learning and Cybernetics. For more information, including all the papers and the Matlab codes, please refer to his website: http://www.scholat.com/laizhihui.
