
Information Sciences

Volume 573, September 2021, Pages 345-359

Double L2,p-norm based PCA for feature extraction

https://doi.org/10.1016/j.ins.2021.05.079

Abstract

Recently, robust-norm distance based principal component analysis (PCA) for feature extraction has proven very effective for image analysis. Such methods consider either the minimization of reconstruction error or the maximization of data variance in a low-dimensional subspace, yet both criteria matter for feature extraction. Furthermore, most existing methods cannot obtain satisfactory results because they rely on an inflexible robust norm as the distance metric. To address these problems, this paper proposes a novel robust PCA formulation called Double L2,p-norm based PCA (DLPCA) for feature extraction, in which the minimization of reconstruction error and the maximization of variance are simultaneously taken into account in a unified framework. In the reconstruction error function, we aim to learn a latent subspace that bridges the relationship between the transformed features and the original features. To make the objective insensitive to outliers, we take the L2,p-norm as the distance metric for both the reconstruction error and the data variance. These characteristics make our method more applicable for feature extraction. We present an effective iterative algorithm to solve this challenging problem and provide a theoretical analysis of its convergence. Experimental results on several databases show the effectiveness of our model.

Introduction

In recent years, high-dimensional data such as images and text have become ubiquitous. How to effectively represent such data is one of the most important problems in pattern classification. Feature extraction (or dimension reduction) has been widely employed as a data analysis tool for addressing this problem. Principal Component Analysis (PCA) [1] is one of the most representative techniques: it finds an optimal projection that maximizes the variance or, equivalently, minimizes the reconstruction error of the data, and it performs feature extraction as well as data reconstruction.

Conventional PCA, however, is sensitive to outliers owing to the squared L2-norm distance metric in its objective function [2], [3], [4], [5]. The squared L2-norm distance metric magnifies the effect of outliers, which may make the projection vectors drift from the desired direction [6], [7]. To cope with this problem, an increasing number of robust PCA techniques have been developed for feature extraction, such as low-rank PCA [8], [9], [10], [11] and L1-norm distance based PCA methods [2], [13], [14], [15]. Low-rank PCA reconstructs data with a low-rank structure; however, it cannot obtain a low-dimensional representation of the data and is thus not suitable for dimension reduction [10]. Comparative studies in the literature have shown that the L1-norm distance metric is more robust than the squared L2-norm distance metric, since it suppresses outliers [12]. Many recent studies on robust feature extraction therefore take the L1-norm as the distance metric in their objectives. Among them, L1-PCA [13], PCA-L1 [2], [14], and R1-PCA [15] are three of the most representative. L1-PCA seeks robust projection vectors by minimizing the L1-norm reconstruction error of the data. In contrast, PCA-L1 [2] maximizes the L1-norm data variance rather than minimizing the L1-norm reconstruction error, and obtains the projection vectors one at a time by a greedy strategy; that is, each greedy step yields a single projection vector. In [14], Nie et al. solved the objective function with a non-greedy strategy, in contrast to [2]. The work in [15] imposed the L2,1-norm, or rotationally invariant (RI) L1-norm, on the objective of PCA to guarantee a rotationally invariant solution. The difference between [14] and [15] lies mainly in the objective function: [14] minimized the reconstruction error, while [15] maximized the data variance. The L1-norm has been shown to be more robust in suppressing outliers [2], [6], and PCA-L1 has accordingly demonstrated the most promising results in terms of both recognition rate and reconstruction error. In view of this robustness advantage, the idea of PCA-L1 has been extended to multilinear PCA [16]. Motivated by its impressive results, many robust supervised methods using the L1-norm distance metric in their discriminant criteria have been proposed for feature extraction. For instance, linear discriminant analysis with L1-norm (LDA-L1) maximizes the L1-norm between-class dispersion while minimizing the L1-norm within-class dispersion [6], [7], [17], [18], [47], [48]. The projection vectors of LDA-L1 are solved by a greedy strategy, each being obtained by a gradient ascent algorithm; similar work can be found in [18]. A non-greedy version of LDA-L1 was proposed in [19], [20], [21], [22], [23], whose connections are shown in [24]. In [4], [22], rotationally invariant LDA was proposed, which utilizes the L2,1-norm distance metric in the objective.
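For concreteness, the following is a minimal sketch of the greedy fixed-point iteration used by PCA-L1 [2] to find a single projection vector; the function name and defaults are illustrative choices of ours, not taken from the cited paper.

```python
import numpy as np

def pca_l1_component(X, n_iter=100, seed=0):
    """One projection vector of PCA-L1: maximize sum_i |w^T x_i|, ||w||_2 = 1.

    X: (d, n) centered data matrix.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        s = np.sign(w @ X)            # polarity of each sample w.r.t. w
        s[s == 0] = 1                 # tie-breaking to avoid zero signs
        w_new = X @ s                 # flip-and-sum fixed-point update
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):     # converged to a local maximum
            break
        w = w_new
    return w
```

Further components are then obtained greedily by deflating the data, X ← X − w wᵀX, and rerunning the iteration, which is exactly the one-vector-per-step behavior noted above.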

All of the robust PCA algorithms mentioned above require each image to be reshaped into a high-dimensional vector before they can be applied. In [25], Yang et al. proposed a classic image-as-matrix method for feature extraction, termed Two-dimensional PCA (2DPCA), which, however, is based on the squared L2-norm and is thus sensitive to outliers. By applying the L1-norm distance metric to the objective of 2DPCA, Li et al. [26] proposed L1-norm based 2DPCA (2DPCA-L1). In [27], a sparse version of 2DPCA-L1 (2DPCAL1-S) was developed: in addition to measuring the data variance with the L1-norm distance metric, an L1-norm penalty is imposed on the solution. Both methods derive the projection vectors by a greedy strategy. The non-greedy strategy designed in [3] has been introduced to solve the objectives of Block PCA-L1 [28] and 2DPCA-L1 [29]. To achieve rotational invariance, the works in [5], [30] proposed LF-norm based 2DPCA, which respectively maximize the data variance and minimize the reconstruction error. Note that the LF-norm can be viewed as a 2D version of the L2,1-norm.
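As a reference point for the image-as-matrix family, here is a minimal sketch of classical 2DPCA as described by Yang et al. [25]; the function name and array layout are our illustrative assumptions.

```python
import numpy as np

def two_dpca(images, r):
    """Classical 2DPCA: project each h-by-w image A as A @ W, W of shape (w, r).

    images: (n, h, w) array of training images.
    """
    centered = images - images.mean(axis=0)
    # Image scatter matrix G = (1/n) * sum_i (A_i - mean)^T (A_i - mean), (w, w).
    G = np.einsum('nhw,nhv->wv', centered, centered) / len(images)
    vals, vecs = np.linalg.eigh(G)               # ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:r]]   # top-r eigenvectors
```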

In [31], Kwak generalized PCA-L1 to PCA-Lp, which maximizes the Lp-norm based variance for an arbitrary value of p>0. Clearly, conventional PCA and PCA-L1 are special cases of PCA-Lp obtained by specific choices of p, and with an appropriate choice of p, PCA-Lp has been shown to obtain more satisfactory results. Wang et al. [33] introduced the Lp-norm into 2DPCA and developed a generalized 2DPCA (G2DPCA) problem. Both of these methods maximize the data variance. Different from them, the L2,p-norm based PCA (PCA-L2,p) proposed in [32] minimizes the reconstruction error. Owing to the L2,p-norm, PCA-L2,p is rotationally invariant and non-greedy. Furthermore, extracting robust discriminant features by minimizing the L2,p-norm ratio of LDA has been well explored in [33].
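To make the role of p concrete, the illustrative helper below (names and numbers are ours) evaluates the L2,p criterion, the sum of ||r_i||_2^p over the columns r_i of a residual matrix, and shows how smaller p shrinks the share of the criterion captured by a single outlier.

```python
import numpy as np

def l2p_criterion(R, p):
    """Sum of ||r_i||_2^p over the columns r_i of R; p = 2 gives the
    classical squared-L2 criterion, p = 1 the rotationally invariant L2,1."""
    return np.sum(np.linalg.norm(R, axis=0) ** p)

R = np.ones((5, 10))
R[:, 0] = 100.0                       # inject one outlier sample
for p in (2.0, 1.0, 0.5):
    share = np.linalg.norm(R[:, 0]) ** p / l2p_criterion(R, p)
    print(f"p={p}: the outlier carries {100 * share:.1f}% of the criterion")
# p=2.0: ~99.9%, p=1.0: ~91.7%, p=0.5: ~52.6%
```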

As is well known, the objective of conventional PCA admits two equivalent formulations: the maximization of the data variance and the minimization of the reconstruction error. However, the two are no longer equivalent once robust norm distances (e.g., L2,p-norm distances) are embedded in the objectives. Most of the robust PCA works mentioned above adopt only one of the two formulations, although both may be important for extracting effective features for classification [10], and thus obtain inexact features. To mitigate these problems, we present a novel and more practical robust PCA formulation, namely Double L2,p-norm based PCA (DLPCA), for feature extraction. The contributions of this paper are highlighted as follows:

  • (1)

    The proposed DLPCA jointly considers, in a single objective, both the minimization of the reconstruction error and the maximization of the data variance in the feature space, both of which have been shown to be effective for data representation.

  • (2)

    Instead of learning a single transformation matrix, we search for two transformation matrices in the reconstruction error term: one projects the data onto a low-dimensional subspace, and the other recovers the data, which constructs the relationship between the transformed features and the original features.

  • (3)

    We take the L2,p-norm as the distance metric in the objective function of DLPCA. Since the L2,p-norm distance metric weakens the sensitivity to outliers, the robustness of PCA is well promoted. On the other hand, the L2,p-norm ratio minimization formulation makes the problem non-convex and non-smooth.

  • (4)

    Solving the new objective of DLPCA is considerably more challenging, since the L2,p-norm reconstruction error must be minimized and the L2,p-norm data variance maximized at the same time; the existing robust PCA algorithms are not applicable here. To solve this difficult problem, we derive a novel iterative and non-greedy algorithm for the L2,p-norm ratio minimization problem and analyze its convergence from a theoretical point of view (a sketch of this objective is given after this list).

  • (5)

    To validate the effectiveness of our method, we conduct extensive experiments on several image datasets; the promising results demonstrate its effectiveness compared with other state-of-the-art PCA methods.
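Equation (8) itself lies in the truncated portion of this excerpt. Based on the quantities named above and in the proof of Theorem 2 (the recovery matrix U and the ratio weight α), a plausible reconstruction of the L2,p-norm ratio objective is the following sketch; it should be read as our inference, not as the authors' exact formulation.

```latex
% Inferred DLPCA-style ratio objective (not reproduced from the paper):
% minimize the L2,p reconstruction error while maximizing the L2,p variance.
\min_{W,\,U,\;W^{T}W=I}\;
  \frac{\displaystyle\sum_{i=1}^{n}\bigl\|x_i - U\,W^{T}x_i\bigr\|_2^{p}}
       {\displaystyle\sum_{i=1}^{n}\bigl\|W^{T}x_i\bigr\|_2^{p}}
```

Under this reading, the α in the proof of Theorem 2 plays the role of the current ratio value in a standard ratio-to-difference conversion.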

The rest of this paper is organized as follows. Section 2 outlines several related works. Section 3 introduces the proposed method in detail. Section 4 provides a theoretical analysis of the convergence of our algorithm, and Section 5 discusses its relations with other robust PCA methods. Section 6 reports the experimental results of the proposed method on several image datasets, and Section 7 gives the conclusions and future work.


Related works

Assume a dataset of $n$ samples with $d$ dimensions, represented by the matrix $X=[x_1,x_2,\ldots,x_n]\in\mathbb{R}^{d\times n}$, where $x_i\in\mathbb{R}^d$ denotes the $i$-th sample. We assume the data have been centered, i.e., $x_i \leftarrow x_i-\frac{1}{n}\sum_{j=1}^{n}x_j$. In this section, we briefly review several related works: conventional PCA, PCA-L1, PCA-Lp, and PCA-L2,p.

Conventional PCA [1] obtains the projection matrix $W=[w_1,\ldots,w_r]\in\mathbb{R}^{d\times r}$ by maximizing the data variance:
$$\max_{W,\;W^TW=I}\;\sum_{i=1}^{n}\|W^Tx_i\|_2^2=\operatorname{tr}(W^TS_tW)$$
in which $S_t=XX^T$ is the total scatter matrix.
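The maximizer is given by the top eigenvectors of $S_t$; a minimal illustrative implementation (names are ours):

```python
import numpy as np

def pca(X, r):
    """Top-r eigenvectors of the total scatter matrix S_t = X X^T.

    X: (d, n) centered data; returns W (d, r) with orthonormal columns
    maximizing tr(W^T S_t W). Features are then Y = W.T @ X.
    """
    St = X @ X.T
    vals, vecs = np.linalg.eigh(St)              # ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:r]]   # top-r eigenvectors
```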

The proposed method

In this section, we will introduce the proposed algorithm DLPCA in detail.
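The detailed derivation is truncated in this excerpt. Purely as a reading aid, the sketch below implements a speculative IRLS-style alternating scheme that is consistent with the quantities appearing in the proof of Theorem 2 (the per-sample weights $d_i$, the ratio weight $\alpha$, and the linearization matrix $K$); every name and update rule here is our assumption, not the authors' algorithm.

```python
import numpy as np

def dlpca_sketch(X, r, p=1.0, n_iter=50, lr=1e-3, eps=1e-8):
    """Speculative IRLS-style scheme for a DLPCA-like ratio objective.

    Alternately decreases
        sum_i ||x_i - U W^T x_i||_2^p - alpha * sum_i ||W^T x_i||_2^p,
    with alpha re-estimated as the current ratio (ratio-to-difference trick).
    X: (d, n) centered data; W (d, r) kept orthonormal, U (d, r) free.
    """
    d_dim, _ = X.shape
    rng = np.random.default_rng(0)
    W, _ = np.linalg.qr(rng.standard_normal((d_dim, r)))
    U = W.copy()
    for _ in range(n_iter):
        Y = W.T @ X                                  # latent features, (r, n)
        R = X - U @ Y                                # reconstruction residuals
        rn = np.maximum(np.linalg.norm(R, axis=0), eps)
        yn = np.maximum(np.linalg.norm(Y, axis=0), eps)
        alpha = np.sum(rn ** p) / np.sum(yn ** p)    # current ratio value
        d = 0.5 * p * rn ** (p - 2)                  # IRLS weights (cf. Lemma 1)
        # U-step: weighted least squares, closed form.
        U = (X * d) @ Y.T @ np.linalg.inv(Y @ (Y.T * d[:, None]) + eps * np.eye(r))
        R = X - U @ Y                                # refresh residuals after U-step
        # W-step: one projected-gradient step on the difference surrogate.
        K = p * (X * yn ** (p - 2)) @ Y.T            # linearization of the variance term
        grad = -2.0 * (X * d) @ R.T @ U - alpha * K
        Us, _, Vt = np.linalg.svd(W - lr * grad, full_matrices=False)
        W = Us @ Vt                                  # retract back to W^T W = I
    return W, U
```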

Convergence analysis

In this section, we provide the proofs on the convergence of the proposed iterative algorithm, commencing with some lemmas and theorems.

Lemma 1

[36]

Given any two vectors $c$ and $f$, when $0<p\le 2$, the inequality
$$\|c\|_2^p-\frac{p}{2}\|f\|_2^{p-2}\|c\|_2^2\;\le\;\|f\|_2^p-\frac{p}{2}\|f\|_2^{p-2}\|f\|_2^2$$
holds.
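As an illustrative sanity check (ours, not from the paper), the inequality can be verified numerically on random vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
for p in (0.5, 1.0, 1.5, 2.0):             # several values in (0, 2]
    for _ in range(1000):
        c, f = rng.standard_normal(4), rng.standard_normal(4)
        nc, nf = np.linalg.norm(c), np.linalg.norm(f)
        lhs = nc ** p - 0.5 * p * nf ** (p - 2) * nc ** 2
        rhs = nf ** p - 0.5 * p * nf ** (p - 2) * nf ** 2
        assert lhs <= rhs + 1e-12          # Lemma 1
print("Lemma 1 held on all random trials")
```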

Theorem 2

Under the iterative framework of DLPCA, the objective of (8) is monotonically decreasing at each iteration $t$ when $p\ge 1$.

Proof

From Step (5), we have:

$$\sum_{i=1}^{n}d_i\left\|x_i-U^{(t)}\big(W^{(t+1)}\big)^Tx_i\right\|_2^2-\alpha\operatorname{tr}\!\Big(\big(W^{(t+1)}\big)^TK^{(t)}\Big)\;\le\;\sum_{i=1}^{n}d_i\left\|x_i-U^{(t)}\big(W^{(t)}\big)^Tx_i\right\|_2^2-\alpha\operatorname{tr}\!\Big(\big(W^{(t)}\big)^TK^{(t)}\Big)$$

Since

Relations with other robust PCA methods

We have introduced the objective of our proposed DLPCA. In this section, we discuss the connections between DLPCA and some recent, closely related PCA works.

Our method is robust against outliers. Let $p=2$ and ignore the minimization of the reconstruction error; we then obtain $\max_{W,\,W^TW=I}\sum_{i=1}^{n}\|W^Tx_i\|_2^2$, which is formulation (1) of conventional PCA. Clearly, compared with conventional PCA, our method additionally considers the derivation of robustness as well as the

Experimental verification

In this section, we evaluate our algorithm by conducting experiments on five image databases: face databases including CMU PIE [43] and ORL [45], the object database ALOI [44], and the traffic sign database GTSDB [46]. Four methods are compared with ours on classification and reconstruction tasks, i.e., PCA, RIPCA [4], PCA-Lp [31], and PCA-L2,p [32]. Note that there are many L1-norm versions of PCA, such as PCA-L1 [13], [14] and R1-PCA [15], which

Conclusions and future work

In this paper, we proposed an effective and robust PCA algorithm in which the minimization of the reconstruction error and the maximization of the data variance are simultaneously considered in a unified framework. To bridge the relationship between the original features and the transformed features, thereby preserving the information in the input space, we proposed to learn a latent subspace. To guarantee that the latent subspace is robust to outliers, the L2,p-norm was adopted as the distance metric

CRediT authorship contribution statement

Pu Huang: Conceptualization, Methodology, Writing - original draft. Qiaolin Ye: Validation, Methodology, Formal analysis. Fanlong Zhang: Data curation. Guowei Yang: Software. Wei Zhu: Resources. Zhangjing Yang: Supervision, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research was supported in part by the National Science Foundation of China under Grant Nos. U1831127, 62072246 and 61871444, the Natural Science Foundation of Jiangsu Province under Grant No. BK20171453, the Industry University Research Project of Jiangsu Science and Technology Department under Grant No. BY2020033, the open project for young teachers of Nanjing Audit University (School of Information Engineering) (Grant No. A111010004/012), and the Six Peak Talent and Qinglan Project of

References (48)

  • H. Wang et al., Fisher discriminant analysis with L1-norm, IEEE Trans. Cybern. (2014).
  • N. Shahid et al., Robust principal component analysis on graphs.
  • M. Rahmani et al., Coherence pursuit: fast, simple, and robust principal component analysis, IEEE Trans. Signal Process. (2017).
  • J. Fan, Q. Sun, W. Zhou, Z. Zhu, Principal component analysis for big data, arXiv:1801.01602.
  • G. Lerman et al., An overview of robust subspace recovery, Proc. IEEE (2018).
  • Q. Gao et al., Angle 2DPCA: a new formulation for 2DPCA, IEEE Trans. Cybern. (2018).
  • Q. Ke et al., Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming.
  • F. Nie et al., Robust principal component analysis with non-greedy L1-norm maximization, in: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (2011).
  • C. Ding et al., R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization.
  • K. Luu et al., Compressed submanifold multifactor analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2017).
  • D. Zhang et al., Recursive discriminative subspace learning with ℓ1-norm distance constraint, IEEE Trans. Cybern. (2020).
  • Q. Ye et al., L1-norm distance linear discriminant analysis based on an effective iterative algorithm, IEEE Trans. Circuits Syst. Video Technol. (2018).
  • F.P. Nie, H. Wang, Z. Wang, Robust linear discriminant analysis using ratio minimization of L1,2-norms, arXiv:...
  • J. Wen et al., Robust sparse linear discriminant analysis, IEEE Trans. Circuits Syst. Video Technol. (2019).
1 These authors contributed equally to this work.