Double L2,p-norm based PCA for feature extraction
Introduction
In recent years, high-dimensional data such as images and text have become ubiquitous. How to effectively represent such data has been one of the most important problems in pattern classification. Feature extraction (or dimension reduction) is widely employed as a data analysis tool for addressing this problem. Principal Component Analysis (PCA) [1] is one of the most representative techniques: it finds an optimal projection that maximizes the variance (or, equivalently, minimizes the reconstruction error) of the data, and performs both feature extraction and data reconstruction.
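The equivalence of the two PCA formulations can be checked numerically. The following sketch (synthetic data; all variable names are illustrative) computes a projection via SVD and verifies that the total variance splits exactly into the captured variance plus the squared reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features
X = X - X.mean(axis=0)                 # center the data

# Projection onto the top-m principal components via SVD
m = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:m].T                           # d x m projection matrix

# Variance captured by the projection (to be maximized)
variance = np.sum((X @ W) ** 2)

# Squared reconstruction error (to be minimized)
error = np.sum((X - X @ W @ W.T) ** 2)

# Total variance splits exactly into captured variance + error
total = np.sum(X ** 2)
print(np.isclose(total, variance + error))   # True
```

This identity holds for any matrix W with orthonormal columns, which is why maximizing the variance and minimizing the error give the same solution under the squared L2-norm.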
Conventional PCA, however, is sensitive to outliers because its objective function uses the squared L2-norm distance metric [2], [3], [4], [5]. The squared L2-norm easily magnifies the effect of outliers, which may make the projection vectors drift from the desired direction [6], [7]. To cope with this problem, an increasing number of robust PCA techniques have been developed for feature extraction, such as low-rank PCA [8], [9], [10], [11] and L1-norm distance based PCA methods [2], [13], [14], [15]. Low-rank PCA reconstructs data with a low-rank structure, but it cannot obtain a low-dimensional representation of the data and is therefore not suitable for dimension reduction [10]. Comparative studies have shown that the L1-norm distance metric is more robust than the squared L2-norm distance metric, since it suppresses the influence of outliers [12]. Many recent studies on robust feature extraction therefore adopt the L1-norm as the distance metric in their objectives. Among them, L1-PCA [13], PCA-L1 [2], [14], and R1-PCA [15] are three of the most representative methods. L1-PCA seeks robust projection vectors by minimizing the L1-norm reconstruction error of the data. Different from L1-PCA, PCA-L1 [2] maximizes the L1-norm data variance rather than minimizing the L1-norm reconstruction error, and obtains the projection vectors one at a time with a greedy strategy. In contrast to [2], Nie et al. [14] solved the objective function with a non-greedy strategy. The work in [15] imposed the L2,1-norm, or rotationally-invariant (RI) L1-norm, on the objective of PCA, which guarantees a rotationally invariant solution.
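The greedy step of PCA-L1 [2] admits a simple fixed-point iteration: flip each sample's polarity to match its current projection, then re-aim the vector at the polarity-weighted sum of samples. A minimal sketch of that single-vector step (function name and synthetic data are ours):

```python
import numpy as np

def pca_l1_direction(X, n_iter=100, seed=0):
    """One projection vector maximizing sum_i |w^T x_i| (greedy PCA-L1 step).

    X: (n, d) centered data matrix. Returns a unit-norm vector w.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        s = np.sign(X @ w)             # polarity of each sample's projection
        s[s == 0] = 1                  # avoid zero signs
        w_new = X.T @ s                # polarity-weighted sum of samples
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):      # fixed point reached
            break
        w = w_new
    return w

# On well-behaved data the L1 direction is close to the ordinary L2 one
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.5])
X -= X.mean(axis=0)
w = pca_l1_direction(X)
```

Subsequent vectors are obtained greedily by deflating the data against the directions already found, which is exactly the "one projection vector per greedy step" behavior described above.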
The difference between the two works [14], [15] mainly lies in the objective function: [14] minimizes the reconstruction error, while [15] maximizes the data variance. Since the L1-norm has been shown to be more robust in suppressing outliers [2], [6], PCA-L1 has demonstrated the most promising results, in terms of both recognition rate and reconstruction error. In view of this robustness advantage, the idea of PCA-L1 has been extended to multilinear PCA [16]. Motivated by its impressive results, many robust supervised feature extraction methods have been proposed that use the L1-norm distance metric in their discriminant criteria. For instance, linear discriminant analysis with L1-norm (LDA-L1) maximizes the L1-norm between-class dispersion while minimizing the L1-norm within-class dispersion [6], [7], [17], [18], [47], [48]. The projection vectors of LDA-L1 are solved greedily, each being obtained by a gradient ascent algorithm; similar work can be found in [18]. Non-greedy versions of LDA-L1 were proposed in [19], [20], [21], [22], [23], whose connections are shown in [24]. In [4], [22], rotationally invariant LDA was proposed, which uses the L2,1-norm distance metric in the objective.
The robust PCA algorithms mentioned above all require, to the best of our knowledge, that each image first be reshaped into a high-dimensional vector. In [25], Yang et al. proposed a classic image-as-matrix method for feature extraction, termed Two-Dimensional PCA (2DPCA), which, however, is based on the squared L2-norm and is therefore sensitive to outliers. By applying the L1-norm distance metric to the objective of 2DPCA, Li et al. [26] proposed L1-norm based 2DPCA (2DPCA-L1). In [27], a sparse version of 2DPCA-L1 (2DPCAL1-S) was developed, which not only measures the data variance with the L1-norm distance metric but also imposes an L1-norm penalty on the solution. Both methods derive the projection vectors with a greedy strategy. The non-greedy strategy designed in [3] has been introduced to solve the objectives of Block PCA-L1 [28] and 2DPCA-L1 [29]. To achieve rotational invariance, the works in [5], [30] proposed LF-norm based 2DPCA, which respectively maximizes the data variance and minimizes the reconstruction error. Note that the LF-norm can be viewed as a 2D version of the L2,1-norm.
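The image-as-matrix idea of 2DPCA can be sketched in a few lines: the scatter matrix is built directly from the 2D images, so no image is ever flattened into a long vector (synthetic image stack; sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20, 16))      # 50 images of size 20 x 16
mean = A.mean(axis=0)

# Image scatter matrix built directly from 2D images (no vectorization)
G = sum((a - mean).T @ (a - mean) for a in A) / len(A)   # 16 x 16

# Columns of W are the leading eigenvectors of G
vals, vecs = np.linalg.eigh(G)
W = vecs[:, ::-1][:, :4]               # keep 4 components

# Each image is projected to a 20 x 4 feature matrix
Y = A @ W
print(Y.shape)                          # (50, 20, 4)
```

Because G is only 16 x 16 here (image width squared, not image size squared), the eigenproblem is far smaller than in vectorized PCA, which is the main computational appeal of the 2D formulation.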
In [31], Kwak generalized PCA-L1 to PCA-Lp, which maximizes the Lp-norm based variance for an arbitrary value of p. Clearly, conventional PCA and PCA-L1 are special cases of PCA-Lp obtained by specific choices of p, and with an appropriate choice of p, PCA-Lp has been shown to obtain more satisfactory results. Wang et al. [33] introduced the Lp-norm into 2DPCA and developed a generalized 2DPCA (G2DPCA). Both of these methods maximize the data variance. Different from them, L2,p-norm based PCA (PCA-L2,p), proposed in [32], minimizes the reconstruction error. Thanks to the L2,p-norm, PCA-L2,p is rotationally invariant and is solved by a non-greedy algorithm. Furthermore, extracting robust discriminant features by minimizing the L2,p-norm ratio criterion of LDA has been explored in [33].
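For concreteness, the L2,p-norm referred to throughout can be computed as the sum of the p-th powers of the row-wise L2 norms of a matrix. A small helper (our own name) showing the familiar special cases:

```python
import numpy as np

def l2p_norm_p(M, p):
    """Sum of the p-th powers of the row-wise L2 norms: sum_i ||m_i||_2^p."""
    return np.sum(np.linalg.norm(M, axis=1) ** p)

M = np.array([[3.0, 4.0],
              [0.0, 2.0]])

# p = 1 recovers the L2,1-norm (sum of row norms): 5 + 2 = 7
print(l2p_norm_p(M, 1.0))              # 7.0

# p = 2 recovers the squared Frobenius norm: 25 + 4 = 29
print(l2p_norm_p(M, 2.0))              # 29.0

# 0 < p < 1 shrinks the influence of large rows relative to p = 2
print(l2p_norm_p(M, 0.5))              # 5**0.5 + 2**0.5
```

Because each row's contribution enters through its full L2 norm before the power p is applied, the measure is invariant to rotations of the row space, which is the source of the rotational invariance noted above.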
The objective of conventional PCA can be formulated in two equivalent ways: maximization of the data variance, or minimization of the reconstruction error. However, the two formulations are no longer equivalent when robust distance metrics (e.g., L2,p-norm distances) are embedded in the objectives. Most of the robust PCA works mentioned above consider only one of the two formulations, although both may be important for extracting effective features for classification [10], and may thus extract suboptimal features. To mitigate these problems, in this paper we present a novel and more practical robust PCA formulation, namely Double L2,p-norm based PCA (DLPCA), for feature extraction. The contributions of this paper are highlighted as follows:
- (1)
The proposed DLPCA jointly considers both the minimization of the reconstruction error and the maximization of the data variance in the feature space, both of which have been shown to be effective for data representation.
- (2)
Instead of learning a single transformation matrix, we search for two transformation matrices in the reconstruction error term: one projects the data onto a low-dimensional subspace, and the other recovers the data, which establishes the relationship between the transformed features and the original features.
- (3)
We take the L2,p-norm as the distance metric in the objective function of DLPCA. Since the L2,p-norm distance metric weakens the sensitivity to outliers, the robustness of PCA is considerably improved. On the other hand, the resulting L2,p-norm ratio minimization formulation is non-convex and non-smooth.
- (4)
Solving the new objective of DLPCA is much more challenging, since the minimization of the L2,p-norm reconstruction error and the maximization of the L2,p-norm data variance must be handled simultaneously; the existing robust PCA algorithms cannot be applied here. To solve this difficult problem, we derive a novel iterative, non-greedy algorithm to optimize the L2,p-norm ratio minimization problem, and analyze the convergence of the algorithm from a theoretical point of view.
- (5)
To validate the effectiveness of our method, we conduct extensive experiments on several image datasets. The promising experimental results demonstrate the effectiveness of the proposed method compared with several state-of-the-art PCA methods.
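A common device for optimizing L2,p-norm objectives of this kind is iteratively reweighted least squares (IRLS). The sketch below applies it only to the reconstruction error term; it is a generic illustration under our own naming, not the authors' DLPCA algorithm, which additionally handles the variance term, the second transformation matrix, and the ratio objective:

```python
import numpy as np

def l2p_reconstruction_pca(X, m, p=1.0, n_iter=20, eps=1e-8):
    """Minimize sum_i ||x_i - W W^T x_i||_2^p over orthonormal W (d x m)
    by iteratively reweighted least squares (IRLS).

    Each step solves a weighted eigenproblem; when p < 2, samples with
    large residuals (outliers) receive small weights and are suppressed.
    """
    n, d = X.shape
    w = np.ones(n)                          # per-sample weights
    for _ in range(n_iter):
        # Weighted covariance; its top-m eigenvectors solve the weighted subproblem
        C = (X * w[:, None]).T @ X
        vals, vecs = np.linalg.eigh(C)
        W = vecs[:, ::-1][:, :m]
        # Update weights from current residual norms: w_i = (p/2)||r_i||^(p-2)
        r = np.linalg.norm(X - X @ W @ W.T, axis=1)
        w = (p / 2.0) * np.maximum(r, eps) ** (p - 2.0)
    return W

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
X[:5] *= 10.0                               # a few gross outliers
W = l2p_reconstruction_pca(X - X.mean(axis=0), m=2, p=1.0)
```

The eps floor guards the weight update when a residual vanishes, which is one standard way of coping with the non-smoothness mentioned in contribution (3).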
The rest of this paper is organized as follows. Section 2 outlines several related works. Section 3 introduces the proposed method in detail. Section 4 provides a theoretical analysis of the convergence of our algorithm. Section 5 presents the experimental results on several image datasets, and Section 6 concludes the paper and discusses future work.
Related works
Assume that there is a dataset containing n samples with d dimensions, represented by the matrix X = [x₁, x₂, …, xₙ] ∈ ℝ^{d×n}, where xᵢ denotes the i-th sample. We assume that the data has been centered, i.e., Σᵢ xᵢ = 0. In this section, we briefly review several related works, namely conventional PCA, PCA-L1, PCA-Lp and PCA-L2,p.
Conventional PCA [1] obtains the projection matrix by maximizing the data variance: max_{WᵀW = I} Σᵢ ‖Wᵀxᵢ‖₂², in which W ∈ ℝ^{d×m} is the projection matrix.
The proposed method
In this section, we will introduce the proposed algorithm DLPCA in detail.
Convergence analysis
In this section, we provide proofs of the convergence of the proposed iterative algorithm, commencing with some lemmas and theorems.

Lemma 1. Given any two nonzero vectors a and b, when 0 < p ≤ 2, the inequality ‖a‖₂ᵖ − (p/2)‖a‖₂²/‖b‖₂^{2−p} ≤ ‖b‖₂ᵖ − (p/2)‖b‖₂²/‖b‖₂^{2−p} holds [36].

Theorem 2. Under the iterative framework of DLPCA, the objective of (8) is monotonically decreasing in each iteration when 0 < p ≤ 2.

Proof. From Step (5), we have [36]:
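The surrogate inequality used in this style of convergence proof, ‖a‖ᵖ − (p/2)‖a‖²/‖b‖^{2−p} ≤ ‖b‖ᵖ − (p/2)‖b‖²/‖b‖^{2−p} for 0 < p ≤ 2, can be spot-checked numerically on random vectors (an illustrative check, not part of the formal proof):

```python
import numpy as np

# Spot-check the lemma's inequality over random vector pairs and several p
rng = np.random.default_rng(0)
ok = True
for p in (0.5, 1.0, 1.5, 2.0):
    for _ in range(1000):
        na = np.linalg.norm(rng.normal(size=4))   # ||a||
        nb = np.linalg.norm(rng.normal(size=4))   # ||b||
        lhs = na ** p - (p / 2) * na ** 2 / nb ** (2 - p)
        rhs = nb ** p - (p / 2) * nb ** 2 / nb ** (2 - p)
        ok = ok and lhs <= rhs + 1e-9 * (1 + abs(lhs) + abs(rhs))
print(ok)   # True
```

The inequality reduces to u^p − (p/2)u² ≤ 1 − p/2 with u = ‖a‖/‖b‖, whose left side is maximized at u = 1 for 0 < p ≤ 2, which is why every trial passes.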
Relations with other robust PCA methods
We have introduced the objective of our proposed DLPCA. In this section, we discuss the connections between DLPCA and some recent, closely related PCA methods.
Our method is robust against outliers. Let p = 2 and ignore the minimization of the data reconstruction error; the objective then reduces to the formulation of (1) for conventional PCA. Clearly, compared with conventional PCA, our method additionally considers the derivation of robustness as well as the
Experimental verification
In this section, we evaluate our algorithm by conducting experiments on five image databases: the face databases CMU PIE [43] and ORL [45], the object database ALOI [44], and the traffic sign database GTSDB [46]. Four methods are compared with ours on classification and reconstruction tasks, namely PCA, RIPCA [4], PCA-Lp [31], and PCA-L2,p [32]. Note that there are many L1-norm versions of PCA, such as PCA-L1 [13], [14] and R1-PCA [15], which
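A minimal version of this evaluation protocol, project with a learned W and then classify in the low-dimensional space with a nearest-neighbor rule, can be sketched as follows (synthetic two-class data stands in for the image databases; the SVD projection is a stand-in for any of the compared methods' W):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 10)),
               rng.normal(2.0, 1.0, size=(100, 10))])
y = np.array([0] * 100 + [1] * 100)

Xc = X - X.mean(axis=0)
W = np.linalg.svd(Xc, full_matrices=False)[2][:3].T   # stand-in for a learned W
Z = Xc @ W                                            # low-dimensional features

# Leave-one-out 1-nearest-neighbor accuracy in the projected space
correct = 0
for i in range(len(Z)):
    d = np.linalg.norm(Z - Z[i], axis=1)
    d[i] = np.inf                      # exclude the query itself
    correct += int(y[np.argmin(d)] == y[i])
print(correct / len(Z))
```

On the real databases the same loop runs over held-out test images instead of leave-one-out, but the structure (project, then classify by distance) is the same.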
Conclusions and future work
In this paper, we proposed an effective and robust PCA algorithm in which the minimization of the reconstruction error and the maximization of the data variance are considered simultaneously in a unified framework. To bridge the original features and the transformed features, thereby preserving the information in the input space, we proposed to learn a latent subspace. To guarantee that the latent subspace is robust to outliers, the L2,p-norm was adopted as the distance metric
CRediT authorship contribution statement
Pu Huang: Conceptualization, Methodology, Writing - original draft. Qiaolin Ye: Validation, Methodology, Formal analysis. Fanlong Zhang: Data curation. Guowei Yang: Software. Wei Zhu: Resources. Zhangjing Yang: Supervision, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This research was partially supported in part by the National Science Foundation of China under Grant Nos. U1831127, 62072246 and 61871444, the Natural Science Foundation of Jiangsu Province under Grant No. BK20171453, the Industry University Research Project of Jiangsu Science and Technology Department under Grant No. BY2020033, open project for young teachers of Nanjing Audit University (School of Information Engineering) (Grant No. A111010004/012), Six Peak Talent and Qinglan Project of
References (48)
- et al., Dual robust regression for pattern classification, Inf. Sci. (2021)
- et al., 2DPCA with L1-norm for simultaneously robust and sparse modelling, Neural Netw. (2013)
- et al., Joint sparse principal component analysis, Pattern Recogn. (2017)
- et al., Image decomposition based matrix regression with applications to robust face recognition, Pattern Recogn. (2020)
- et al., Bi-weighted robust matrix regression for face recognition, Neurocomputing (2017)
- et al., Eigenfaces for recognition, J. Cogn. Neurosci. (1991)
- Principal component analysis based on L1-norm maximization, IEEE Trans. Pattern Anal. Mach. Intell. (2008)
- et al., Rotational invariant dimensionality reduction algorithms, IEEE Trans. Cybern. (2017)
- et al., R1-2-DPCA and face recognition, IEEE Trans. Cybern. (2019)
- et al., L1-norm heteroscedastic discriminant analysis under mixture of Gaussian distributions, IEEE Trans. Neural Netw. Learn. Syst. (2019)
- Fisher discriminant analysis with L1-norm, IEEE Trans. Cybern.
- Robust principal component analysis on graphs
- Coherence pursuit: fast, simple, and robust principal component analysis, IEEE Trans. Signal Process.
- An overview of robust subspace recovery, Proc. IEEE
- Angle 2DPCA: a new formulation for 2DPCA, IEEE Trans. Cybern.
- Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming
- Robust principal component analysis with non-greedy L1-norm maximization, in: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
- R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization
- Compressed submanifold multifactor analysis, IEEE Trans. Pattern Anal. Mach. Intell.
- Recursive discriminative subspace learning with ℓ1-norm distance constraint, IEEE Trans. Cybern.
- L1-norm distance linear discriminant analysis based on an effective iterative algorithm, IEEE Trans. Circuits Syst. Video Technol.
- Robust sparse linear discriminant analysis, IEEE Trans. Circuits Syst. Video Technol.
- 1. These authors contributed equally to this work.