Neurocomputing

Volume 521, 7 February 2023, Pages 89–98

Max–Min Robust Principal Component Analysis

https://doi.org/10.1016/j.neucom.2022.11.092

Abstract

Principal Component Analysis (PCA) is a powerful unsupervised dimensionality reduction algorithm whose squared $\ell_2$-norm formulation cleverly connects the reconstruction error and the projection variance; improved PCA methods typically consider only one of the two, which limits their performance. To alleviate this problem, we propose a novel Max–Min Robust Principal Component Analysis via binary weight, which ingeniously combines the reconstruction error and the projection variance to learn the projection matrix more accurately, and uses the $\ell_2$-norm as the evaluation criterion to make the model rotation invariant. In addition, we design a binary weight that removes outliers, which improves the robustness of the model and endows it with anomaly detection ability. We then develop an efficient iterative optimization algorithm to solve the resulting problem. Extensive experimental results show that our model outperforms related state-of-the-art PCA methods.

Introduction

Principal Component Analysis (PCA) [1], [2] is a very popular unsupervised dimensionality reduction method [3], [4] that is often used in image denoising [5], [6], [7], [8], image compression [9], [10], [11], subspace learning [12], [13], etc. In recent years, it has also been widely applied in biology and chemistry, for example in cancelable biometrics [14], biometric cryptosystems [15] and chemometrics [16]. It therefore occupies an important place in many fields.

PCA learns the projection matrix by taking the maximum projection variance or the minimum reconstruction error as the cost function [17], [18]. To be specific, suppose the data matrix is $X=[x_1,x_2,\ldots,x_n]\in\mathbb{R}^{d\times n}$, where $d$ and $n$ denote the dimensionality and the number of samples, respectively. Without loss of generality, the data have been centered, i.e., $\sum_{i=1}^{n}x_i=0$. Let $W=[w_1,w_2,\ldots,w_k]\in\mathbb{R}^{d\times k}$ be a projection matrix; traditional PCA then solves the following optimization problem:
$$\max_{W}\sum_{i=1}^{n}\|W^\top x_i\|_2^2 \;\Longleftrightarrow\; \max_{W}\operatorname{Tr}\!\left(W^\top XX^\top W\right) \;\Longleftrightarrow\; \min_{W}\sum_{i=1}^{n}\|x_i-WW^\top x_i\|_2^2, \quad \text{s.t.}\ W^\top W=I.$$
Since the above objective functions are based on the squared $\ell_2$-norm, they are equivalent; but for the same reason, PCA is very sensitive to outliers [19]. Many recent research efforts have been devoted to alleviating this drawback.
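
As a concrete reference point, the following minimal NumPy sketch (ours, not the authors' released code) computes the classical PCA projection from the scatter matrix $XX^\top$ and numerically checks that the variance and reconstruction views coincide:

    import numpy as np

    def pca(X, k):
        # X is d x n and assumed centered; returns the d x k projection matrix W.
        # The k leading eigenvectors of X X^T maximize Tr(W^T X X^T W) s.t. W^T W = I.
        eigvals, eigvecs = np.linalg.eigh(X @ X.T)  # eigenvalues in ascending order
        return eigvecs[:, -k:]

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 100))
    X -= X.mean(axis=1, keepdims=True)          # center: sum_i x_i = 0
    W = pca(X, 2)
    proj_var = np.sum((W.T @ X) ** 2)           # sum_i ||W^T x_i||_2^2
    rec_err = np.sum((X - W @ (W.T @ X)) ** 2)  # sum_i ||x_i - W W^T x_i||_2^2
    assert np.isclose(proj_var + rec_err, np.sum(X ** 2))  # variance + error = total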

In [20], Wright et al. proposed L1PCA, which uses the $\ell_1$-norm to minimize the reconstruction error and tries to solve the following problem:
$$\min_{W}\sum_{i=1}^{n}\|x_i-WW^\top x_i\|_1,\quad \text{s.t.}\ W^\top W=I.$$
In [21], Kwak et al. proposed PCAL1 based on the maximum projected variance, which focuses on solving the following problem:
$$\max_{W}\sum_{i=1}^{n}\|W^\top x_i\|_1,\quad \text{s.t.}\ W^\top W=I.$$
In [22], Ding et al. proposed R1PCA, which applies the $\ell_2$-norm within each sample and the $\ell_1$-norm across samples, i.e., an $\ell_{2,1}$-norm criterion on the residual:
$$\min_{W}\|X-WW^\top X\|_{2,1}=\min_{W}\sum_{i=1}^{n}\|x_i-WW^\top x_i\|_2,\quad \text{s.t.}\ W^\top W=I.$$
In [23], Wang et al. proposed $\ell_{2,p}$-PCA, which interpolates between PCA and R1PCA by setting different values of $p$:
$$\min_{W}\sum_{i=1}^{n}\|x_i-WW^\top x_i\|_2^p,\quad \text{s.t.}\ W^\top W=I.$$
Similar studies include RPCA-OM [24], TRPCA [25], KPCA [26] and more [27], [28], [29], [30], [31]. Although these existing PCA variants mitigate the effect of outliers to some extent, they usually focus on either the minimum reconstruction error or the maximum projection variance, without considering the connection between the two. In fact, traditional PCA has the property
$$\sum_{i=1}^{n}\|x_i-WW^\top x_i\|_2^2+\sum_{i=1}^{n}\|W^\top x_i\|_2^2=\sum_{i=1}^{n}\|x_i\|_2^2,$$
which means that the projection variance and the reconstruction error are taken into account simultaneously, whereas the improved methods above lose this equivalence, which limits their performance.
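
This identity is simply the Pythagorean theorem for the orthogonal projector $WW^\top$; a one-line derivation (standard linear algebra, not specific to this paper):
$$\|x_i\|_2^2=\|WW^\top x_i\|_2^2+\|x_i-WW^\top x_i\|_2^2=\|W^\top x_i\|_2^2+\|x_i-WW^\top x_i\|_2^2,$$
which holds because $W^\top W=I$ makes the two components orthogonal, $(WW^\top x_i)^\top(x_i-WW^\top x_i)=0$, and $\|WW^\top x_i\|_2=\|W^\top x_i\|_2$.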

In this paper, we propose a novel Max–Min Robust Principal Component Analysis via binary weight (MMRPCA), which considers the reconstruction error and the projection variance simultaneously, so that the model not only obtains strong reconstruction ability but also makes the data more separable after dimensionality reduction. Furthermore, to improve robustness, we carefully design a binary weight that lets the model treat normal samples and outliers differently and eliminates the negative effects of outliers, yielding a more accurate projection matrix. Interestingly, we also find that the binary weight provides an anomaly detection ability that existing PCA methods do not consider. To solve the model, we develop an efficient iterative optimization algorithm and rigorously guarantee its convergence. Extensive experimental results show that the proposed method outperforms competing methods.

Notations: In this paper, we use uppercase letters, bold lowercase letters and lowercase letters to represent matrices, vectors and scalars, respectively. For a matrix $Q$, $q_i$ denotes its $i$-th column, $Q^\top$ its transpose, $\operatorname{Tr}(Q)$ its trace, and $I$ denotes the identity matrix. For a vector $r$, $\|r\|_1$ and $\|r\|_2$ are its $\ell_1$-norm and $\ell_2$-norm, respectively.

Section snippets

Methodology

Let $p_i=\|W^\top x_i\|_2$ and $r_i=\|x_i-WW^\top x_i\|_2$ denote the projection variance and the reconstruction error of a sample $x_i$, respectively. As shown in Fig. 1, taking two-dimensional space as an example, for a data point $x_i$ the projection $p_i$ and the reconstruction error $r_i$ form the two legs of a right triangle whose hypotenuse is $\|x_i\|_2$. The essence of PCA is to make $p_i$ as large as possible and $r_i$ as small as possible, so that the important …

Optimization

To solve objective function (7), we first introduce a related theorem [32], [33]:

Theorem 1

For any two functions $F(W)$ and $G(W)$ of $W$, consider the problem
$$\arg\max_{W^\top W=I}\frac{F(W)}{G(W)}.\tag{8}$$
The solution of problem (8) can be obtained by optimizing the following problem:
$$\arg\max_{W^\top W=I}\ F(W)-\lambda\,G(W),$$
where $\lambda$ is also related to $W$.
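
Theorem 1 is a Dinkelbach-style reduction for ratio maximization: one alternately fixes $\lambda=F(W)/G(W)$ at the current iterate and re-maximizes $F(W)-\lambda G(W)$. A minimal generic sketch of this scheme (our illustration; solve_subproblem is a hypothetical inner solver, not the paper's Algorithm 1):

    def ratio_maximize(F, G, solve_subproblem, W0, tol=1e-8, max_iter=100):
        # Generic Dinkelbach-style iteration for max F(W) / G(W).
        # solve_subproblem(lam) must return argmax_W F(W) - lam * G(W)
        # over the feasible set (here, matrices W with orthonormal columns).
        W = W0
        for _ in range(max_iter):
            lam = F(W) / G(W)                    # current ratio value
            W_new = solve_subproblem(lam)        # maximize F - lam * G
            if F(W_new) - lam * G(W_new) < tol:  # no improvement: lam is optimal
                break
            W = W_new
        return W

At convergence the subproblem value reaches zero, which is exactly the stationarity condition for the ratio.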

Now let us solve objective function (7). Obviously, there are two variables in problem (7) that need to be optimized, $g$ and $W$. By Theorem 1 above, let $\xi_i=\frac{g_i\,\|W^\top x_i\|_2}{\|x_i-WW^\top x_i\|_2}$; then solving problem (7) is …
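
The snippet truncates here, but the Lagrangian in the convergence analysis below (with constraints $W^\top W=I$ and $\sum_i g_i=\eta$, $g$ binary) suggests an alternating scheme: with $W$ fixed, set $g_i=1$ for the $\eta$ samples with the largest per-sample ratios and $g_i=0$ for the rest (flagging them as outliers); with $g$ and $\xi$ fixed, update $W$ from the leading eigenvectors of a weighted scatter matrix $P$. The NumPy sketch below is our reconstruction under these assumptions; in particular, the form of $P$ is our guess from differentiating $\sum_i\big(g_i\|W^\top x_i\|_2-\xi_i\|x_i-WW^\top x_i\|_2\big)$, not a formula given in the snippet:

    import numpy as np

    def mmrpca_step(X, W, eta):
        # One hypothetical alternating step (our reading of the section snippets).
        p = np.linalg.norm(W.T @ X, axis=0)            # p_i = ||W^T x_i||_2
        r = np.linalg.norm(X - W @ (W.T @ X), axis=0)  # r_i = ||x_i - W W^T x_i||_2
        ratio = p / (r + 1e-12)                        # guard against zero residual
        g = np.zeros(X.shape[1])
        g[np.argsort(ratio)[-eta:]] = 1.0              # keep eta most "normal" samples
        xi = g * ratio                                 # xi_i = g_i * p_i / r_i
        # Assumed weighted scatter matrix P = sum_i (g_i/p_i + xi_i/r_i) x_i x_i^T,
        # obtained by setting the subproblem gradient to zero (P W = W Lambda).
        c = g / (p + 1e-12) + xi / (r + 1e-12)
        P = (X * c) @ X.T
        _, eigvecs = np.linalg.eigh(P)
        return eigvecs[:, -W.shape[1]:], g             # updated W, binary weights g

Samples with $g_i=0$ drop out of $P$ entirely, which is how the binary weight removes outliers and doubles as an anomaly detector.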

Convergence analysis

Theorem 2. Algorithm 1 will monotonically increase the value of objective function (7) until convergence.

Proof. The Lagrangian function of problem (10) is
$$L_0=\sum_{i=1}^{n}\Big(g_i\,\|W^\top x_i\|_2-\xi_i\,\|x_i-WW^\top x_i\|_2\Big)-\operatorname{Tr}\!\big((W^\top W-I)\Lambda\big)-\zeta\Big(\sum_{i=1}^{n}g_i-\eta\Big),$$
where $\Lambda$ and $\zeta$ are Lagrange multipliers. Taking the derivative with respect to $W$ and setting it to zero, the KKT condition [35] of problem (10) is
$$PW=W\Lambda.$$
According to the fourth step of Algorithm 1, the Lagrangian function of problem (13) is
$$L_1=\operatorname{Tr}(W^\top PW)-\operatorname{Tr}\big((W^\top W-I)\Lambda\big).$$
Taking the derivative with respect to $W$ and setting it …
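
Both Lagrangians share the stationarity condition $PW=W\Lambda$, which eigenvectors of $P$ satisfy; for the trace maximization in problem (13) the maximizer is the $k$ leading eigenvectors of $P$. A short NumPy check of this standard fact (generic linear algebra, not paper-specific code):

    import numpy as np

    A = np.random.default_rng(1).standard_normal((6, 6))
    P = A @ A.T                                   # symmetric PSD stand-in for the paper's P
    k = 2
    _, V = np.linalg.eigh(P)                      # eigenvalues in ascending order
    W = V[:, -k:]                                 # argmax Tr(W^T P W) s.t. W^T W = I
    assert np.allclose(P @ W, W @ (W.T @ P @ W))  # KKT: P W = W Lambda (Lambda diagonal)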

Experiment

In this section, we verify the superiority of our algorithm and compare it with other algorithms through synthetic experiments, visualization experiments, reconstruction experiments, classification experiments, running-time comparisons, and anomaly detection experiments.

Conclusion

In this paper, we propose a novel Max–Min Robust Principal Component Analysis (MMRPCA), which combines the reconstruction error and the projection variance simultaneously through the $\ell_2$-norm and fully accounts for the reconstructability and separability of the model, which other improved PCA methods do not consider. To improve robustness, we design a binary weight to remove outliers, which also gives the model anomaly detection ability. To solve the resulting problem, we explore an efficient iterative …

CRediT authorship contribution statement

Sisi Wang: Conceptualization, Methodology, Writing - original draft. Feiping Nie: Data curation, Validation, Funding acquisition. Zheng Wang: Writing - review & editing, Visualization, Investigation. Rong Wang: Supervision, Formal analysis. Xuelong Li: Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grant 62236001, in part by the Natural Science Basic Research Program of Shaanxi under Program 2021JM-071, in part by the National Natural Science Foundation of China under Grant 62176212, Grant 61936014 and Grant 61772427, and in part by the Fundamental Research Funds for the Central Universities under Grant G2019KY0501.

References (46)

  • Y. Liu et al., Robust neighborhood embedding for unsupervised feature selection, Knowl.-Based Syst. (2020).
  • Q. Ye et al., Flexible orthogonal semisupervised learning for dimension reduction with image classification, Neurocomputing (2014).
  • A. Parkins et al., Genetic programming techniques for hand written digit recognition, Signal Process. (2004).
  • T. Mandal et al., Curvelet based face recognition via dimension reduction, Signal Process. (2009).
  • H. Abdi et al., Principal component analysis, WIREs Comput. Stat. (2010).
  • T. Tasdizen, Principal components for non-local means image denoising, in: 2008 15th IEEE International Conference on …
  • Y.M.M. Babu et al., PCA based image denoising, Signal Image Process. (2012).
  • K. Dabov et al., BM3D image denoising with shape-adaptive principal component analysis.
  • X. Yang et al., Fuzzy embedded clustering based on bipartite graph for large-scale hyperspectral image, IEEE Geosci. Remote Sens. Lett. (2022).
  • Q. Du et al., Low-complexity principal component analysis for hyperspectral image compression, Int. J. High Perform. Comput. Appl. (2008).
  • N. Vaswani et al., Robust subspace learning: Robust PCA, robust subspace tracking, and robust subspace recovery, IEEE Signal Process. Mag. (2018).
  • J. Zhan et al., Robust PCA with partial subspace knowledge, IEEE Trans. Signal Process. (2015).
  • N. Kumar et al., Random permutation principal component analysis for cancelable biometric recognition, Appl. Intell. (2018).

    Sisi Wang received the M.S. degree from Northwestern Polytechnical University, Xi’an, China, where she is currently pursuing the Ph.D. degree with the School of Computer Science and the School of Artificial Intelligence, OPtics and ElectroNics (iOPEN). Her current research interests are machine learning and its applications, such as dimensionality reduction, feature selection, object detection, and anomaly detection.

    Feiping Nie received the Ph.D. degree in computer science from Tsinghua University, Beijing, China, in 2009. He is currently a Full Professor with Northwestern Polytechnical University, Xi’an, China. His research interests are machine learning and its applications, such as pattern recognition, data mining, computer vision, image processing, and information retrieval. He has published more than 100 papers in the following journals and conferences: TPAMI, TIP, TNNLS, TKDE, ICML, NIPS, KDD, IJCAI, AAAI, ICCV. His papers have been cited more than 20000 times and the H-index is 84. He is now serving as Associate Editor or PC member for several prestigious journals and conferences in the related fields.

    Zheng Wang received the M.S. degree from Anhui University, Hefei, China. He is currently pursuing the Ph.D. degree with the School of Computer Science and the School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, China. He has published several articles in journals, such as TPAMI, TCyb, TKDE, TKDD, and PR. His research interests mainly focus on representation learning for generic data and its applications.

    Rong Wang received the B.S. degree in information engineering, the M.S. degree in signal and information processing, and the Ph.D. degree in computer science from Xi’an Research Institute of Hi-Tech, Xi’an, China, in 2004, 2007 and 2013, respectively. He also studied at the Department of Automation, Tsinghua University, Beijing, China, in 2007 and 2013, for his Ph.D. degree. He is currently an associate professor with the School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, China. His research interests focus on machine learning and its applications.

    Xuelong Li (M’02–SM’07–F’12) is a full professor with the School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, China.
