Bayesian robust principal component analysis with structured sparse component

https://doi.org/10.1016/j.csda.2016.12.005

Abstract

Robust principal component analysis (RPCA) refers to the decomposition of an observed matrix into a low-rank component and a sparse component. Conventional methods model the sparse component as pixel-wise sparse (e.g., via the $\ell_1$-norm). However, in many practical scenarios, the elements of the sparse part are not truly independently sparse but are distributed with contiguous structures. This is one reason why representative RPCA techniques fail to work well in realistic, complex situations. To solve this problem, a Bayesian framework for RPCA with a structured sparse component is proposed, in which both the amplitudes and the support correlation structure are considered simultaneously in recovering the sparse component. Model learning is based on variational Bayesian inference, which can be applied to estimate the posteriors of all latent variables. Experimental results on synthetic and real data validate the proposed methodology.

Introduction

High-dimensional data arising in image and video processing, document analysis, web search, and biology often lie on a low-dimensional structure or manifold. As the fields of optimization and statistics have developed, the problem of exploring and exploiting low-dimensional structure in high-dimensional data has been studied extensively.

Principal component analysis (PCA) (Serneels and Verdonck, 2008; Giordani and Kiers, 2006), a classical and popular unsupervised dimensionality reduction approach, has wide applications in computer vision and pattern recognition. To a large extent, PCA efficiently computes the best low-rank approximation of the observed data in the $\ell_2$-norm sense. However, the $\ell_2$-norm is sensitive to outliers, which often appear in practice. Therefore, PCA may perform poorly under gross corruption, even when the corruption affects only a small fraction of the data.

To address the brittleness of classical PCA with respect to outliers, robust principal component analysis (RPCA) (Candès et al., 2011) has been proposed to decompose the observed data matrix $\mathbf{Y}\in\mathbb{R}^{m\times n}$ into a low-rank matrix $\mathbf{X}\in\mathbb{R}^{m\times n}$ (rank $r\ll\min\{m,n\}$) and a sparse matrix $\mathbf{E}\in\mathbb{R}^{m\times n}$ (containing sparse outliers). This line of research seeks to recover the low-rank matrix $\mathbf{X}$ and the sparse matrix $\mathbf{E}$ from $\mathbf{Y}$ by solving the optimization problem
$$\min_{\mathbf{X},\mathbf{E}}\ \operatorname{rank}(\mathbf{X})+\lambda\|\mathbf{E}\|_0,\quad \text{s.t.}\ \mathbf{Y}=\mathbf{X}+\mathbf{E},\tag{1}$$
where $\|\mathbf{E}\|_0$ denotes the number of nonzero entries in $\mathbf{E}$ and $\lambda$ is a positive weighting parameter. Unfortunately, (1) is a highly nonconvex optimization problem. However, one can obtain a tractable problem by relaxing (1), replacing the $\ell_0$-norm with the $\ell_1$-norm and the rank with the nuclear norm, which yields the convex optimization problem
$$\min_{\mathbf{X},\mathbf{E}}\ \|\mathbf{X}\|_*+\lambda\|\mathbf{E}\|_1,\quad \text{s.t.}\ \mathbf{Y}=\mathbf{X}+\mathbf{E},\tag{2}$$
where $\|\cdot\|_*$ and $\|\cdot\|_1$ denote the matrix nuclear norm (the sum of the singular values) and the $\ell_1$-norm (the sum of the absolute values of the entries), respectively. Candès et al. (2011) showed that under broad conditions the underlying low-rank component $\mathbf{X}$ and the sparse component $\mathbf{E}$ can be exactly recovered with high probability by solving this convex optimization problem. As a promising tool, RPCA has been successfully applied in many research fields, such as video surveillance (Zhou et al., 2013), face recognition (Wagner et al., 2012), subspace clustering (Liu et al., 2013) and image alignment (Peng et al., 2012).
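For concreteness, the convex program (2) is commonly solved with an inexact augmented Lagrange multiplier (ALM) scheme. The following is a minimal NumPy sketch of such a solver, not the algorithm proposed in this paper; the defaults ($\lambda=1/\sqrt{\max(m,n)}$ as suggested by Candès et al. (2011), and the heuristic choice of the penalty parameter $\mu$) follow common practice.

```python
import numpy as np

def soft_threshold(M, tau):
    """Entrywise soft-thresholding: the proximal operator of the l1-norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svd_threshold(M, tau):
    """Singular value soft-thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca_pcp(Y, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Sketch of inexact ALM for: min ||X||_* + lam * ||E||_1  s.t.  Y = X + E."""
    m, n = Y.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * m * n / (np.abs(Y).sum() + 1e-12)  # heuristic
    L = np.zeros_like(Y)   # Lagrange multiplier
    E = np.zeros_like(Y)
    for _ in range(max_iter):
        X = svd_threshold(Y - E + L / mu, 1.0 / mu)   # low-rank update
        E = soft_threshold(Y - X + L / mu, lam / mu)  # sparse update
        R = Y - X - E                                 # primal residual
        L += mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(Y):
            break
    return X, E
```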

In fact, perfectly disentangling the low-rank and sparse components raises an identifiability issue. To make the problem meaningful, we first need to require that the low-rank component is not sparse. As in Candès et al. (2011), the singular value decomposition of the true low-rank component $\mathbf{X}_0$ is written as $\mathbf{X}_0=\mathbf{U}\Sigma\mathbf{V}^T$, where $\mathbf{U}$ and $\mathbf{V}$ are the matrices of left and right singular vectors. Supposing that the rank of $\mathbf{X}_0$ is $r$, the incoherence condition with parameter $\mu$ is defined as
$$\max_i\|\mathbf{U}^T\mathbf{e}_i\|^2\le\frac{\mu r}{m},\qquad \max_i\|\mathbf{V}^T\mathbf{e}_i\|^2\le\frac{\mu r}{n},\tag{3}$$
$$\|\mathbf{U}\mathbf{V}^T\|_\infty\le\sqrt{\frac{\mu r}{mn}},\tag{4}$$
where $\|\mathbf{M}\|_\infty=\max_{i,j}|M_{ij}|$ and $\mathbf{e}_i$ is the unit vector with a one in the $i$th element and zeros everywhere else. As discussed in Candès et al. (2011), the incoherence condition asserts that for small values of $\mu$ the singular vectors of $\mathbf{X}_0$ are spread out; in other words, the low-rank component $\mathbf{X}_0$ is not sparse. To avoid another degenerate situation, we assume that the sparsity pattern of the true sparse component is selected uniformly at random (so the sparse component is not low-rank). Under these assumptions, namely that the rank of the low-rank component is not too large and the sparse component is reasonably sparse, the model (2) can perfectly recover the low-rank and sparse components. The result is given in the following theorem.

Theorem 1 (Candès et al., 2011)

Suppose $\mathbf{X}_0$ obeys (3)–(4). Fix any $m\times n$ matrix $\Psi$ of signs. Suppose that the support set $\Omega$ of $\mathbf{E}_0$ is uniformly distributed among all sets of cardinality $w$, and that $\operatorname{sgn}([\mathbf{E}_0]_{ij})=\Psi_{ij}$ for all $(i,j)\in\Omega$. Then there is a numerical constant $c$ such that, with probability at least $1-cn_1^{-10}$ (over the choice of the support of $\mathbf{E}_0$, with $n_1=\max(m,n)$), the model (2) with a range of correct values of $\lambda$ exactly recovers the true low-rank and sparse components, provided that $\operatorname{rank}(\mathbf{X}_0)\le\rho_r n_2\mu^{-1}(\log n_1)^{-2}$ and $w\le\rho_s mn$. Here $\rho_r$ and $\rho_s$ are positive numerical constants and $n_2=\min(m,n)$.
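To make the incoherence condition (3)–(4) concrete, the following sketch (our own illustration, assuming NumPy; the function name `coherence` is ours) estimates the smallest $\mu$ for which a given matrix satisfies all three bounds:

```python
import numpy as np

def coherence(X, r=None):
    """Estimate the incoherence parameter mu of X with respect to (3)-(4)."""
    m, n = X.shape
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    if r is None:
        r = int((s > 1e-10 * s[0]).sum())                # numerical rank
    U, V = U[:, :r], Vt[:r, :].T
    mu_u = (m / r) * np.max(np.sum(U**2, axis=1))        # max_i ||U^T e_i||^2 <= mu*r/m
    mu_v = (n / r) * np.max(np.sum(V**2, axis=1))        # max_i ||V^T e_i||^2 <= mu*r/n
    mu_uv = (m * n / r) * np.max(np.abs(U @ V.T))**2     # ||U V^T||_inf <= sqrt(mu*r/(mn))
    return max(mu_u, mu_v, mu_uv)

# A random low-rank matrix is incoherent (small mu) with high probability.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 300))
print(coherence(X0))
```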

Algorithms developed for the RPCA problem are often intuitive extensions of low-rank matrix recovery and therefore share a similar trajectory. Among the proposed methods are heuristic deterministic approaches based on nuclear norm relaxation, such as singular value thresholding (SVT) (Cai et al., 2010), singular value projection (SVP) (Jain et al., 2010), an accelerated proximal gradient algorithm (APG) (Toh and Yun, 2010), and the augmented Lagrange multiplier method (ALM) (Lin et al., 2010). Although these algorithms are theoretically well founded, their reconstruction performance is limited because the nuclear norm may not be a good surrogate for the rank function. To obtain a more accurate and robust approximation to the rank function, Hu et al. (2013) proposed truncated nuclear norm regularization (TNNR), which minimizes only the sum of the smallest $\min(m,n)-p$ singular values to recover the low-rank component. Noting that existing nonconvex penalty functions are concave with monotonically decreasing gradients, an iteratively reweighted nuclear norm (IRNN) algorithm was suggested in Lu et al. (2014). Inspired by the $\ell_p$-norm paradigm in compressive sensing (Ince et al., 2013), several works extend this concept to the traditional nuclear norm (Lu, 2014; Lu et al., 2015; Nie et al., 2012; Peng et al., 2014), which approximates the rank function more closely.
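As a point of reference, the TNNR penalty just mentioned can be written in a few lines (a sketch of the penalty value only, assuming NumPy and following Hu et al. (2013); the actual TNNR solver is iterative and more involved):

```python
import numpy as np

def truncated_nuclear_norm(X, p):
    """TNNR penalty: the sum of all but the p largest singular values of X,
    so the dominant rank-p subspace is left unpenalized."""
    s = np.linalg.svd(X, compute_uv=False)
    return s[p:].sum()
```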

In addition to the above deterministic methods, some statistical algorithms express RPCA as a Bayesian inference problem and apply statistical tools to solve it. Such statistical procedures (Manteiga and Vieu, 2007), which have recently gained popularity, offer several advantages over deterministic methods. First, prior knowledge of the matrix rank is not necessary; the unknown rank is estimated in a way similar to the automatic relevance determination strategy in machine learning. Second, algorithmic parameters are treated as stochastic quantities, making the methods insensitive to parameter initialization. In Ding et al. (2011), the authors modeled the singular values of the low-rank component $\mathbf{X}$ and the entries of the sparse component $\mathbf{E}$ with beta-Bernoulli priors, and the resulting algorithm used a Markov chain Monte Carlo (MCMC) sampling scheme with high computational complexity for inference. Babacan et al. (2012) adopted sparse Bayesian learning principles to recover the sparse and low-rank components, starting from a matrix factorization formulation and enforcing the low-rank constraint through a sparsity constraint. Complex noise, as considered in Zhao et al. (2014), is represented as a mixture of Gaussians, which can fit a wide range of noises and provides high recovery performance.

However, existing methods impose the sparsity constraint on each individual coefficient of $\mathbf{E}$ and ignore the spatial connections among nonzero coefficients. In many practical scenarios, the coefficients of the sparse component are not truly pixel-wise sparse but structurally sparse. A stream of work on problems related to RPCA has proposed the use of structural information. In face recognition, sparse noises such as occlusions and shadows form contiguous regions at different face positions. The sparse outliers in background subtraction are also typically spatially contiguous. It is possible to leverage realistic sparse distributions that go beyond simple sparsity by exploiting structural dependencies between the values and locations of the sparse coefficients. Gao et al. (2014) introduced a novel algorithm for background subtraction, which falls into the category of group-sparse RPCA. However, these works do not extend such special circumstances to general RPCA. Clearly a further adjustment, either implicit or explicit, is needed.

The structured sparse model, which parallels the conventional theory, is a general extension of traditional sparse patterns. It effectively captures the structured properties of the sparse component by clustering relevant parts together, so that the dependencies among different elements are taken into account. Structured sparsity has proved useful in a variety of domains including compressive sensing (Baraniuk et al., 2010; Peleg et al., 2012; Yu et al., 2012; Drémeau et al., 2012), background subtraction (Gao et al., 2014), and face recognition (Jia et al., 2012; Xiao et al., 2014). To illustrate the rationale of the group-sparse model, Fig. 1 shows two different sparse distributions, in which blank regions indicate zero pixels and shaded regions correspond to outliers (nonzero values). Clearly, an element-wise sparsity measure (e.g., the $\ell_1$-norm) assigns the same value to the two different patterns. It therefore seems plausible to further distinguish different sparse patterns and capture the interactions between related pixels.
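A tiny numerical illustration of this point (our own, hypothetical example): two supports of equal cardinality, one clustered and one scattered, are indistinguishable to the $\ell_1$-norm.

```python
import numpy as np

# Two 6x6 sparse patterns: 'clustered' places its nonzeros in one contiguous
# block, 'scattered' spreads the same number of ones over isolated pixels.
clustered = np.zeros((6, 6))
clustered[1:3, 1:4] = 1.0
scattered = np.zeros((6, 6))
scattered[[0, 2, 4, 1, 3, 5], [0, 2, 4, 5, 1, 3]] = 1.0

# The l1-norm (and the l0 count) cannot tell the two patterns apart.
assert np.abs(clustered).sum() == np.abs(scattered).sum() == 6.0
```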

Our motivation in this paper is to solve the RPCA problem with a structured sparse component. The beta process for latent factor analysis has been widely studied and provides great flexibility in learning sparsity. By introducing binary latent factor indicators, the zero elements of the sparse component can be guaranteed to be exactly zero. Taking advantage of beta process factor analysis (Paisley and Carin, 2009), the estimation of the sparse component is split into the magnitudes of the sparse coefficients and the latent variables indicating the sparse patterns. According to the inherent structures, the dependence between entries in the support can be modeled through a graph with local clusters, and a Gaussian-Gamma framework is employed on each local cluster to control the sparsity. In this way, not only the values of the sparse component but also their sparse patterns are exploited to boost the performance of RPCA. Moreover, the proposed method is cast within a Bayesian inference procedure based on variational Bayesian approximation (McGrory and Titterington, 2007).

This paper is organized as follows. In Section 2, we present the proposed Bayesian framework. In Section 3, variational Bayesian inference is employed to estimate the model. Experiments on synthetic and real data are presented in Section 4. After that, we devise an extended model that can be applied to background subtraction in Section 5. Finally, Section 6 concludes the paper.

Next, we introduce some notation used in this paper. Boldfaced upper-case letters, e.g., $\mathbf{A}$, denote matrices, while boldfaced lower-case letters, e.g., $\mathbf{a}$, denote column vectors. For a matrix $\mathbf{A}$, $\mathbf{A}_{i\cdot}$ denotes the $i$th row, $\mathbf{A}_{\cdot i}$ denotes the $i$th column, and $A_{i,j}$ denotes the element in the $i$th row and $j$th column. An $m\times m$ identity matrix is denoted by $\mathbf{I}_m$; $\mathbf{1}_{mn}$ denotes the $m\times n$ matrix in which all elements are equal to 1, and $\mathbf{1}_m$ is an $m$-dimensional row vector of ones. $\mathbf{A}^T$ and $\operatorname{Tr}(\mathbf{A})$ denote the transpose and trace of $\mathbf{A}$, respectively. $\|\mathbf{A}\|_F$ is the Frobenius norm, i.e., the square root of the sum of the squared absolute values of all elements of $\mathbf{A}$.

Bayesian RPCA model based on structured sparsity

Let $\mathbf{Y}\in\mathbb{R}^{m\times n}$ be the observed matrix, which consists of three parts: a low-rank component $\mathbf{X}\in\mathbb{R}^{m\times n}$, a sparse component $\mathbf{E}\in\mathbb{R}^{m\times n}$ and a noise term $\mathbf{N}\in\mathbb{R}^{m\times n}$. The low-rank component $\mathbf{X}$ is formulated as a matrix factorization, which is commonly used in recovering low-rank matrices, i.e., $\mathbf{X}=\mathbf{U}\mathbf{V}^T$. As discussed above, we separate the estimation of the sparse values from the estimation of the latent labels indicating the sparse patterns, so that the sparse nature of $\mathbf{E}$ can be guaranteed, i.e., $\mathbf{E}=\mathbf{W}\odot\mathbf{Z}$. The …
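The decomposition just described can be mimicked with a toy generative sketch (our own; it uses a crudely smoothed Bernoulli support as a stand-in for the paper's clustered beta-Bernoulli prior, and interprets $\mathbf{E}=\mathbf{W}\odot\mathbf{Z}$ as an elementwise product of magnitudes and binary indicators):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 40, 50, 3

# Low-rank part X = U V^T, as in the factorization above.
U = rng.normal(size=(m, r))
V = rng.normal(size=(n, r))
X = U @ V.T

# Sparse part E = W * Z: Gaussian magnitudes W, binary support Z.
# Smoothing the Bernoulli probabilities crudely encourages contiguous
# support, standing in for the structured support prior.
pi = rng.beta(1.0, 9.0, size=(m, n))
pi = 0.5 * (pi + np.roll(pi, 1, axis=0))
Z = (rng.random((m, n)) < pi).astype(float)
W = rng.normal(scale=3.0, size=(m, n))
E = W * Z

Y = X + E + rng.normal(scale=0.1, size=(m, n))  # observation Y = X + E + N
```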

Variational Bayesian inference

Let $\Theta=\{\mathbf{U},\mathbf{V},\gamma,\mathbf{W},\alpha,\mathbf{Z},\pi,\beta\}$ denote all hidden variables. Bayesian inference is based on evaluating the posterior distributions of the unknowns given the observation. However, the posterior distributions are computationally intractable since the marginal distribution $p(\mathbf{Y})$ cannot be calculated analytically. In this paper, we use variational Bayes (VB) (Bishop, 2006) to deal with the intractable joint posterior distribution. The approximate posterior distribution is denoted by $q(\Theta)$. The …
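For reference, the standard mean-field recipe that such VB schemes instantiate (cf. Bishop, 2006) factorizes $q(\Theta)=\prod_i q(\theta_i)$ and cyclically updates each factor as
$$\ln q^\star(\theta_i)=\mathbb{E}_{q(\Theta\setminus\theta_i)}\big[\ln p(\mathbf{Y},\Theta)\big]+\mathrm{const},$$
where the expectation is taken over all factors except $q(\theta_i)$; each such update monotonically increases the evidence lower bound on $\ln p(\mathbf{Y})$.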

Experiments

In this section, we present experimental results on synthetic and real data to illustrate the performance of the proposed algorithm. Before conducting these experiments, we study the choice of the cardinality $k$ (the group size). On the synthetic data, both unstructured and structured sparse cases are considered. Moreover, comparisons under different noise conditions are made to examine the robustness of the proposed algorithm. Finally, we carry out experiments on real data including face recognition and …

Discussion

So far, all issues in the process of handling RPCA with a structured sparse component have been addressed. The proposed variational Bayesian inference procedure estimates the posterior distributions of the unknowns iteratively: in each iteration, the method first computes the low-rank parts $q(\mathbf{U})$, $q(\mathbf{V})$ and $q(\gamma)$ using (24), (25) and (27), respectively. The structured sparse parts are then calculated according to (28), (29), (30) and (31), respectively. Finally, the estimation of the noise is obtained from …

Conclusion

In this paper, we proposed a variational Bayesian algorithm for estimating the low-rank and structured sparse components from observed data. The key ingredient of our method is a more realistic structured sparsity model that goes beyond the simple sparsity pattern by codifying the interdependency structure among different elements of the sparse component. To this end, the estimation of the sparse component is split into the magnitudes of the sparse elements and latent variables indicating …

Acknowledgments

The authors would like to express their sincere gratitude to Dr. Ying Li for many valuable suggestions and comments that helped to improve the paper. The work was partially supported by the National Natural Science Foundation of China (Grant No. 61379014).

References

  • J.-F. Cai et al.

    A singular value thresholding algorithm for matrix completion

    SIAM J. Optim.

    (2010)
  • E.J. Candès et al.

Robust principal component analysis?

    J. ACM

    (2011)
  • X. Ding et al.

    Bayesian robust principal component analysis

    IEEE Trans. Image Process.

    (2011)
  • A. Drémeau et al.

    Boltzmann machine and mean-field approximation for structured sparse decompositions

    IEEE Trans. Signal Process.

    (2012)
  • Z. Gao et al.

Block-sparse RPCA for salient motion detection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2014)
  • Y. Hu et al.

    Fast and accurate matrix completion via truncated nuclear norm regularization

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • Jain, P., Meka, R., Dhillon, I.S., 2010. Guaranteed rank minimization via singular value projection. In: Advances in Neural Information Processing Systems.