Robust estimator of the correlation matrix with sparse Kronecker structure for a high-dimensional matrix-variate

https://doi.org/10.1016/j.jmva.2020.104598

Abstract

It is of interest in many applications to estimate the correlation matrix of a high-dimensional matrix-variate $X \in \mathbb{R}^{p\times q}$. Existing works usually impose strong assumptions on the distribution of $X$, such as sub-Gaussianity or strong moment conditions. These assumptions are easily violated in practice, and a robust estimator is desired. In this paper, we consider the case where the correlation matrix has a sparse Kronecker structure and propose a robust estimator based on Kendall’s τ correlation. The proposed estimator is further extended to tensor data. We establish the theoretical properties of the estimator, showing that the Kronecker structure actually increases the effective sample size and leads to a fast convergence rate. Finally, we apply the proposed estimator to the bigraphical model, obtaining an estimator with a faster convergence rate than existing results. Simulations and real data analysis confirm the competitiveness of the proposed method.

Introduction

With the rapid development of data-collection techniques, matrix-valued data are often encountered in biostatistics and medical research [29], such as Nuclear Magnetic Resonance (NMR) data [35], electroencephalograph (EEG) data [27] and functional Magnetic Resonance Imaging (fMRI) data [6]. For example, in imaging genetics, each subject has a weighted (or binary) adjacency matrix characterizing a structural (or functional) connectivity pattern, which can be jointly analyzed with genetic data to discover genes underlying many neuropsychiatric and neurological disorders [14], [22], [25], [31]. Moreover, in a study of an electroencephalography dataset on alcoholism, each subject was exposed to a stimulus, and voltage values were measured from 64 channels of electrodes placed on the subject’s scalp at 256 time points, forming a 256 × 64 matrix. It is of scientific interest to study the association between alcoholism and the pattern of voltage over time and channels [13], [17], [39]. Suppose the matrix-variates $X_k \in \mathbb{R}^{p\times q}$, $k \in \{1,\dots,n\}$, are independent copies of $X \in \mathbb{R}^{p\times q}$. For each $k$, write $X_k = (X_{st,k})_{1\le s\le p,\,1\le t\le q}$ and let $\mathrm{vec}(X_k) \in \mathbb{R}^{pq}$ be the vector obtained by stacking the columns of $X_k$.

Since the covariance matrix plays a critical role in statistical inference, estimating the covariance matrix of a matrix-variate has attracted much attention. Denote the covariance matrix of $\mathrm{vec}(X_k)$ by $\mathrm{cov}(\mathrm{vec}(X_k)) = \Gamma \in \mathbb{R}^{pq\times pq}$.  (1)

When both $p$ and $q$ are fixed, many works have been developed on estimating $\Gamma$ [5], [7]. When $p$ and $q$ are diverging, estimating $\Gamma$ can be challenging. First, by the symmetry of $\Gamma$, the number of parameters in model (1) is $pq(pq+1)/2$, which can be much larger than the sample size $n$; this greatly reduces both computational efficiency and statistical accuracy. For example, in face recognition, Li and Yuan [19] analyzed merely 400 observations, each of size 112 × 92, leading to more than 10,000 unknown parameters. Second, model (1) simply stacks a matrix into a vector, losing the information carried by the matrix structure. To overcome these difficulties, it is common in the literature to impose a Kronecker structure on $\Gamma$ [11], [15], [16], [23], [32], [37]. For example, Teng and Huang [30] proposed a Kronecker product matrix to model gene–experiment interactions: a gene expression matrix measured over multiple tissues is transposable, meaning that both the rows and the columns are potentially correlated. A similar idea has been used for spatio-temporal data. Specifically, $\Gamma = \Psi \otimes \Sigma$,  (2) where $\Psi \in \mathbb{R}^{q\times q}$ represents the dependence among the columns of $X_k$, while $\Sigma \in \mathbb{R}^{p\times p}$ represents that among the rows; both $\Psi$ and $\Sigma$ are symmetric. Model (2) not only retains the dependence patterns among columns and rows, but also reduces the number of unknown parameters: it has $(p^2+q^2+p+q)/2 - 1$ free parameters, far fewer than model (1). Many works have been developed for model (2) [15], [16], [23], [24], [32], [33], [37]. For example, under additional sparsity assumptions on $\Psi$ and $\Sigma$, Leng and Pan [15] proposed estimators, denoted $\hat\Psi_{LP}$ and $\hat\Sigma_{LP}$, that take advantage of both the Kronecker structure and sparsity. It is shown that $\hat\Psi_{LP}$ and $\hat\Sigma_{LP}$ attain convergence rates of order $O_p(\sqrt{\log q/(np)})$ and $O_p(\sqrt{\log p/(nq)})$, respectively, in the element-wise norm.
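The parameter-count comparison between the unstructured model (1) and the Kronecker model (2) can be made concrete with a minimal numpy sketch (our own illustration, not code from the paper; the dimensions $p=4$, $q=3$ are arbitrary):

```python
import numpy as np

# Illustration of the Kronecker covariance structure Gamma = Psi ⊗ Sigma
# and of the parameter counts discussed in the text.
p, q = 4, 3
rng = np.random.default_rng(0)

def random_spd(d, rng):
    """Random symmetric positive-definite matrix of size d x d."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

Sigma = random_spd(p, rng)   # row covariance, p x p
Psi = random_spd(q, rng)     # column covariance, q x q
Gamma = np.kron(Psi, Sigma)  # pq x pq covariance of vec(X_k)

# Unstructured model (1): pq(pq+1)/2 free parameters.
n_unstructured = p * q * (p * q + 1) // 2
# Kronecker model (2): (p^2 + q^2 + p + q)/2 - 1 free parameters.
n_kronecker = (p**2 + q**2 + p + q) // 2 - 1

print(n_unstructured, n_kronecker)  # prints 78 15
assert Gamma.shape == (p * q, p * q)
```

Even at this toy size, the Kronecker structure cuts the parameter count from 78 to 15; the gap widens rapidly as $p$ and $q$ grow.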

However, there are some potential limitations to estimating the covariance matrix directly. Since the covariance matrix is not scale invariant, researchers usually standardize the variables beforehand to have zero mean and unit variance, in which case estimating the covariance matrix reduces to estimating the correlation matrix. Without this prior standardization, more unknown parameters must be estimated and the convergence rate slows down. Besides, the correlation matrix suffices to reveal the dependence among variables without estimating the marginal variances. We refer to [4] for more discussion of the importance of high-dimensional correlation matrix estimation. Thus, we focus on estimating the correlation matrix $R_\Gamma \in \mathbb{R}^{pq\times pq}$ of $\mathrm{vec}(X_k)$ under the Kronecker structure $R_\Gamma = R_\Psi \otimes R_\Sigma$,  (3) where $R_\Psi \in \mathbb{R}^{q\times q}$ and $R_\Sigma \in \mathbb{R}^{p\times p}$ are both symmetric matrices. Since the diagonal elements of $R_\Gamma$ all equal one, it is easy to see that $\mathrm{diag}(R_\Psi) = c\,\mathbf{1}_q$ and $\mathrm{diag}(R_\Sigma) = c^{-1}\mathbf{1}_p$ for some $c>0$, where for any integer $m$, $\mathbf{1}_m$ denotes the vector of length $m$ with all elements equal to one. Note that multiplying $R_\Psi$ by $\tilde{c}$ and $R_\Sigma$ by $\tilde{c}^{-1}$ for any $\tilde{c}>0$ yields the same $R_\Gamma$. For identifiability, we set all the diagonal elements of both $R_\Psi$ and $R_\Sigma$ to one; that is, $R_\Psi$ and $R_\Sigma$ denote the correlation matrices of the columns and rows of $X_k$, respectively. In this paper, we focus on the case where $p$ and $q$ are much larger than $n$. Similar to [15], we assume that both $R_\Psi$ and $R_\Sigma$ are sparse.

An estimator of the covariance matrix can deliver an estimator of the correlation matrix in a way similar to [15]. However, most of the works mentioned above are developed under Gaussian or sub-Gaussian assumptions, or under strong moment conditions on $X_k$. As noted in [8], these assumptions have several disadvantages. First, the resulting estimators are not robust to outliers or heavy-tailed distributions. Second, the Gaussian or sub-Gaussian assumption may be unrealistic in many real-world applications, such as equities and market index data [8] and gene expression data. For example, when analyzing the Atlas of Gene Expression in the Mouse Aging Project dataset [38], Ning and Liu [23] found via quantile–quantile plots that many gene expression levels may not be normally distributed. It is therefore desirable to develop a robust estimator under weaker distributional assumptions. In particular, Han et al. [20] and [8], [9] proposed the transelliptical family of distributions for a random vector $y = (y_1,\dots,y_p)^\top$: $y$ follows a transelliptical distribution, denoted $y \sim TE_p(\Sigma^0,\xi;f_1,\dots,f_p)$, if there exist unknown univariate strictly increasing functions $(f_1,\dots,f_p)$ such that $(f_1(y_1),\dots,f_p(y_p))^\top$ follows an elliptical distribution $EC_p(0,\Sigma^0,\xi)$ with location parameter zero and scale parameter $\Sigma^0$, whose diagonal elements are one. $\Sigma^0$ is called the latent generalized correlation matrix [8]. In particular, if $(f_1(y_1),\dots,f_p(y_p))^\top$ follows a normal distribution with correlation matrix $\Sigma^0$, then $\Sigma^0$ is called the latent correlation matrix. For brevity, the unified term “latent correlation matrix” is used for “latent (generalized) correlation matrix” throughout this paper. Since each $f_j$, $j \in \{1,\dots,p\}$, is strictly increasing, its inverse exists. Consequently, viewing the nuisance parameters $(f_1,\dots,f_p)$ as contaminations, $y$ can be viewed as a contaminated observation of some elliptical or normal variable with correlation matrix $\Sigma^0$, which is the parameter of interest.
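The reason Kendall’s τ is a natural tool for the transelliptical model is that it is invariant to the unknown strictly increasing marginal transformations $f_j$. A minimal sketch of this invariance (our own illustration; the transformations $e^x$ and $y^3+5y$ are arbitrary strictly increasing choices):

```python
import numpy as np
from scipy.stats import kendalltau

# Kendall's tau depends only on the ordering (concordance) of pairs, so it is
# unchanged by strictly increasing marginal transformations -- exactly the
# nuisance maps f_1, ..., f_p in the transelliptical model.
rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = 0.7 * x + rng.standard_normal(200)

tau_raw, _ = kendalltau(x, y)
# Apply arbitrary strictly increasing transformations to each margin.
tau_transformed, _ = kendalltau(np.exp(x), y**3 + 5 * y)

assert abs(tau_raw - tau_transformed) < 1e-12
```

Pearson correlation, by contrast, would change under these transformations, which is why moment-based covariance estimators lose validity outside the (sub-)Gaussian setting.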

In this paper, we apply the framework of [8] to matrix-variates, assuming that $\mathrm{vec}(X_k)$, $k \in \{1,\dots,n\}$, are independent and identically distributed (i.i.d.) variables from the transelliptical family $TE_{pq}(R_\Gamma,\xi;f_1,\dots,f_{pq})$, where the latent correlation matrix $R_\Gamma$ has the sparse Kronecker structure (3). A robust estimator of the latent correlation matrix is developed based on Kendall’s τ correlation. We further extend our approach to tensor data and apply the estimator to the bigraphical model.

Our estimator extends that of [15] by relaxing the assumptions on $X_k$, but there are several major differences between the two works. First, we focus on the correlation matrix, while Leng and Pan [15] focus on the covariance matrix. The correlation matrix has a special property, shown in (4) in Section 2, that the covariance matrix does not possess; as a result, our estimation procedures, which exploit this property, differ significantly from those of [15]. Second, our estimator is robust, requiring weaker assumptions on the distribution of the data. Third, because the Kendall’s τ estimator involves a high-dimensional binary random vector, the techniques used to establish the theoretical properties differ from those of [15]. Interestingly, the resulting convergence rates are similar to those of [15]. In addition, our work improves on some existing results. For example, Niu and Zhao [24] proposed a robust estimator for a general model that includes (3) as a special case; however, they did not assume sparsity of the parameters, and their estimator has a slower convergence rate of order $O_p(\sqrt{(p^2+q^2+\log n)/n} + \sqrt{pq\log(pq)/n})$. Our estimator improves on [24] by exploiting the sparsity of the parameters. Our estimator also improves on the results of [23] for the bigraphical model: Ning and Liu [23] proposed an estimator for the nonparanormal bigraphical model based on Kendall’s τ correlation, but they did not exploit the Kronecker structure when estimating $R_\Gamma$, obtaining a slower convergence rate than ours. Details are given in Section 3.

The remainder of the article is organized as follows. In Section 2, we propose the robust estimator of the latent correlation matrix with the sparse Kronecker structure (3) based on Kendall’s τ correlation. The theoretical properties of the proposed estimators are established in Section 3.1, and the estimators are applied to bigraphical models in Section 3.2. The generalization to tensor data is presented in Section 4. Simulation results and real data analysis are presented in Sections 5 and 6, respectively. All proofs are given in the Appendix.

Finally, we introduce some notation. For any constant $a \in \mathbb{R}$, $(a)_+ = \max\{a,0\}$. For any vector $v \in \mathbb{R}^m$, $\|v\|$ denotes its Euclidean norm. For any $m\times m$ matrix $M = (M_{ij})$, $\|M\|_2$ denotes the operator norm and $\|M\|_F$ the Frobenius norm of $M$; $\|M\|_{\max} = \max_{i,j}|M_{ij}|$ denotes the element-wise norm, and $\mathrm{tr}(M)$ the trace of $M$. $M \succeq 0$ indicates that $M$ is positive semidefinite, while $M \succ 0$ indicates that $M$ is positive definite. For any symmetric matrix $M$, $\omega_{\min}(M)$ and $\omega_{\max}(M)$ denote its minimum and maximum eigenvalues, respectively. For clarity, for any random vector $y = (y_1,\dots,y_p)^\top \in \mathbb{R}^p$, the Pearson correlation matrix and the Kendall’s τ correlation matrix of $y$ are denoted $\mathrm{corr}(y) \in \mathbb{R}^{p\times p}$ and $\mathrm{corr}_K(y) \in \mathbb{R}^{p\times p}$, respectively.

Section snippets

Estimation of the latent correlation matrix with sparse Kronecker structure by Kendall’s τ correlation

Recall that $\mathrm{vec}(X_k)$, $1\le k\le n$, are i.i.d. variables from the transelliptical distribution $TE_{pq}(R_\Gamma,\xi;f_1,\dots,f_{pq})$. Let $X_{i:,k}$ and $X_{:j,k}$ denote the $i$th row and the $j$th column of $X_k$, respectively, and let $R_\Sigma^{(j)} \in \mathbb{R}^{p\times p}$ and $R_\Psi^{(i)} \in \mathbb{R}^{q\times q}$ be the latent correlation matrices of the random vectors $X_{:j,k}$ and $X_{i:,k}$, respectively. According to model (3) and the identifiability condition there, a key observation is that $R_\Sigma = R_\Sigma^{(j)}$ for $j \in \{1,\dots,q\}$. Similarly, the latent correlation matrix of $\mathrm{vec}(X_k^\top)$ equals $R_\Sigma \otimes R_\Psi$, and consequently $R_\Psi = R_\Psi^{(i)}$ for each $i$ …
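The key observation above suggests pooling: every column of every $X_k$ carries information about the same $R_\Sigma$, so the column-wise Kendall’s τ matrices can be averaged. A minimal sketch of this pooling idea under our notation (an illustration only, not the paper’s estimator; the function name `pooled_row_tau` is ours):

```python
import numpy as np
from scipy.stats import kendalltau

def pooled_row_tau(X_list):
    """Average the p x p Kendall's tau matrix over all q columns.

    X_list: list of n arrays of shape (p, q), the observations X_1, ..., X_n.
    Each column index j contributes one tau matrix estimated from the n
    observed columns X_{:j,1}, ..., X_{:j,n}; since all columns share the
    same latent row correlation, averaging over j enlarges the effective
    sample size from n to nq.
    """
    n = len(X_list)
    p, q = X_list[0].shape
    T = np.zeros((p, p))
    for j in range(q):
        col_sample = np.stack([X[:, j] for X in X_list])  # shape (n, p)
        for s in range(p):
            for t in range(p):
                T[s, t] += kendalltau(col_sample[:, s], col_sample[:, t])[0]
    return T / q  # averaged tau matrix; entrywise sin(pi/2 * .) maps to correlations

rng = np.random.default_rng(2)
X_list = [rng.standard_normal((5, 4)) for _ in range(30)]
T = pooled_row_tau(X_list)
assert T.shape == (5, 5) and np.allclose(T, T.T)
```

The symmetric averaging is what drives the faster rates: the effective sample size for $R_\Sigma$ scales like $nq$ rather than $n$.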

Asymptotic properties of the estimators

In this section, we present the theoretical properties of the estimators $\hat{R}_\Sigma^\tau$, $\hat{R}_\Psi^\tau$, and $\hat{R}_\Gamma^\tau$. To simplify the argument, we introduce some notation. Let $Z_k = \mathrm{vec}(X_k) \in \mathbb{R}^{pq}$, $U_{kk'} = \mathrm{sign}(Z_k - Z_{k'}) \in \mathbb{R}^{pq}$ for $1\le k\ne k'\le n$, and $V_{kk'} = \mathrm{vec}(U_{kk'}U_{kk'}^\top - E(U_{kk'}U_{kk'}^\top))$. Let $T_\Gamma = E(U_{kk'}U_{kk'}^\top)$ and $\hat{T}_\Gamma = 2(n(n-1))^{-1}\sum_{1\le k<k'\le n} U_{kk'}U_{kk'}^\top$ denote the population Kendall’s τ correlation matrix and its estimator, respectively. Let $E_n \equiv \mathrm{vec}(\hat{T}_\Gamma - T_\Gamma) = 2(n(n-1))^{-1}\sum_{1\le k<k'\le n} V_{kk'}$, which is a U-statistic. Let $a \in \mathbb{R}^{p^2q^2}$ be a vector with $\|a\| = 1$ and denote $W = \mathrm{cov}($ …
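For context on how a Kendall’s τ matrix such as $\hat{T}_\Gamma$ is turned into a correlation estimate, a minimal sketch (our own illustration, not the paper’s implementation) of the standard entrywise sine transform used throughout this literature, $\hat{R} = \sin(\tfrac{\pi}{2}\hat{\tau})$, which is consistent for the latent correlation under elliptical models:

```python
import numpy as np
from scipy.stats import kendalltau

def tau_correlation_matrix(Z):
    """Z: (n, d) sample; returns the sin-transformed Kendall's tau matrix."""
    n, d = Z.shape
    T = np.eye(d)  # tau of a variable with itself is 1
    for a in range(d):
        for b in range(a + 1, d):
            T[a, b] = T[b, a] = kendalltau(Z[:, a], Z[:, b])[0]
    # Under elliptical models, R_ab = sin(pi/2 * tau_ab) entrywise.
    return np.sin(np.pi / 2 * T)

rng = np.random.default_rng(3)
# Bivariate Gaussian sample with true correlation 0.6.
C = np.array([[1.0, 0.6], [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), C, size=2000)
R_hat = tau_correlation_matrix(Z)
assert abs(R_hat[0, 1] - 0.6) < 0.1
```

For a bivariate normal with correlation $\rho$, $\tau = \tfrac{2}{\pi}\arcsin(\rho)$, so the sine transform exactly inverts the population relationship.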

Extension to tensor data

The proposed estimators can be generalized to tensor data. We first introduce some notation. For any tensor $\mathcal{Z} \in \mathbb{R}^{p_1\times\cdots\times p_M}$, let $\mathrm{vec}(\mathcal{Z})$ denote the operator that vectorizes the $M$-dimensional tensor $\mathcal{Z}$ into a column vector. In particular, the $(i_1,\dots,i_M)$th entry of $\mathcal{Z}$, denoted $z_{i_1\cdots i_M}$, maps to the $j$th entry of $\mathrm{vec}(\mathcal{Z})$, where $j = i_1 + \sum_{m=2}^{M}(i_m-1)\prod_{m'=1}^{m-1}p_{m'}$. The mode-$m$ matricization (also known as unfolding) of a tensor $\mathcal{Z}$, denoted $Z_{(m)}$, is a $p_m\times q_m$ matrix with $q_m = \prod_{m'\ne m}p_{m'}$ such that the $(i_1,\dots,i_M)$th …
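The index map above is exactly column-major (Fortran-order) vectorization, which can be verified on a small tensor (our own check, with arbitrary dimensions $2\times 3\times 4$; indices are 1-based as in the text):

```python
import numpy as np

# Verify j = i_1 + sum_{m=2}^{M} (i_m - 1) * prod_{m' < m} p_{m'}
# against numpy's column-major flattening of a 3-way tensor.
p1, p2, p3 = 2, 3, 4
Z = np.arange(p1 * p2 * p3).reshape(p1, p2, p3)
vecZ = Z.flatten(order="F")  # column-major vectorization, length p1*p2*p3

dims = (p1, p2, p3)
for i1 in range(1, p1 + 1):
    for i2 in range(1, p2 + 1):
        for i3 in range(1, p3 + 1):
            idx = (i1, i2, i3)
            # 1-based position of entry (i1, i2, i3) in vec(Z)
            j = idx[0] + sum(
                (idx[m] - 1) * int(np.prod(dims[:m])) for m in range(1, 3)
            )
            assert vecZ[j - 1] == Z[i1 - 1, i2 - 1, i3 - 1]
```

With $M=2$ the map reduces to $j = i_1 + (i_2-1)p_1$, i.e., the usual column-stacking $\mathrm{vec}$ used for matrices in the earlier sections.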

Simulation on correlation matrix estimation

In this section, we adopt the adaptive Lasso penalty in (8), (9), and compare our estimator with the non-robust one of [15]. We generate i.i.d. matrix-variates $X_k \in \mathbb{R}^{p\times q}$, $1\le k\le n$, as follows: $X_k = A S_k B$, $1\le k\le n$, where $A \in \mathbb{R}^{p\times p}$ and $B \in \mathbb{R}^{q\times q}$ are constant matrices and the $S_k \in \mathbb{R}^{p\times q}$ are independent random matrices with $\mathrm{cov}(\mathrm{vec}(S_k)) = c^2 I_{pq}$ for some constant $c>0$. The distribution of $S_k$ will be specified later. It is then easy to see that $\Gamma = \mathrm{cov}(\mathrm{vec}(X_k)) = c^2(B^\top B)\otimes(AA^\top) \equiv \Psi\otimes\Sigma$, where $\Psi$ and $\Sigma$ are defined accordingly.
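The Kronecker identity behind this data-generating scheme follows from $\mathrm{vec}(ASB) = (B^\top\otimes A)\mathrm{vec}(S)$. A minimal Monte Carlo sketch (our own version with Gaussian $S_k$ and arbitrary small dimensions; the actual simulations would also use heavy-tailed $S_k$ to stress robustness):

```python
import numpy as np

# Check cov(vec(X)) = c^2 (B'B) ⊗ (AA') for X = A S B with cov(vec(S)) = c^2 I.
rng = np.random.default_rng(4)
p, q, c = 3, 2, 1.5
A = rng.standard_normal((p, p))
B = rng.standard_normal((q, q))
Gamma = c**2 * np.kron(B.T @ B, A @ A.T)  # implied pq x pq covariance

n = 50000
S = c * rng.standard_normal((n, p, q))     # vec(S_k) has covariance c^2 I_pq
X = A @ S @ B                              # batched matrix products, (n, p, q)
# Column-major vec of each X_k: stack columns, giving shape (n, p*q).
vecX = X.transpose(0, 2, 1).reshape(n, p * q)
emp = vecX.T @ vecX / n                    # empirical covariance (mean is zero)

rel_err = np.max(np.abs(emp - Gamma)) / np.max(np.abs(Gamma))
assert rel_err < 0.1
```

The same identity is what lets row and column dependence be controlled separately through $A$ and $B$ when designing the simulation scenarios.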

Real data analysis

We apply our method to the Atlas of Gene Expression in the Mouse Aging (AGEMAP) database, a resource of gene expression as a function of age in mice. The database includes the expression of 8932 genes in 16 tissues as a function of age [38]. There are four age states: 1, 6, 16, and 24 months. For each age state, researchers chose ten mice, five of each sex, so there are 40 observations in total (i.e., $n=40$). Similar to [15] and [37], we select seven tissues: Cerebrum, …

Discussion

In this paper, we consider the estimation of the correlation matrix with a sparse Kronecker structure and propose a robust estimator based on Kendall’s τ correlation under the assumption of a transelliptical distribution. The proposed estimator is further extended to the bigraphical model and to tensor data. We also establish the theoretical properties of the estimator, showing that the Kronecker structure actually increases the effective sample size and leads to a fast convergence rate. The empirical …

CRediT authorship contribution statement

Lu Niu: Conceptualization, Methodology, Software, Formal analysis, Data curation, Writing - original draft, Writing - review & editing. Xiumin Liu: Writing - review & editing. Junlong Zhao: Conceptualization, Methodology, Writing - review & editing, Supervision, Project administration, Funding acquisition.

Acknowledgments

Junlong Zhao was supported by the National Science Foundation of China, No. 11871104 and No. 11471030, and the Fundamental Research Funds for the Central Universities, China. Lu Niu was supported by the National Science Foundation of China, No. 11971048.

References (41)

  • Barber, R.F., et al., ROCKET: Robust confidence intervals via Kendall’s tau for transelliptical graphical models, Ann. Statist. (2018)
  • Bergsma, W., et al., A consistent test of independence based on a sign covariance related to Kendall’s tau, Bernoulli (2014)
  • Bickel, P.J., et al., Regularized estimation of large covariance matrices, Ann. Statist. (2008)
  • Cui, Y., et al., Sparse estimation of high-dimensional correlation matrices, Comput. Statist. Data Anal. (2016)
  • Dutilleul, P., The MLE algorithm for the matrix normal distribution, J. Statist. Comput. Simul. (1999)
  • Fox, M.D., et al., Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging, Nat. Rev. Neurosci. (2007)
  • Gupta, A.K., et al., Matrix Variate Distributions, Vol. 104 (1999)
  • Han, F., et al., Scale-invariant sparse PCA on high dimensional meta-elliptical data, J. Amer. Statist. Assoc. (2014)
  • Han, F., et al., Statistical analysis of latent generalized correlation matrix estimation in transelliptical distribution, Bernoulli (2017)
  • He, S., et al., Graphical model selection and estimation for high dimensional tensor data, J. Multivariate Anal. (2014)
  • Hoff, P.D., Separable covariance arrays via the Tucker product, with applications to multivariate relational data, Bayesian Anal. (2011)
  • Huffer, F.W., et al., A test for elliptical symmetry, J. Multivariate Anal. (2007)
  • Hung, H., et al., Matrix variate logistic regression model with application to EEG data, Biostatistics (2012)
  • Kong, D., et al., L2RM: Low-rank linear regression models for high-dimensional matrix responses, J. Amer. Statist. Assoc. (2018)
  • Leng, C., et al., Covariance estimation via sparse Kronecker structures, Bernoulli (2018)
  • Leng, C., et al., Sparse matrix graphical models, J. Amer. Statist. Assoc. (2012)
  • Li, B., et al., On dimension folding of matrix- or array-valued statistical objects, Ann. Statist. (2010)
  • Li, G., et al., Robust rank correlation based screening, Ann. Statist. (2010)
  • Li, M., et al., 2D-LDA: A statistical linear discriminant analysis for image matrix, Pattern Recognit. Lett. (2005)
  • Liu, H., et al., High dimensional semiparametric Gaussian copula graphical models, Ann. Statist. (2012)