Robust estimator of the correlation matrix with sparse Kronecker structure for a high-dimensional matrix-variate

https://doi.org/10.1016/j.jmva.2020.104598

Abstract

It is of interest in many applications to estimate the correlation matrix of a high-dimensional matrix-variate $X \in \mathbb{R}^{p\times q}$. Existing works usually impose strong assumptions on the distribution of $X$, such as sub-Gaussianity or strong moment conditions. These assumptions are easily violated in practice, and a robust estimator is desired. In this paper, we consider the case where the correlation matrix has a sparse Kronecker structure and propose a robust estimator based on Kendall’s τ correlation. The proposed estimator is further extended to tensor data. We establish the theoretical properties of the estimator, showing that the Kronecker structure actually increases the effective sample size and leads to a fast convergence rate. Finally, we apply the proposed estimator to the bigraphical model, obtaining an estimator with a faster convergence rate than existing results. Simulations and real data analysis confirm the competitiveness of the proposed method.

Introduction

With the rapid development of data-collection techniques, matrix-valued data are often encountered in biostatistics and medical research [29], such as Nuclear Magnetic Resonance (NMR) data [35], electroencephalograph (EEG) data [27] and functional Magnetic Resonance Imaging (fMRI) data [6]. For example, in imaging genetics, each subject has a weighted (or binary) adjacency matrix characterizing a structural (or functional) connectivity pattern, which can be jointly analyzed with genetic data to discover genes underlying many neuropsychiatric and neurological disorders [14], [22], [25], [31]. Moreover, in a study of an electroencephalography dataset on alcoholism, each subject was exposed to a stimulus, and voltage values were measured from 64 channels of electrodes placed on the subject’s scalp at 256 time points, forming a 256 × 64 matrix. It is of scientific interest to study the association between alcoholism and the pattern of voltage over time and channels [13], [17], [39]. Suppose the matrix-variates $X_k \in \mathbb{R}^{p\times q}$, $k \in \{1,\dots,n\}$, are independent copies of $X \in \mathbb{R}^{p\times q}$. For each $k$, write $X_k = (X_{st,k})_{1\le s\le p,\,1\le t\le q}$ and let $\mathrm{vec}(X_k) \in \mathbb{R}^{pq}$ be the vector obtained by stacking the columns of $X_k$.

Since the covariance matrix plays a critical role in statistical inference, estimating the covariance matrix of a matrix-variate has attracted much attention. Denote the covariance matrix of $\mathrm{vec}(X_k)$ by $\mathrm{cov}(\mathrm{vec}(X_k)) = \Gamma \in \mathbb{R}^{pq\times pq}$.  (1)

When both $p$ and $q$ are fixed, many works have been developed on estimating $\Gamma$ [5], [7]. When $p$ and $q$ are diverging, estimating $\Gamma$ can be challenging. First, by the symmetry of $\Gamma$, the number of parameters in model (1) is $pq(pq+1)/2$, which can be much larger than the sample size $n$; this greatly reduces both computational efficiency and statistical accuracy. For example, in face recognition, Li and Yuan [19] analyzed merely 400 observations, each of size 112 × 92, leading to more than 10,000 unknown parameters. Second, model (1) simply stacks a matrix into a vector, losing the information carried by the matrix structure. To overcome these difficulties, it is common in the literature to impose a Kronecker structure on $\Gamma$ [11], [15], [16], [23], [32], [37]. For example, Teng and Huang [30] proposed a Kronecker product matrix to model gene–experiment interactions: a gene expression matrix measured over multiple tissues is transposable, meaning that both the rows and the columns are potentially correlated. A similar idea has been used for spatio-temporal data. Specifically, $\Gamma = \Psi \otimes \Sigma$,  (2) where $\Psi \in \mathbb{R}^{q\times q}$ represents the dependence among the columns of $X_k$, while $\Sigma \in \mathbb{R}^{p\times p}$ represents that among the rows; both $\Psi$ and $\Sigma$ are symmetric. Model (2) not only retains the dependence patterns among columns and rows, but also reduces the number of unknown parameters: it has $(p^2+q^2+p+q)/2 - 1$ free parameters, far fewer than model (1). Many works have been developed for model (2) [15], [16], [23], [24], [32], [33], [37]. For example, under additional sparsity assumptions on $\Psi$ and $\Sigma$, Leng and Pan [15] proposed estimators, denoted $\hat\Psi_{LP}$ and $\hat\Sigma_{LP}$, that take advantage of both the Kronecker structure and sparsity. It is shown that $\hat\Psi_{LP}$ and $\hat\Sigma_{LP}$ attain convergence rates of order $O_p(\sqrt{\log q/(np)})$ and $O_p(\sqrt{\log p/(nq)})$, respectively, in the element-wise norm.
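The parameter-count comparison between the unstructured model (1) and the Kronecker model (2) can be made concrete with a minimal numpy sketch (our own illustration, not code from the paper; the dimensions $p=4$, $q=3$ are arbitrary):

```python
import numpy as np

# Illustration of the Kronecker covariance structure Gamma = Psi ⊗ Sigma
# and of the parameter counts discussed in the text.
p, q = 4, 3
rng = np.random.default_rng(0)

def random_spd(d, rng):
    """Random symmetric positive-definite matrix of size d x d."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

Sigma = random_spd(p, rng)   # row covariance, p x p
Psi = random_spd(q, rng)     # column covariance, q x q
Gamma = np.kron(Psi, Sigma)  # pq x pq covariance of vec(X_k)

# Unstructured model (1): pq(pq+1)/2 free parameters.
n_unstructured = p * q * (p * q + 1) // 2
# Kronecker model (2): (p^2 + q^2 + p + q)/2 - 1 free parameters.
n_kronecker = (p**2 + q**2 + p + q) // 2 - 1

print(n_unstructured, n_kronecker)  # prints 78 15
assert Gamma.shape == (p * q, p * q)
```

Even at this toy size, the Kronecker structure cuts the parameter count from 78 to 15; the gap widens rapidly as $p$ and $q$ grow.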

However, there are some potential limitations to estimating the covariance matrix directly. Since the covariance matrix is not scale invariant, researchers usually standardize the variables beforehand to have zero mean and unit variance, in which case estimating the covariance matrix reduces to estimating the correlation matrix. Without this prior standardization, more unknown parameters must be estimated and the convergence rate slows down. Besides, the correlation matrix suffices to reveal the dependence among variables without estimating the marginal variances. We refer to [4] for more discussion of the importance of high-dimensional correlation matrix estimation. Thus, we focus on estimating the correlation matrix $R_\Gamma \in \mathbb{R}^{pq\times pq}$ of $\mathrm{vec}(X_k)$ under the Kronecker structure $R_\Gamma = R_\Psi \otimes R_\Sigma$,  (3) where $R_\Psi \in \mathbb{R}^{q\times q}$ and $R_\Sigma \in \mathbb{R}^{p\times p}$ are both symmetric matrices. Since the diagonal elements of $R_\Gamma$ all equal one, it is easy to see that $\mathrm{diag}(R_\Psi) = c\,\mathbf{1}_q$ and $\mathrm{diag}(R_\Sigma) = c^{-1}\mathbf{1}_p$ for some $c>0$, where for any integer $m$, $\mathbf{1}_m$ denotes the vector of length $m$ with all elements equal to one. Note that multiplying $R_\Psi$ by $\tilde{c}$ and $R_\Sigma$ by $\tilde{c}^{-1}$ for any $\tilde{c}>0$ yields the same $R_\Gamma$. For identifiability, we set all the diagonal elements of both $R_\Psi$ and $R_\Sigma$ to one; that is, $R_\Psi$ and $R_\Sigma$ denote the correlation matrices of the columns and rows of $X_k$, respectively. In this paper, we focus on the case where $p$ and $q$ are much larger than $n$. Similar to [15], we assume that both $R_\Psi$ and $R_\Sigma$ are sparse.

An estimator of the covariance matrix can deliver an estimator of the correlation matrix in a way similar to [15]. However, most of the works mentioned above are developed under Gaussian or sub-Gaussian assumptions, or under strong moment conditions on $X_k$. As noted in [8], these assumptions have several disadvantages. First, the resulting estimators are not robust to outliers or heavy-tailed distributions. Second, the Gaussian or sub-Gaussian assumption may be unrealistic in many real-world applications, such as equities and market index data [8] and gene expression data. For example, when analyzing the Atlas of Gene Expression in the Mouse Aging Project dataset [38], Ning and Liu [23] found via quantile–quantile plots that many gene expression levels may not be normally distributed. It is therefore desirable to develop a robust estimator under weaker distributional assumptions. In particular, Han et al. [20] and [8], [9] proposed the transelliptical family of distributions for a random vector $y = (y_1,\dots,y_p)^\top$: $y$ follows a transelliptical distribution, denoted $y \sim TE_p(\Sigma^0,\xi;f_1,\dots,f_p)$, if there exist unknown univariate strictly increasing functions $(f_1,\dots,f_p)$ such that $(f_1(y_1),\dots,f_p(y_p))^\top$ follows an elliptical distribution $EC_p(0,\Sigma^0,\xi)$ with location parameter zero and scale parameter $\Sigma^0$, whose diagonal elements are one. $\Sigma^0$ is called the latent generalized correlation matrix [8]. In particular, if $(f_1(y_1),\dots,f_p(y_p))^\top$ follows a normal distribution with correlation matrix $\Sigma^0$, then $\Sigma^0$ is called the latent correlation matrix. For brevity, the unified term “latent correlation matrix” is used for “latent (generalized) correlation matrix” throughout this paper. Since each $f_j$, $j \in \{1,\dots,p\}$, is strictly increasing, its inverse exists. Consequently, viewing the nuisance parameters $(f_1,\dots,f_p)$ as contaminations, $y$ can be viewed as a contaminated observation of some elliptical or normal variable with correlation matrix $\Sigma^0$, which is the parameter of interest.
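The reason Kendall’s τ is a natural tool for the transelliptical model is that it is invariant to the unknown strictly increasing marginal transformations $f_j$. A minimal sketch of this invariance (our own illustration; the transformations $e^x$ and $y^3+5y$ are arbitrary strictly increasing choices):

```python
import numpy as np
from scipy.stats import kendalltau

# Kendall's tau depends only on the ordering (concordance) of pairs, so it is
# unchanged by strictly increasing marginal transformations -- exactly the
# nuisance maps f_1, ..., f_p in the transelliptical model.
rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = 0.7 * x + rng.standard_normal(200)

tau_raw, _ = kendalltau(x, y)
# Apply arbitrary strictly increasing transformations to each margin.
tau_transformed, _ = kendalltau(np.exp(x), y**3 + 5 * y)

assert abs(tau_raw - tau_transformed) < 1e-12
```

Pearson correlation, by contrast, would change under these transformations, which is why moment-based covariance estimators lose validity outside the (sub-)Gaussian setting.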

In this paper, we apply the framework of [8] to matrix-variates, assuming that $\mathrm{vec}(X_k)$, $k \in \{1,\dots,n\}$, are independent and identically distributed (i.i.d.) variables from the transelliptical family $TE_{pq}(R_\Gamma,\xi;f_1,\dots,f_{pq})$, where the latent correlation matrix $R_\Gamma$ has the sparse Kronecker structure (3). A robust estimator of the latent correlation matrix is developed based on Kendall’s τ correlation. We further extend our approach to tensor data and apply the estimator to the bigraphical model.

Our estimator extends that of [15] by relaxing the assumptions on $X_k$, but there are several major differences between the two works. First, we focus on the correlation matrix, while Leng and Pan [15] focus on the covariance matrix. The correlation matrix has a special property, shown in (4) in Section 2, that the covariance matrix does not possess; as a result, our estimation procedures, which exploit this property, differ significantly from those of [15]. Second, our estimator is robust, requiring weaker assumptions on the distribution of the data. Third, because the Kendall’s τ estimator involves a high-dimensional binary random vector, the techniques used to establish the theoretical properties differ from those of [15]. Interestingly, the resulting convergence rates are similar to those of [15]. In addition, our work improves on some existing results. For example, Niu and Zhao [24] proposed a robust estimator for a general model that includes (3) as a special case; however, they did not assume sparsity of the parameters, and their estimator has a slower convergence rate of order $O_p(\sqrt{(p^2+q^2+\log n)/n} + \sqrt{pq\log(pq)/n})$. Our estimator improves on [24] by exploiting the sparsity of the parameters. Our estimator also improves on the results of [23] for the bigraphical model: Ning and Liu [23] proposed an estimator for the nonparanormal bigraphical model based on Kendall’s τ correlation, but they did not exploit the Kronecker structure when estimating $R_\Gamma$, obtaining a slower convergence rate than ours. Details are given in Section 3.

The remainder of the article is organized as follows. In Section 2, we propose the robust estimator of the latent correlation matrix with the sparse Kronecker structure (3) based on Kendall’s τ correlation. The theoretical properties of the proposed estimators are established in Section 3.1, and the estimators are applied to bigraphical models in Section 3.2. The generalization to tensor data is presented in Section 4. Simulation results and real data analysis are presented in Sections 5 and 6, respectively. All proofs are given in the Appendix.

Finally, we introduce some notation. For any constant $a \in \mathbb{R}$, $(a)_+ = \max\{a,0\}$. For any vector $v \in \mathbb{R}^m$, $\|v\|$ denotes its Euclidean norm. For any $m\times m$ matrix $M = (M_{ij})$, $\|M\|_2$ denotes the operator norm and $\|M\|_F$ the Frobenius norm of $M$; $\|M\|_{\max} = \max_{i,j}|M_{ij}|$ denotes the element-wise norm, and $\mathrm{tr}(M)$ the trace of $M$. $M \succeq 0$ indicates that $M$ is positive semidefinite, while $M \succ 0$ indicates that $M$ is positive definite. For any symmetric matrix $M$, $\omega_{\min}(M)$ and $\omega_{\max}(M)$ denote its minimum and maximum eigenvalues, respectively. For clarity, for any random vector $y = (y_1,\dots,y_p)^\top \in \mathbb{R}^p$, the Pearson correlation matrix and the Kendall’s τ correlation matrix of $y$ are denoted $\mathrm{corr}(y) \in \mathbb{R}^{p\times p}$ and $\mathrm{corr}_K(y) \in \mathbb{R}^{p\times p}$, respectively.

Section snippets

Estimation of the latent correlation matrix with sparse Kronecker structure by Kendall’s τ correlation

Recall that $\mathrm{vec}(X_k)$, $1\le k\le n$, are i.i.d. variables from the transelliptical distribution $TE_{pq}(R_\Gamma,\xi;f_1,\dots,f_{pq})$. Let $X_{i:,k}$ and $X_{:j,k}$ denote the $i$th row and the $j$th column of $X_k$, respectively, and let $R_\Sigma^{(j)} \in \mathbb{R}^{p\times p}$ and $R_\Psi^{(i)} \in \mathbb{R}^{q\times q}$ be the latent correlation matrices of the random vectors $X_{:j,k}$ and $X_{i:,k}$, respectively. According to model (3) and the identifiability condition there, a key observation is that $R_\Sigma = R_\Sigma^{(j)}$ for $j \in \{1,\dots,q\}$. Similarly, the latent correlation matrix of $\mathrm{vec}(X_k^\top)$ equals $R_\Sigma \otimes R_\Psi$, and consequently $R_\Psi = R_\Psi^{(i)}$ for each $i$ …
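The key observation above suggests pooling: every column of every $X_k$ carries information about the same $R_\Sigma$, so the column-wise Kendall’s τ matrices can be averaged. A minimal sketch of this pooling idea under our notation (an illustration only, not the paper’s estimator; the function name `pooled_row_tau` is ours):

```python
import numpy as np
from scipy.stats import kendalltau

def pooled_row_tau(X_list):
    """Average the p x p Kendall's tau matrix over all q columns.

    X_list: list of n arrays of shape (p, q), the observations X_1, ..., X_n.
    Each column index j contributes one tau matrix estimated from the n
    observed columns X_{:j,1}, ..., X_{:j,n}; since all columns share the
    same latent row correlation, averaging over j enlarges the effective
    sample size from n to nq.
    """
    n = len(X_list)
    p, q = X_list[0].shape
    T = np.zeros((p, p))
    for j in range(q):
        col_sample = np.stack([X[:, j] for X in X_list])  # shape (n, p)
        for s in range(p):
            for t in range(p):
                T[s, t] += kendalltau(col_sample[:, s], col_sample[:, t])[0]
    return T / q  # averaged tau matrix; entrywise sin(pi/2 * .) maps to correlations

rng = np.random.default_rng(2)
X_list = [rng.standard_normal((5, 4)) for _ in range(30)]
T = pooled_row_tau(X_list)
assert T.shape == (5, 5) and np.allclose(T, T.T)
```

The symmetric averaging is what drives the faster rates: the effective sample size for $R_\Sigma$ scales like $nq$ rather than $n$.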

Asymptotic properties of the estimators

In this section, we present the theoretical properties of the estimators $\hat{R}_\Sigma^\tau$, $\hat{R}_\Psi^\tau$, and $\hat{R}_\Gamma^\tau$. To simplify the argument, we introduce some notation. Let $Z_k = \mathrm{vec}(X_k) \in \mathbb{R}^{pq}$, $U_{kk'} = \mathrm{sign}(Z_k - Z_{k'}) \in \mathbb{R}^{pq}$ for $1\le k\ne k'\le n$, and $V_{kk'} = \mathrm{vec}(U_{kk'}U_{kk'}^\top - E(U_{kk'}U_{kk'}^\top))$. Let $T_\Gamma = E(U_{kk'}U_{kk'}^\top)$ and $\hat{T}_\Gamma = 2(n(n-1))^{-1}\sum_{1\le k<k'\le n} U_{kk'}U_{kk'}^\top$ denote the population Kendall’s τ correlation matrix and its estimator, respectively. Let $E_n \equiv \mathrm{vec}(\hat{T}_\Gamma - T_\Gamma) = 2(n(n-1))^{-1}\sum_{1\le k<k'\le n} V_{kk'}$, which is a U-statistic. Let $a \in \mathbb{R}^{p^2q^2}$ be a vector with $\|a\| = 1$ and denote $W = \mathrm{cov}($ …
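For context on how a Kendall’s τ matrix such as $\hat{T}_\Gamma$ is turned into a correlation estimate, a minimal sketch (our own illustration, not the paper’s implementation) of the standard entrywise sine transform used throughout this literature, $\hat{R} = \sin(\tfrac{\pi}{2}\hat{\tau})$, which is consistent for the latent correlation under elliptical models:

```python
import numpy as np
from scipy.stats import kendalltau

def tau_correlation_matrix(Z):
    """Z: (n, d) sample; returns the sin-transformed Kendall's tau matrix."""
    n, d = Z.shape
    T = np.eye(d)  # tau of a variable with itself is 1
    for a in range(d):
        for b in range(a + 1, d):
            T[a, b] = T[b, a] = kendalltau(Z[:, a], Z[:, b])[0]
    # Under elliptical models, R_ab = sin(pi/2 * tau_ab) entrywise.
    return np.sin(np.pi / 2 * T)

rng = np.random.default_rng(3)
# Bivariate Gaussian sample with true correlation 0.6.
C = np.array([[1.0, 0.6], [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), C, size=2000)
R_hat = tau_correlation_matrix(Z)
assert abs(R_hat[0, 1] - 0.6) < 0.1
```

For a bivariate normal with correlation $\rho$, $\tau = \tfrac{2}{\pi}\arcsin(\rho)$, so the sine transform exactly inverts the population relationship.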

Extension to tensor data

The proposed estimators can be generalized to tensor data. We first introduce some notation. For any tensor $\mathcal{Z} \in \mathbb{R}^{p_1\times\cdots\times p_M}$, let $\mathrm{vec}(\mathcal{Z})$ denote the operator that vectorizes the $M$-dimensional tensor $\mathcal{Z}$ into a column vector. In particular, the $(i_1,\dots,i_M)$th entry of $\mathcal{Z}$, denoted $z_{i_1\cdots i_M}$, maps to the $j$th entry of $\mathrm{vec}(\mathcal{Z})$, where $j = i_1 + \sum_{m=2}^{M}(i_m-1)\prod_{m'=1}^{m-1}p_{m'}$. The mode-$m$ matricization (also known as unfolding) of a tensor $\mathcal{Z}$, denoted $Z_{(m)}$, is a $p_m\times q_m$ matrix with $q_m = \prod_{m'\ne m}p_{m'}$ such that the $(i_1,\dots,i_M)$th …
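The index map above is exactly column-major (Fortran-order) vectorization, which can be verified on a small tensor (our own check, with arbitrary dimensions $2\times 3\times 4$; indices are 1-based as in the text):

```python
import numpy as np

# Verify j = i_1 + sum_{m=2}^{M} (i_m - 1) * prod_{m' < m} p_{m'}
# against numpy's column-major flattening of a 3-way tensor.
p1, p2, p3 = 2, 3, 4
Z = np.arange(p1 * p2 * p3).reshape(p1, p2, p3)
vecZ = Z.flatten(order="F")  # column-major vectorization, length p1*p2*p3

dims = (p1, p2, p3)
for i1 in range(1, p1 + 1):
    for i2 in range(1, p2 + 1):
        for i3 in range(1, p3 + 1):
            idx = (i1, i2, i3)
            # 1-based position of entry (i1, i2, i3) in vec(Z)
            j = idx[0] + sum(
                (idx[m] - 1) * int(np.prod(dims[:m])) for m in range(1, 3)
            )
            assert vecZ[j - 1] == Z[i1 - 1, i2 - 1, i3 - 1]
```

With $M=2$ the map reduces to $j = i_1 + (i_2-1)p_1$, i.e., the usual column-stacking $\mathrm{vec}$ used for matrices in the earlier sections.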

Simulation on correlation matrix estimation

In this section, we adopt the adaptive Lasso penalty in (8), (9), and compare our estimator with the non-robust one of [15]. We generate i.i.d. matrix-variates $X_k \in \mathbb{R}^{p\times q}$, $1\le k\le n$, as follows: $X_k = A S_k B$, $1\le k\le n$, where $A \in \mathbb{R}^{p\times p}$ and $B \in \mathbb{R}^{q\times q}$ are constant matrices and the $S_k \in \mathbb{R}^{p\times q}$ are independent random matrices with $\mathrm{cov}(\mathrm{vec}(S_k)) = c^2 I_{pq}$ for some constant $c>0$. The distribution of $S_k$ will be specified later. It is then easy to see that $\Gamma = \mathrm{cov}(\mathrm{vec}(X_k)) = c^2(B^\top B)\otimes(AA^\top) \equiv \Psi\otimes\Sigma$, where $\Psi$ and $\Sigma$ are defined accordingly.
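The Kronecker identity behind this data-generating scheme follows from $\mathrm{vec}(ASB) = (B^\top\otimes A)\mathrm{vec}(S)$. A minimal Monte Carlo sketch (our own version with Gaussian $S_k$ and arbitrary small dimensions; the actual simulations would also use heavy-tailed $S_k$ to stress robustness):

```python
import numpy as np

# Check cov(vec(X)) = c^2 (B'B) ⊗ (AA') for X = A S B with cov(vec(S)) = c^2 I.
rng = np.random.default_rng(4)
p, q, c = 3, 2, 1.5
A = rng.standard_normal((p, p))
B = rng.standard_normal((q, q))
Gamma = c**2 * np.kron(B.T @ B, A @ A.T)  # implied pq x pq covariance

n = 50000
S = c * rng.standard_normal((n, p, q))     # vec(S_k) has covariance c^2 I_pq
X = A @ S @ B                              # batched matrix products, (n, p, q)
# Column-major vec of each X_k: stack columns, giving shape (n, p*q).
vecX = X.transpose(0, 2, 1).reshape(n, p * q)
emp = vecX.T @ vecX / n                    # empirical covariance (mean is zero)

rel_err = np.max(np.abs(emp - Gamma)) / np.max(np.abs(Gamma))
assert rel_err < 0.1
```

The same identity is what lets row and column dependence be controlled separately through $A$ and $B$ when designing the simulation scenarios.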

Real data analysis

We apply our method to the Atlas of Gene Expression in the Mouse Aging (AGEMAP) database, a resource of gene expression as a function of age in mice. The database includes the expression of 8932 genes in 16 tissues as a function of age [38]. There are four age states: 1, 6, 16, and 24 months. For each age state, researchers chose ten mice, five of each sex, so there are 40 observations in total (i.e., $n=40$). Similar to [15] and [37], we select seven tissues: Cerebrum, …

Discussion

In this paper, we consider the estimation of the correlation matrix with a sparse Kronecker structure and propose a robust estimator based on Kendall’s τ correlation under the assumption of a transelliptical distribution. The proposed estimator is further extended to the bigraphical model and to tensor data. We also establish the theoretical properties of the estimator, showing that the Kronecker structure actually increases the effective sample size and leads to a fast convergence rate. The empirical …

CRediT authorship contribution statement

Lu Niu: Conceptualization, Methodology, Software, Formal analysis, Data curation, Writing - original draft, Writing - review & editing. Xiumin Liu: Writing - review & editing. Junlong Zhao: Conceptualization, Methodology, Writing - review & editing, Supervision, Project administration, Funding acquisition.

Acknowledgments

Junlong Zhao was supported by the National Science Foundation of China, No. 11871104 and No. 11471030, and the Fundamental Research Funds for the Central Universities, China. Lu Niu was supported by the National Science Foundation of China, No. 11971048.

References (41)

  • Barber, R.F., et al., ROCKET: Robust confidence intervals via Kendall’s tau for transelliptical graphical models, Ann. Statist. (2018)
  • Bergsma, W., et al., A consistent test of independence based on a sign covariance related to Kendall’s tau, Bernoulli (2014)
  • Bickel, P.J., et al., Regularized estimation of large covariance matrices, Ann. Statist. (2008)
  • Cui, Y., et al., Sparse estimation of high-dimensional correlation matrices, Comput. Statist. Data Anal. (2016)
  • Dutilleul, P., The MLE algorithm for the matrix normal distribution, J. Statist. Comput. Simul. (1999)
  • Fox, M.D., et al., Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging, Nat. Rev. Neurosci. (2007)
  • Gupta, A.K., et al., Matrix Variate Distributions, Vol. 104 (1999)
  • Han, F., et al., Scale-invariant sparse PCA on high dimensional meta-elliptical data, J. Amer. Statist. Assoc. (2014)
  • Han, F., et al., Statistical analysis of latent generalized correlation matrix estimation in transelliptical distribution, Bernoulli (2017)
  • He, S., et al., Graphical model selection and estimation for high dimensional tensor data, J. Multivariate Anal. (2014)
  • Hoff, P.D., Separable covariance arrays via the Tucker product, with applications to multivariate relational data, Bayesian Anal. (2011)
  • Huffer, F.W., et al., A test for elliptical symmetry, J. Multivariate Anal. (2007)
  • Hung, H., et al., Matrix variate logistic regression model with application to EEG data, Biostatistics (2012)
  • Kong, D., et al., L2RM: Low-rank linear regression models for high-dimensional matrix responses, J. Amer. Statist. Assoc. (2018)
  • Leng, C., et al., Covariance estimation via sparse Kronecker structures, Bernoulli (2018)
  • Leng, C., et al., Sparse matrix graphical models, J. Amer. Statist. Assoc. (2012)
  • Li, B., et al., On dimension folding of matrix- or array-valued statistical objects, Ann. Statist. (2010)
  • Li, G., et al., Robust rank correlation based screening, Ann. Statist. (2010)
  • Li, M., et al., 2D-LDA: A statistical linear discriminant analysis for image matrix, Pattern Recognit. Lett. (2005)
  • Liu, H., et al., High dimensional semiparametric Gaussian copula graphical models, Ann. Statist. (2012)