Envelope-based sparse reduced-rank regression for multivariate linear model

https://doi.org/10.1016/j.jmva.2023.105159

Abstract

Envelope models were first proposed by Cook et al. (2010) as a method to reduce estimation and prediction variability in multivariate regression. Sparse reduced-rank regression, introduced by Chen and Huang (2012), is a widely used technique that performs dimension reduction and variable selection simultaneously in multivariate regression. In this work, we combine envelope models with the sparse reduced-rank regression method to propose an envelope-based sparse reduced-rank regression estimator, and we establish its consistency, asymptotic normality and oracle property in the high-dimensional setting. We carry out Monte Carlo simulation studies and analyze two datasets to demonstrate that the proposed envelope-based sparse reduced-rank regression method displays good variable selection and prediction performance.

Introduction

In this work, we consider the following multivariate linear regression model:
$$Y_i = \beta X_i + \varepsilon_i, \quad i \in \{1, \ldots, n\}, \tag{1}$$
where $Y_i \in \mathbb{R}^r$ denotes a multivariate response vector, $X_i \in \mathbb{R}^p$ denotes a non-stochastic vector of predictors, $\varepsilon_i$ is an error vector with mean $0$ and covariance matrix $\Sigma$, independent of $X_i$, and $\beta \in \mathbb{R}^{r \times p}$ is the regression coefficient matrix of primary interest. If $X_i$ is a vector of random quantities during sampling, then the model is conditional on the observed values of $X_i$. Let $\mathbf{X}$ and $\mathbf{Y}$ denote $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$, respectively. Without loss of generality, we assume the data are centered, so that the intercept can be excluded from the regression model. Model (1) can then be re-expressed as
$$\mathbf{Y} = \beta \mathbf{X} + \boldsymbol{\varepsilon}, \tag{2}$$
where $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)$.

For positive integers $r$ and $p$, $\mathbb{R}^{r \times p}$ denotes the class of all real matrices of dimension $r \times p$, and $\mathbb{S}^{r \times r}$ denotes the class of all symmetric $r \times r$ matrices. Given a matrix $A \in \mathbb{R}^{n \times n}$, $\operatorname{tr}(A)$ stands for the trace of $A$. For $B \in \mathbb{R}^{r \times p}$, $\operatorname{span}(B)$ stands for the subspace of $\mathbb{R}^r$ spanned by the columns of $B$, the Frobenius norm of $B$ is denoted by $\|B\|_F = \sqrt{\operatorname{tr}(B^{\top}B)}$, and $B^{+}$ denotes the Moore–Penrose inverse of $B$. For a column vector $X$, the Euclidean norm is represented as $\|X\|_2 = \sqrt{X^{\top}X}$. A basis matrix for a subspace $\mathcal{S}$ is any matrix whose columns form a basis for $\mathcal{S}$. A subspace $\mathcal{R}$ of $\mathbb{R}^r$ is a reducing subspace of $M \in \mathbb{R}^{r \times r}$ if $M\mathcal{R} \subseteq \mathcal{R}$ and $M\mathcal{R}^{\perp} \subseteq \mathcal{R}^{\perp}$.

The envelope method was developed for the multivariate linear model by Cook et al. [10]. It is built on the key assumption that some linear combinations of the response variables do not change, in distribution, as the predictors vary. The goal of the method is to reduce the dimension of the variables and improve estimation efficiency. More specifically, let $P_{\xi}Y$ denote the projection of $Y$ onto a subspace $\xi \subseteq \mathbb{R}^r$ with the following two properties: (i) the distribution of $Q_{\xi}Y \mid X$ does not depend on $X$, where $Q_{\xi} = I_r - P_{\xi}$, and (ii) $P_{\xi}Y$ is independent of $Q_{\xi}Y$, given $X$. Combined, the two conditions imply that the distribution of $Q_{\xi}Y$ is not affected by $X$, either marginally or through an association with $P_{\xi}Y$; as a result, changes in $X$ influence the distribution of $Y$ only through $P_{\xi}Y$. Furthermore, conditions (i) and (ii) hold if and only if (a) $\mathcal{B} \equiv \operatorname{span}(\beta)$ is a subspace of $\xi$ and (b) $\xi$ is a reducing subspace of $\Sigma$. The $\Sigma$-envelope of $\mathcal{B}$, denoted by $\xi_{\Sigma}(\mathcal{B})$, is defined formally as the intersection of all reducing subspaces of $\Sigma$ that contain $\mathcal{B}$. Let $u = \dim\{\xi_{\Sigma}(\mathcal{B})\}$, and let $(\Gamma, \Gamma_0) \in \mathbb{R}^{r \times r}$ be an orthogonal matrix with $\Gamma \in \mathbb{R}^{r \times u}$ a column-orthogonal matrix and $\operatorname{span}(\Gamma) = \xi_{\Sigma}(\mathcal{B})$. This leads directly to the following envelope version of model (1):
$$Y_i = \beta X_i + \varepsilon_i, \quad \Sigma = \Gamma\Omega\Gamma^{\top} + \Gamma_0\Omega_0\Gamma_0^{\top}, \quad i \in \{1, \ldots, n\}, \tag{3}$$
where $\beta = \Gamma\gamma$, with $\gamma \in \mathbb{R}^{u \times p}$ representing the coordinates of $\beta$ with respect to the basis $\Gamma$, and $\Omega \in \mathbb{S}^{u \times u}$ and $\Omega_0 \in \mathbb{S}^{(r-u) \times (r-u)}$ are both positive definite matrices; $\gamma$, $\Omega$ and $\Omega_0$ depend on the choice of basis $\Gamma$. It should be mentioned that the parameters $\beta$ and $\Sigma$ depend only on $\xi_{\Sigma}(\mathcal{B})$ and not on the basis. The parameters in (3) can be estimated by maximum likelihood, and the dimension $u$ of the envelope can be determined by likelihood ratio tests, information criteria, or cross-validation. The envelope estimator of $\beta$, given by $\hat{\beta}_{\mathrm{en}} = P_{\hat{\xi}}\hat{\beta}_{\mathrm{OLS}}$, is simply the projection of the ordinary least-squares estimator $\hat{\beta}_{\mathrm{OLS}}$ onto the estimated envelope. A detailed review of envelope models can be found in Cook et al. [8] and Cook [7].
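To make the projection step concrete, the following minimal R sketch (not the authors' code) computes the envelope estimator given an orthonormal basis estimate $\hat{\Gamma}$; how $\hat{\Gamma}$ itself is obtained is deferred to Section 3.

```r
## Minimal sketch, assuming Y is the r x n response matrix, X the p x n
## (centered) predictor matrix, and Gamma_hat an r x u orthonormal basis
## estimate of the envelope. The envelope estimator projects the OLS
## estimator onto span(Gamma_hat).
envelope_project <- function(Y, X, Gamma_hat) {
  beta_ols <- Y %*% t(X) %*% solve(X %*% t(X))  # r x p OLS estimator
  P_Gamma  <- Gamma_hat %*% t(Gamma_hat)        # projection onto the envelope
  P_Gamma %*% beta_ols                          # beta_en = P_Gamma * beta_OLS
}
```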

From model (2), the ordinary least-squares (OLS) estimator of $\beta$ is
$$\hat{\beta}_{\mathrm{OLS}} = \mathbf{Y}\mathbf{X}^{\top}(\mathbf{X}\mathbf{X}^{\top})^{-1}.$$
Clearly, OLS estimation with multiple responses is equivalent to performing separate OLS estimation for each response variable, so the estimator does not exploit the correlation that likely exists among the responses. It is, of course, useful to account for the correlation between response variables. One way of incorporating possible interrelationships between response variables is the reduced-rank regression (RRR) model (Reinsel and Velu [20]). Reduced-rank regression allows the rank of $\beta$ to be less than $\min(p, r)$, so the model can be parametrized as $\beta = AB$, where $A \in \mathbb{R}^{r \times d}$, $B \in \mathbb{R}^{d \times p}$, and $\operatorname{rank}(A) = \operatorname{rank}(B) = d$. The decomposition $\beta = AB$ is non-unique, since for any orthogonal matrix $O \in \mathbb{R}^{d \times d}$, $A^* = AO$ and $B^* = O^{\top}B$ yield another valid decomposition satisfying $\beta = AB = A^*B^*$. Nevertheless, the parameter $\beta$ of interest is identifiable, as are $\operatorname{span}(A) = \operatorname{span}(\beta)$ and $\operatorname{span}(B^{\top}) = \operatorname{span}(\beta^{\top})$ (Cook et al. [9]). Under constraints on $A$ and $B$ such as $BB^{\top} = I_d$ or $A^{\top}A = I_d$, Anderson [1] and Reinsel and Velu [20] derived the maximum likelihood estimators of the RRR parameters. Because of the linear constraints on the regression coefficients, the number of effective parameters is reduced, and prediction accuracy may therefore improve. In high-dimensional data, a large number of predictor variables are typically available, but some of them may not be useful for prediction. For this reason, Chen and Huang [4] proposed sparse reduced-rank regression, which uses penalty functions to perform simultaneous dimension reduction and variable selection in multivariate regression with a fixed parameter dimension. Lian and Kim [17] provided sufficient conditions for the oracle estimator to be a local minimizer, and stronger conditions for it to be a global minimizer, in an ultra-high-dimensional setting for a class of nonconvex penalties. Chen et al. [2] used a sparse singular value decomposition (SVD) of the coefficient matrix $\beta$ to propose a regularized reduced-rank regression approach that improves predictive accuracy and facilitates interpretation. Chen et al. [3] proposed an adaptive nuclear norm penalization approach for low-rank matrix approximation, and used it to develop a new reduced-rank estimation method for high-dimensional multivariate regression. Cook et al. [9] incorporated the idea of envelopes into reduced-rank regression by proposing a reduced-rank envelope model, whose total number of parameters is no more than that of either reduced-rank regression or envelope regression. The reduced-rank envelope estimator is at least as efficient as the two estimators just mentioned, but it is not sparse.
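As an illustration of the rank constraint, the sketch below fits an unweighted reduced-rank regression by truncating the SVD of the OLS fit (the Eckart–Young solution under the Frobenius norm). The likelihood-based RRR estimators of Anderson [1] and Reinsel and Velu [20] weight this step by the error covariance, so this is only a simplified stand-in.

```r
## Sketch of unweighted reduced-rank regression: project the OLS fit onto
## its top-d left singular vectors (Frobenius-optimal rank-d solution).
rrr_fit <- function(Y, X, d) {
  beta_ols <- Y %*% t(X) %*% solve(X %*% t(X))  # r x p OLS estimator
  fitted   <- beta_ols %*% X                    # r x n fitted values
  U_d <- svd(fitted)$u[, 1:d, drop = FALSE]     # top-d left singular vectors
  A <- U_d                                      # r x d, satisfies A'A = I_d
  B <- t(U_d) %*% beta_ols                      # d x p
  list(A = A, B = B, beta = A %*% B)            # rank-d estimate beta = AB
}
```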

In many regression problems, we are often interested in identifying the predictor variables that are important for predicting the response, where each predictor variable may be represented by a group of derived input variables. For this reason, Yuan and Lin [26] proposed model selection and estimation in a general regression problem with grouped variables using a group LASSO penalty. Nardi and Rinaldo [18] established asymptotic properties of the group LASSO estimator for general linear models. Zhao and Yu [27] studied model selection consistency of the LASSO in the classical fixed-$p$ setting as well as when $p$ grows with the sample size $n$. For the classical linear regression model, Zou and Zhang [29] studied model selection and estimation when the number of parameters diverges with the sample size, using the adaptive elastic-net penalty. Guo et al. [13] established the oracle property of the group SCAD estimator in the linear regression model under a high-dimensional setting in which the number of groups grows at a certain polynomial rate. Su et al. [24] proposed a sparse envelope model that performs response variable selection efficiently under the envelope model; in their model, the number of predictors $p$ is assumed to be fixed and smaller than the sample size $n$, but $r$ can be greater than $n$.

In the present work, we propose a sparse reduced-rank regression method based on the envelope model with an adaptive group LASSO penalty for the multivariate linear model, which simultaneously performs dimension reduction of the response and predictor variables and group variable selection. The proposed method is suitable for all $n$ and $p$. Moreover, we consider both the case in which $r$ and $p$ are fixed and the case in which $r$ and $p$ grow with $n$. We then establish the consistency, asymptotic normality and oracle property of the envelope-based sparse reduced-rank regression estimator developed here. Finally, through Monte Carlo simulation studies and two real datasets, we demonstrate that the proposed method displays good variable selection and prediction performance compared to some well-known existing methods.


Envelope-based sparse reduced-rank regression estimator and its properties

From model (3), with $\beta$ having a low-rank structure, we have
$$\Gamma^{\top}Y = \eta B X + \Gamma^{\top}\varepsilon,$$
where $\beta = AB$, $\beta = \Gamma\gamma$ and $\beta = \Gamma\eta B$ represent the reduced-rank, standard envelope and reduced-rank envelope parametrizations, respectively. Also, $\eta \in \mathbb{R}^{u \times d}$, $u \geq d$, denotes the coordinates of $A$ with respect to $\Gamma$.

If $\Gamma$ is unknown, we can obtain an estimator $\hat{\Gamma}$ of $\Gamma$ using the method described in Section 3. The standard envelope estimator is then obtained as
$$\hat{\beta}_{\mathrm{en}} = \hat{\Gamma}\hat{\Gamma}^{\top}\mathbf{Y}\mathbf{X}^{\top}(\mathbf{X}\mathbf{X}^{\top})^{-1}.$$
Using singular value decomposition, we have $\eta B = \cdots$
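A hedged sketch of the combined step: given a basis estimate $\hat{\Gamma}$ (Section 3) and a rank $d$, one can regress $\hat{\Gamma}^{\top}\mathbf{Y}$ on $\mathbf{X}$ and truncate the resulting $u \times p$ coefficient matrix to rank $d$ by SVD. This Frobenius-norm truncation is an illustrative surrogate for, not an exact implementation of, the estimator derived in this section.

```r
## Sketch: envelope reduction followed by rank-d truncation.
rre_fit <- function(Y, X, Gamma_hat, d) {
  Z <- t(Gamma_hat) %*% Y                       # u x n reduced responses
  C <- Z %*% t(X) %*% solve(X %*% t(X))         # u x p OLS fit in reduced model
  s <- svd(C)
  etaB <- s$u[, 1:d, drop = FALSE] %*%
          diag(s$d[1:d], d, d) %*%
          t(s$v[, 1:d, drop = FALSE])           # best rank-d approximation of C
  Gamma_hat %*% etaB                            # beta_hat = Gamma * (eta B)
}
```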

Estimation of the envelope

To estimate the envelope $\xi_{\Sigma}(\mathcal{B})$, Cook et al. [8] and Su et al. [24] developed an iterative algorithm which is fast and effective. Let
$$\Gamma = \begin{pmatrix} \Gamma_1 \\ \Gamma_2 \end{pmatrix} = \begin{pmatrix} I_u \\ N \end{pmatrix}\Gamma_1,$$
where $\Gamma_1$ consists of the first $u$ rows of $\Gamma$ and is assumed nonsingular, and $N = \Gamma_2\Gamma_1^{-1}$ depends on $\Gamma$ only through the space spanned by the columns of $\Gamma$. This is so because, for any orthogonal matrix $P \in \mathbb{R}^{u \times u}$, if $\Gamma^* = \Gamma P$, then $\Gamma_1^* = \Gamma_1 P$, $\Gamma_2^* = \Gamma_2 P$, and $N^* = \Gamma_2 P P^{-1}\Gamma_1^{-1} = N$. The optimization problem for estimating $\xi_{\Sigma}(\mathcal{B})$ is $\cdots$
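The invariance of $N$ can be verified numerically; the following short sketch draws a random orthonormal basis $\Gamma$ and an orthogonal $P$, and confirms that $N^* = N$.

```r
## Numerical check: N = Gamma_2 %*% solve(Gamma_1) is unchanged when
## Gamma is replaced by Gamma %*% P for an orthogonal P.
set.seed(1)
r <- 6; u <- 2
Gamma <- qr.Q(qr(matrix(rnorm(r * u), r, u)))     # r x u orthonormal basis
P     <- qr.Q(qr(matrix(rnorm(u * u), u, u)))     # u x u orthogonal matrix
N  <- Gamma[-(1:u), ] %*% solve(Gamma[1:u, ])     # N computed from Gamma
Ns <- (Gamma %*% P)[-(1:u), ] %*% solve((Gamma %*% P)[1:u, ])
max(abs(N - Ns))                                  # ~ 0: N depends only on span(Gamma)
```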

Selection of u

In the above discussion, we have assumed that $u$, the dimension of the envelope, is known. In practice, however, $u$ will be unknown. There are several ways to choose $u$, such as cross-validation (CV), the likelihood ratio test (LRT), and information criteria such as AIC or BIC. Cook [7] provides an elaborate discussion of all these methods. The AIC tends to select a model that contains the true model, and so it tends to overestimate $u$. The BIC tends to select the correct $u$ with probability approaching one as the sample size increases.
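For illustration, the following fivefold CV skeleton selects $u$ by held-out prediction error. The inner fit is a crude stand-in (projection onto the top-$u$ left singular vectors of the OLS fit), not the likelihood-based envelope estimator used in the paper; the sketch only shows the CV mechanics.

```r
## Generic K-fold CV over candidate envelope dimensions u = 1, ..., u_max.
cv_select_u <- function(Y, X, u_max, K = 5) {
  n <- ncol(Y)
  folds <- sample(rep(1:K, length.out = n))       # random fold assignment
  err <- sapply(1:u_max, function(u) {
    mean(sapply(1:K, function(k) {
      tr <- folds != k                            # training columns
      beta_ols <- Y[, tr] %*% t(X[, tr]) %*% solve(X[, tr] %*% t(X[, tr]))
      G <- svd(beta_ols %*% X[, tr])$u[, 1:u, drop = FALSE]  # proxy basis
      beta_u <- G %*% t(G) %*% beta_ols           # projected estimator
      sum((Y[, !tr] - beta_u %*% X[, !tr])^2)     # held-out prediction error
    }))
  })
  which.min(err)                                  # u with smallest CV error
}
```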

Tuning

The rank $d$ can be selected by cross-validation (CV). The tuning parameter of the adaptive LASSO penalty is denoted by $\lambda_n$. We set $\omega_j = 1/\|\beta_j\|_2^{\delta}$ as the adaptive weight, and let $\tilde{\omega}_j = 1/\|\tilde{\beta}_j\|_2^{\delta}$ be its estimator, where $\tilde{\beta}_j$ is a consistent estimate of $\beta_j$. When $n \geq p$, the reduced-rank envelope method can be used to estimate $\beta_j$. When $p > n$, setting all the $\omega_j$ equal to a common $\omega_n$, a reasonable estimator is the solution of (10) with the single penalty parameter $\lambda_n\omega_n$. In this paper, we use a fivefold CV procedure $\cdots$
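The weight construction can be coded directly. A small sketch, assuming $\tilde{\beta}$ is an $r \times p$ initial estimate and each predictor (each column of $\beta$) forms one group:

```r
## Adaptive group weights omega_j = 1 / ||beta_tilde_j||_2^delta,
## one weight per predictor (column group) of the initial estimate.
adaptive_weights <- function(beta_tilde, delta = 1) {
  group_norms <- sqrt(colSums(beta_tilde^2))      # ||beta_tilde_j||_2, j = 1..p
  1 / group_norms^delta                           # weaker groups get larger weights
}
```

A zero column in the initial estimate yields an infinite weight, so the corresponding predictor group is effectively excluded, which is the intended behavior of adaptive penalties.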

Simulation setups and methods

Scenario I. We generated data with $p$ and $r$ smaller than $n$, taking $\Omega = I_u$ and $\Omega_0 = 10 I_{r-u}$. We assumed that the elements of the first $s$ columns of $\eta$ were independent uniform $(0, 10)$ variables, and the elements of the remaining $p - s$ columns were all zeros. Then $\beta = \Gamma\eta$, $X_i$ follows a multivariate normal distribution with mean $0$ and covariance matrix $I_p$, and $(\Gamma, \Gamma_0)$ was obtained by standardizing an $r \times r$ matrix of independent uniform $(0, 1)$ variables. The error covariance matrix was generated as $\Sigma = \Gamma\Omega\Gamma^{\top} + \Gamma_0\Omega_0\Gamma_0^{\top}$.
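A hedged sketch of this data-generating process follows. The specific sizes $n$, $p$, $r$, $u$ and $s$ below are illustrative assumptions, and the QR step is one way to realize the orthonormalization of the uniform $r \times r$ matrix described above.

```r
## Scenario I sketch: Omega = I_u, Omega0 = 10 I_{r-u}, sparse eta, X ~ N(0, I_p).
set.seed(2023)
n <- 200; p <- 10; r <- 8; u <- 3; s <- 4          # illustrative sizes
GO <- qr.Q(qr(matrix(runif(r * r), r, r)))          # orthonormalized r x r uniform matrix
Gamma <- GO[, 1:u]; Gamma0 <- GO[, (u + 1):r]       # envelope and complement bases
eta <- matrix(0, u, p)
eta[, 1:s] <- matrix(runif(u * s, 0, 10), u, s)     # first s columns active, rest zero
beta <- Gamma %*% eta                               # true coefficient matrix
Sigma <- Gamma %*% t(Gamma) + 10 * Gamma0 %*% t(Gamma0)  # error covariance
X <- matrix(rnorm(p * n), p, n)                     # X_i ~ N(0, I_p)
E <- t(chol(Sigma)) %*% matrix(rnorm(r * n), r, n)  # errors with covariance Sigma
Y <- beta %*% X + E                                 # responses from model (2)
```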

Example 1: Yeast cell cycle data

The yeast cell cycle data set was first used by Spellman et al. [23] and is available in the R package spls. The response matrix $\mathbf{Y}$ contains expression levels of 542 cell-cycle-regulated genes, with RNA levels measured at 18 time points using the α-factor arrest method. The $542 \times 106$ predictor matrix $\mathbf{X}$ contains the binding information of the target genes for a total of 106 transcription factors (TFs). This data set has also been analyzed by other authors, including Chun and Keleş [6].
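Assuming the data layout of the spls package (a yeast object with components x and y, matching the dimensions stated above), the data can be loaded as follows:

```r
## Load the yeast cell cycle data from the spls package.
# install.packages("spls")
library(spls)
data(yeast)
dim(yeast$x)   # 542 x 106 transcription-factor binding predictors
dim(yeast$y)   # 542 x 18 expression levels at 18 time points
```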

CRediT authorship contribution statement

Wenxing Guo: Problem formulation, Methodology, Theoretical development, Software, Writing – original draft, Writing – review & editing. Narayanaswamy Balakrishnan: Correcting and revising the manuscript, Guidance and discussion, Writing – review & editing. Mu He: Computational help, Writing – review & editing.

Acknowledgments

We express our sincere thanks to the anonymous reviewers and the Editor for their incisive comments on an earlier version of this manuscript which led to this much improved version.

References (29)

  • Cook, R.D.
  • Cook, R.D., et al., Envelopes and reduced-rank regression, Biometrika (2015).
  • Cook, R.D., et al., Envelope models for parsimonious and efficient multivariate linear regression (with discussion), Statist. Sinica (2010).
  • Friedman, J., et al., Pathwise coordinate optimization, Ann. Appl. Stat. (2007).