Envelope-based sparse reduced-rank regression for multivariate linear model
Introduction
In this work, we consider the following multivariate linear regression model:
$$Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad i = 1, \ldots, n, \quad (1)$$
where $Y_i \in \mathbb{R}^r$ denotes a multivariate response vector, $X_i \in \mathbb{R}^p$ denotes a non-stochastic vector of predictors, $\varepsilon_i \in \mathbb{R}^r$ is an error vector having mean $0$ and covariance matrix $\Sigma$ and is independent of $X_i$, and $\beta \in \mathbb{R}^{r \times p}$ is the regression coefficient matrix in which we are primarily interested. If $X_i$ is a vector of random quantities during sampling, then the model is conditional on the observed values of $X_i$. Let $\mathbb{Y} \in \mathbb{R}^{n \times r}$ and $\mathbb{X} \in \mathbb{R}^{n \times p}$ denote the matrices whose $i$th rows are $Y_i^\top$ and $X_i^\top$, respectively. Without loss of generality, let us assume that the data are centered, so that the intercept $\alpha$ can be excluded from the regression model. Then, model (1) can be re-expressed as
$$\mathbb{Y} = \mathbb{X}\beta^\top + \mathbb{E}, \quad (2)$$
where $\mathbb{E} \in \mathbb{R}^{n \times r}$ is the matrix whose $i$th row is $\varepsilon_i^\top$.
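As a concrete illustration of the model just described, the following sketch simulates centered data with $r$ responses and $p$ predictors; all sizes and the AR(1)-structured error covariance are arbitrary choices for illustration, not values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 200, 5, 4                     # sample size, predictors, responses (illustrative)

X = rng.normal(size=(n, p))
beta = rng.normal(size=(r, p))          # true r x p coefficient matrix
# AR(1)-type error covariance, an arbitrary positive-definite choice
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(r), np.arange(r)))
E = rng.multivariate_normal(np.zeros(r), Sigma, size=n)
Y = X @ beta.T + E                      # model (2): Y = X beta^T + E

# Centre both Y and X so the intercept can be dropped, as in model (2)
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)
```

Centering absorbs the intercept, which is why model (2) carries no $\alpha$ term.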
For positive integers $a$ and $b$, $\mathbb{R}^{a \times b}$ represents the class of all real matrices of dimension $a \times b$, and $\mathbb{S}^{a \times a}$ denotes the class of all symmetric $a \times a$ matrices. Given a matrix $A$, $\mathrm{tr}(A)$ stands for the trace of $A$. For $A \in \mathbb{R}^{a \times b}$, $\mathrm{span}(A)$ stands for the subspace of $\mathbb{R}^a$ spanned by the columns of $A$, and the Frobenius norm of $A$ is denoted by $\|A\|_F$. $A^{\dagger}$ denotes the Moore–Penrose inverse of $A$. For a column vector $v$, the Euclidean norm is represented as $\|v\|$. A basis matrix for a subspace $\mathcal{S}$ is any matrix whose columns form a basis for $\mathcal{S}$. A subspace $\mathcal{R}$ of $\mathbb{R}^a$ is a reducing subspace of $M \in \mathbb{S}^{a \times a}$ if $M\mathcal{R} \subseteq \mathcal{R}$ and $M\mathcal{R}^{\perp} \subseteq \mathcal{R}^{\perp}$.
The envelope method was developed for the multivariate linear model by Cook et al. [10]. It is built on the key assumption that some linear combinations of the response variables are irrelevant as the predictors vary. The goal of this method is then to reduce the dimension of the response and improve efficiency. More specifically, let $P_{\mathcal{S}}Y$ denote the projection of $Y$ onto a subspace $\mathcal{S} \subseteq \mathbb{R}^r$ with the following two properties: (i) the distribution of $Q_{\mathcal{S}}Y \mid X$ does not depend on $X$, where $Q_{\mathcal{S}} = I_r - P_{\mathcal{S}}$, and (ii) $P_{\mathcal{S}}Y$ is independent of $Q_{\mathcal{S}}Y$, given $X$. The two conditions, when combined, imply that the distribution of $Q_{\mathcal{S}}Y$ is not affected by $X$, either marginally or through an association with $P_{\mathcal{S}}Y$. As a result, changes in $X$ influence the distribution of $Y$ only through $P_{\mathcal{S}}Y$. Furthermore, conditions (i) and (ii) hold if and only if (a) $\mathcal{B} := \mathrm{span}(\beta)$ is a subspace of $\mathcal{S}$ and (b) $\mathcal{S}$ is a reducing subspace of $\Sigma$. The $\Sigma$-envelope of $\mathcal{B}$, denoted by $\mathcal{E}_{\Sigma}(\mathcal{B})$, is defined formally as the intersection of all reducing subspaces of $\Sigma$ that contain $\mathcal{B}$. Let $u = \dim\{\mathcal{E}_{\Sigma}(\mathcal{B})\}$ and let $(\Gamma, \Gamma_0) \in \mathbb{R}^{r \times r}$ be an orthogonal matrix with $\Gamma \in \mathbb{R}^{r \times u}$ being a column-orthogonal matrix and $\mathrm{span}(\Gamma) = \mathcal{E}_{\Sigma}(\mathcal{B})$. This then leads directly to the following envelope version of model (1):
$$Y_i = \Gamma\eta X_i + \varepsilon_i, \qquad \Sigma = \Gamma\Omega\Gamma^\top + \Gamma_0\Omega_0\Gamma_0^\top, \quad (3)$$
where $\eta \in \mathbb{R}^{u \times p}$ represents the coordinates of $\beta$ corresponding to the basis $\Gamma$, $\Omega \in \mathbb{S}^{u \times u}$ and $\Omega_0 \in \mathbb{S}^{(r-u) \times (r-u)}$ are both positive definite matrices, and $\eta$, $\Omega$ and $\Omega_0$ depend on the basis $\Gamma$. It should be mentioned that the parameters $\beta = \Gamma\eta$ and $\Sigma$ depend only on $\mathcal{E}_{\Sigma}(\mathcal{B})$ rather than on the basis. The estimators of the parameters in (3) can be obtained by maximum likelihood estimation, and the dimension $u$ of the envelope can be determined by likelihood ratio tests, information criteria, or cross-validation. The envelope estimator of $\beta$, denoted by $\hat{\beta}_{\mathrm{env}}$, is simply the projection of the ordinary least-squares estimator of $\beta$ onto the estimated envelope. A detailed review of envelope models can be found in Cook et al. [8] and Cook [7].
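The envelope decomposition of the error covariance can be verified numerically. The sketch below, with arbitrary illustrative dimensions, builds a covariance of the form $\Gamma\Omega\Gamma^\top + \Gamma_0\Omega_0\Gamma_0^\top$ and checks that $\mathrm{span}(\Gamma)$ is a reducing subspace of $\Sigma$ containing $\mathrm{span}(\beta)$.

```python
import numpy as np

rng = np.random.default_rng(1)
r, p, u = 6, 4, 2                       # response dim, predictor dim, envelope dim (illustrative)

# Build an orthogonal matrix (Gamma, Gamma_0) via QR
Q, _ = np.linalg.qr(rng.normal(size=(r, r)))
Gamma, Gamma0 = Q[:, :u], Q[:, u:]

eta = rng.normal(size=(u, p))           # coordinates of beta w.r.t. Gamma
beta = Gamma @ eta                      # hence span(beta) is contained in span(Gamma)

Omega = np.diag([4.0, 2.0])             # material variation (positive definite)
Omega0 = np.eye(r - u)                  # immaterial variation
Sigma = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T

# span(Gamma) reduces Sigma: Sigma maps the subspace into itself,
# and likewise for its orthogonal complement
P = Gamma @ Gamma.T                     # projection onto the envelope
assert np.allclose(P @ Sigma @ (np.eye(r) - P), 0)
```

The final check is exactly condition (b): the envelope and its complement are invariant under $\Sigma$.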
From model (2), the ordinary least-squares (OLS) estimator of $\beta$ is
$$\hat{\beta}_{\mathrm{OLS}} = \mathbb{Y}^\top \mathbb{X} (\mathbb{X}^\top \mathbb{X})^{-1}.$$
It is clear that OLS estimation with multiple responses is equivalent to performing separate OLS estimation for each response variable, and so the estimator does not make use of the likely correlation existing between the multiple responses. It will, of course, be useful to exploit the correlation between response variables. One way of incorporating possible interrelationships between response variables is to consider the reduced-rank regression (RRR) model (Reinsel and Velu [20]). Reduced-rank regression allows the rank of $\beta$ to be less than $\min(r, p)$, so that the model can be parametrized as $\beta = AB$, where $A \in \mathbb{R}^{r \times d}$, $B \in \mathbb{R}^{d \times p}$, and $\mathrm{rank}(A) = \mathrm{rank}(B) = d$. The decomposition is non-unique since, for any orthogonal matrix $O \in \mathbb{R}^{d \times d}$, $AO$ and $O^\top B$ yield another valid decomposition satisfying $\beta = (AO)(O^\top B)$. Nevertheless, the parameter of interest $\beta$ is identifiable, as are $\mathrm{span}(A) = \mathrm{span}(\beta)$ and $\mathrm{span}(B^\top) = \mathrm{span}(\beta^\top)$ (Cook et al. [9]). Under suitable normalization constraints on $A$ and $B$, Anderson [1] and Reinsel and Velu [20] derived the maximum likelihood estimators of the RRR parameters. As there are linear constraints on the regression coefficients, the number of effective parameters is reduced and the prediction accuracy may therefore be improved. In high-dimensional data, a large number of predictor variables will typically be available, but some of them may not be useful for predictive purposes. For this reason, Chen and Huang [4] proposed sparse reduced-rank regression for simultaneous dimension reduction and variable selection in multivariate regression with a fixed dimension of parameters, by means of penalty functions. Lian and Kim [17] provided sufficient conditions to guarantee that the oracle estimator is a local minimizer, and stronger conditions to guarantee that it is a global minimizer, in an ultra-high-dimensional setting for a class of nonconvex penalties. Chen et al. [2] made use of a sparse singular value decomposition (SVD) of the coefficient matrix to propose a regularized reduced-rank regression approach that improves predictive accuracy and also facilitates interpretation. Chen et al. [3] proposed an adaptive nuclear norm penalization approach for low-rank matrix approximation, and then used it to develop a new reduced-rank estimation method for high-dimensional multivariate regression. Cook et al. [9] incorporated the idea of envelopes into reduced-rank regression by proposing a reduced-rank envelope model, whose total number of parameters is no more than that of either reduced-rank regression or envelope regression. The reduced-rank envelope estimator is at least as efficient as the two estimators mentioned above, but it is not sparse.
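A minimal numerical sketch of the reduced-rank idea: with identity weighting, the RRR fit can be obtained by truncating an SVD of the OLS fitted values to rank $d$ and regressing back. The function, sizes, and noise level below are illustrative stand-ins, not the estimator studied later in this paper.

```python
import numpy as np

def rrr(X, Y, d):
    """Unweighted reduced-rank regression: truncate the SVD of the OLS fit."""
    B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]          # p x r OLS coefficients
    U, s, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    Y_fit_d = U[:, :d] * s[:d] @ Vt[:d]                   # keep top-d fitted directions
    return np.linalg.lstsq(X, Y_fit_d, rcond=None)[0]     # p x r, rank <= d

rng = np.random.default_rng(2)
n, p, r, d = 300, 6, 5, 2
A = rng.normal(size=(r, d))
Bm = rng.normal(size=(d, p))
beta = A @ Bm                                             # rank-d truth, beta = A B
X = rng.normal(size=(n, p))
Y = X @ beta.T + 0.1 * rng.normal(size=(n, r))
B_hat = rrr(X, Y, d)                                      # estimate of beta^T
```

The classical maximum-likelihood RRR estimator additionally weights the responses by an inverse covariance estimate; the unweighted version above corresponds to identity weighting.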
In many regression problems, we are often interested in finding important predictor variables for predicting the response, where each predictor may be represented by a group of derived input variables. For this reason, Yuan and Lin [26] proposed model selection and estimation in a general regression problem with grouped variables by means of the group LASSO penalty. Nardi and Rinaldo [18] established asymptotic properties of the group LASSO estimator for general linear models. Zhao and Yu [27] studied model selection consistency of the LASSO in the classical setting with fixed $p$ as well as in the setting where $p$ grows with the sample size $n$. For the classical linear regression model, Zou and Zhang [29] studied model selection and estimation when the number of parameters diverges with the sample size, in terms of the adaptive elastic-net penalty function. Guo et al. [13] established the oracle property of the group SCAD estimator in the linear regression model under a high-dimensional setting in which the number of groups grows at a certain polynomial rate. Su et al. [24] proposed a sparse envelope model that performs response variable selection efficiently under the envelope model. In their model, it is assumed that the number of predictors $p$ is fixed and smaller than the sample size $n$, but the number of responses $r$ can be greater than $n$.
In the present work, we propose a sparse reduced-rank regression method based on the envelope model with an adaptive group LASSO penalty for the multivariate linear model, which performs dimension reduction of the response and predictor variables as well as group variable selection simultaneously. The proposed method is suitable for all $r$ and $p$. Moreover, the cases when $r$ and $p$ are fixed, and when $r$ and $p$ grow simultaneously with $n$, are also considered. We then establish the consistency, asymptotic normality and oracle property of the envelope-based sparse reduced-rank regression estimator developed here. Finally, through Monte Carlo simulation studies as well as two real datasets, we demonstrate that the method developed here displays good variable selection and prediction performance compared to some well-known existing methods.
Envelope-based sparse reduced-rank regression estimator and its properties
From model (3), with $\beta$ having a low-rank structure, we have the parametrizations $\beta = AB$, $\beta = \Gamma\eta$ and $\beta = \Gamma\eta_d B$, which represent the reduced-rank method, the standard envelope method and the reduced-rank envelope method, respectively. Here, $\eta_d \in \mathbb{R}^{u \times d}$, $d \le \min(u, p)$, denotes the coordinates of $A$ with respect to $\Gamma$.
If $\Gamma$ is unknown, we can obtain an estimator $\hat{\Gamma}$ by using the method described in Section 3. Then, the standard envelope estimator is obtained as $\hat{\beta}_{\mathrm{env}} = \hat{\Gamma}\hat{\Gamma}^\top \hat{\beta}_{\mathrm{OLS}}$. Using a singular value decomposition of this estimator, we then obtain its reduced-rank version.
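The projection step can be sketched as follows. For illustration, the true basis $\Gamma$ from a simulation stands in for an estimated one, since the only point being made is that the envelope estimator is the projection of the OLS estimator onto $\mathrm{span}(\hat{\Gamma})$; this projection can never increase the estimation error when $\beta$ lies in the projected subspace.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r, u = 400, 4, 6, 2               # illustrative sizes

Q, _ = np.linalg.qr(rng.normal(size=(r, r)))
Gamma = Q[:, :u]                        # true envelope basis
beta = Gamma @ rng.normal(size=(u, p))  # beta = Gamma eta lies in the envelope
X = rng.normal(size=(n, p))
Y = X @ beta.T + 0.2 * rng.normal(size=(n, r))

beta_ols = np.linalg.lstsq(X, Y, rcond=None)[0].T   # r x p OLS estimator
Gamma_hat = Gamma                       # stand-in for an estimated basis
beta_env = Gamma_hat @ Gamma_hat.T @ beta_ols       # projection onto the envelope
```

Because projection onto a subspace containing $\mathrm{span}(\beta)$ is a contraction on the error $\hat{\beta}_{\mathrm{OLS}} - \beta$, the envelope estimate is at least as close to $\beta$ as the OLS estimate.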
Estimation of the envelope
To estimate the envelope $\mathcal{E}_{\Sigma}(\mathcal{B})$, Cook et al. [8] and Su et al. [24] developed an iterative algorithm which is fast and effective. Let $G \in \mathbb{R}^{r \times u}$ be a basis matrix, with $G_1$ consisting of the first $u$ rows of $G$, and suppose $G_1$ is nonsingular. The objective function depends on $G$ only through the space formed by its column vectors: for any orthogonal matrix $O \in \mathbb{R}^{u \times u}$, replacing $G$ by $GO$ leaves the spanned space, and hence the objective, unchanged. The optimization problem for estimating $\mathcal{E}_{\Sigma}(\mathcal{B})$ is then solved iteratively over this parametrization.
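The invariance property just described can be checked numerically for a commonly used form of the envelope objective from the fast-envelope literature, $J(G) = \log|G^\top M G| + \log|G^\top (M+U)^{-1} G|$, where $M$ and $U$ below are arbitrary symmetric positive (semi)definite stand-ins for residual and fitted sample covariances; this specific form is an assumption here, not quoted from this paper's display.

```python
import numpy as np

def envelope_objective(G, M, U):
    """J(G) = log|G' M G| + log|G' (M+U)^{-1} G|  (a Cook-type envelope objective)."""
    MU_inv = np.linalg.inv(M + U)
    return (np.linalg.slogdet(G.T @ M @ G)[1]
            + np.linalg.slogdet(G.T @ MU_inv @ G)[1])

rng = np.random.default_rng(7)
r, u = 6, 2
A = rng.normal(size=(r, r))
M = A @ A.T + np.eye(r)                       # SPD stand-in for a residual covariance
B = rng.normal(size=(r, r))
U = B @ B.T                                   # PSD stand-in for a fitted covariance
G = np.linalg.qr(rng.normal(size=(r, u)))[0]  # a semi-orthogonal basis matrix

# Invariance: the objective depends on G only through span(G)
O = np.linalg.qr(rng.normal(size=(u, u)))[0]  # random u x u orthogonal matrix
assert np.isclose(envelope_objective(G, M, U), envelope_objective(G @ O, M, U))
```

The invariance follows from $\log|\det(O^\top X O)| = \log|\det X|$ for orthogonal $O$, which is why the objective is a function of the subspace rather than the basis.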
Selection of
In the above discussion, we have assumed that $u$, the dimension of the envelope, is known. In practice, however, $u$ will be unknown. There are a few ways to choose $u$, such as cross-validation (CV), the likelihood-ratio test (LRT) and information criteria such as AIC or BIC. Cook [7] has provided an elaborate discussion of all these methods. The AIC tends to select a model that contains the true model, and so it tends to overestimate $u$. The BIC tends to select the correct $u$ with probability tending to one as the sample size increases.
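The information-criterion recipe can be sketched generically. Purely for illustration, the per-dimension fit below is a rank-$u$ truncation of the OLS fit rather than the actual envelope likelihood, and the parameter count is a stand-in; the point is the shape of the BIC scan over candidate dimensions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, r, u_true = 1000, 4, 5, 2               # illustrative sizes, true dimension 2

Q, _ = np.linalg.qr(rng.normal(size=(r, r)))
Gamma = Q[:, :u_true]
beta = Gamma @ rng.normal(size=(u_true, p))
X = rng.normal(size=(n, p))
Y = X @ beta.T + 0.3 * rng.normal(size=(n, r))

def neg2_loglik(u):
    # Stand-in for the envelope likelihood: rank-u truncation of the OLS fit,
    # scored by the Gaussian profile log-likelihood of its residuals.
    B = np.linalg.lstsq(X, Y, rcond=None)[0]
    U, s, Vt = np.linalg.svd(X @ B, full_matrices=False)
    resid = Y - U[:, :u] * s[:u] @ Vt[:u]
    S = resid.T @ resid / n
    logdet = np.linalg.slogdet(S)[1]
    return n * (logdet + r * np.log(2 * np.pi) + r)

us = np.arange(r + 1)
n_params = p * us + r * (r + 1) / 2           # illustrative parameter count per u
bic = np.array([neg2_loglik(u) for u in us]) + np.log(n) * n_params
u_hat = int(us[np.argmin(bic)])
```

Underfitting ($u < 2$ here) leaves signal in the residual covariance and is heavily penalized through the log-determinant, while overfitting gains little likelihood against the growing penalty.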
Tuning
The rank $d$ can be selected by cross-validation (CV). Let $\lambda$ denote the tuning parameter of the adaptive LASSO penalty function, and let the adaptive weights be constructed from a consistent initial estimator of $\beta$. When $p \le n$, the reduced-rank envelope method can be used to obtain this initial estimator. When $p > n$, by setting the adaptive weights all equal to $1$, a reasonable initial estimator can be obtained as the solution of (10) with a single penalty parameter $\lambda$. In this paper, we use a fivefold CV procedure to select the tuning parameters.
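A hedged sketch of this tuning workflow: adaptive weights from an initial least-squares fit, a row-wise (predictor-group) lasso solved by proximal gradient descent, and a fivefold CV scan over the penalty parameter. The solver, grid, and sizes are illustrative stand-ins, not the paper's estimator (10).

```python
import numpy as np

def group_lasso(X, Y, lam, w, n_iter=500):
    """Row-wise (predictor-group) lasso via proximal gradient descent."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    L = np.linalg.norm(X, 2) ** 2 / n                 # Lipschitz constant of the gradient
    for _ in range(n_iter):
        G = X.T @ (X @ B - Y) / n                     # gradient of the squared loss
        Z = B - G / L
        norms = np.linalg.norm(Z, axis=1)
        shrink = np.maximum(1 - lam * w / (L * np.maximum(norms, 1e-12)), 0)
        B = Z * shrink[:, None]                       # group soft-thresholding of rows
    return B

rng = np.random.default_rng(5)
n, p, r = 200, 10, 4
B_true = np.zeros((p, r))
B_true[:3] = rng.normal(size=(3, r))                  # only first 3 predictors active
X = rng.normal(size=(n, p))
Y = X @ B_true + 0.5 * rng.normal(size=(n, r))

B_init = np.linalg.lstsq(X, Y, rcond=None)[0]         # consistent initial estimator
w = 1.0 / np.maximum(np.linalg.norm(B_init, axis=1), 1e-8)   # adaptive weights

folds = np.arange(n) % 5                              # fivefold CV split
lams, cv_err = np.logspace(-3, 0, 10), []
for lam in lams:
    err = 0.0
    for k in range(5):
        tr, te = folds != k, folds == k
        Bk = group_lasso(X[tr], Y[tr], lam, w)
        err += np.mean((Y[te] - X[te] @ Bk) ** 2)
    cv_err.append(err / 5)
lam_best = lams[int(np.argmin(cv_err))]
B_best = group_lasso(X, Y, lam_best, w)
```

Large weights on rows with small initial estimates push truly inactive predictor groups toward exactly zero, which is the mechanism behind the adaptive penalty.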
Simulation setups and methods
Scenario I. We generated data with $p$ and $r$ being smaller than $n$. We assumed that the elements of the first few columns of one coefficient factor were independent uniform$(0, 10)$ variables, and the elements of the remaining columns were all zeros. The predictor vectors followed a multivariate normal distribution with mean $0$ and covariance matrix $\Sigma_X$, and the other factor was obtained by standardizing a matrix of independent uniform$(0, 1)$ variables. The error covariance matrix $\Sigma$ was then generated.
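A sketch of a Scenario I-style generator. All sizes and the specific covariances below are chosen arbitrarily, since the original values are not recoverable here, and "standardizing" is interpreted as column normalization, which is an assumption.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, r, d, q = 100, 20, 10, 3, 5       # illustrative sizes; q = number of active columns

# Sparse factor B: only the first q predictor columns carry signal
B = np.zeros((d, p))
B[:, :q] = rng.uniform(0, 10, size=(d, q))

# Factor A: uniform(0, 1) entries, standardized column-wise (assumed interpretation)
A = rng.uniform(size=(r, d))
A /= np.linalg.norm(A, axis=0)

beta = A @ B                            # rank-d, column-sparse coefficient matrix
Sigma_X = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma_X, size=n)
Sigma = np.eye(r)                       # placeholder error covariance
Y = X @ beta.T + rng.multivariate_normal(np.zeros(r), Sigma, size=n)
```

The resulting coefficient matrix is simultaneously low-rank and group-sparse, which is the structure the proposed estimator targets.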
Example 1: Yeast cell cycle data
A yeast cell cycle data set was first used by Spellman et al. [23] and is available in the R package spls. The response matrix consists of 542 cell-cycle-regulated genes, with the cell cycle measured by taking RNA levels on these genes at 18 time points using the α-factor arrest method. The 542 × 106 predictor matrix contains the binding information of the target genes for a total of 106 transcription factors (TFs). This data set has been analyzed by several other authors, including Chun and Keleş [6].
CRediT authorship contribution statement
Wenxing Guo: The problem formulation, Methodology, Theoretical development, Software, Writing – original draft, Writing – review & editing. Narayanaswamy Balakrishnan: Correcting and revising manuscript, Guidance and discussion, Writing – review & editing. Mu He: Computational help, Writing – review & editing.
Acknowledgments
We express our sincere thanks to the anonymous reviewers and the Editor for their incisive comments on an earlier version of this manuscript which led to this much improved version.
References (29)
- Chin et al., Genomic and transcriptional aberrations linked to breast cancer pathophysiologies, Cancer Cell (2006)
- Cook et al., A note on fast envelope estimation, J. Multivariate Anal. (2016)
- Guo et al., Model selection and estimation in high dimensional regression models with group SCAD, Statist. Probab. Lett. (2015)
- Ledoit and Wolf, A well-conditioned estimator for large-dimensional covariance matrices, J. Multivariate Anal. (2004)
- Lian and Kim, Nonconvex penalized reduced rank regression and its oracle properties in high dimensions, J. Multivariate Anal. (2016)
- Anderson, Asymptotic distribution of the reduced rank regression estimator under general conditions, Ann. Statist. (1999)
- Chen et al., Reduced rank stochastic regression with a sparse singular value decomposition, J. R. Stat. Soc. Ser. B Stat. Methodol. (2012)
- Chen et al., Reduced rank regression via adaptive nuclear norm penalization, Biometrika (2013)
- Chen and Huang, Sparse reduced-rank regression for simultaneous dimension reduction and variable selection, J. Amer. Statist. Assoc. (2012)
- Chun and Keleş, Sparse partial least squares regression for simultaneous dimension reduction and variable selection, J. R. Stat. Soc. Ser. B Stat. Methodol. (2010)