A sequential test for variable selection in high dimensional complex data
Introduction
Consider the Big-Mac dataset (Enz, 1991), a simple dataset that gives average values in 1991 of several economic indicators for 45 world cities. It has nine continuous predictors and a continuous outcome variable: the minimum labor required to buy a Big Mac and fries, in US dollars. A regression fit to the raw data, without any transformation of the response or predictors, yields a multiple $R^2$, the square of the correlation between the observed and the fitted response, of 0.46. After a graphical exploration and appropriate transformations of the variables, we obtained a substantially larger $R^2$. The reason for this drastic improvement is that the relationships between the response and the predictors, initially nonlinear (Fig. 1), were made linear through a model fitting procedure guided by diagnostics (see Cook and Weisberg, 1982, Section 1.2). With nine predictors, this procedure is easily doable. However, when the number of predictors $p$ is large, say 50 or more, regression modeling through this iterative procedure is a daunting task, tedious and impractical. Ubiquitously, a forward linear model is assumed, and the relationship between individual predictors and the response is often left unexplored because of the high dimensionality of the predictors. Diagnostic methods are seldom used for model checking. Using an ill-fitting model to solve a variable selection problem can result in reduced performance.
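To make the point concrete, the following minimal sketch (on simulated data, not the Big-Mac dataset) shows how a single predictor transformation can move $R^2$ from modest to near one; the data-generating mechanism and the log transformation are illustrative assumptions, not the transformations used in the actual analysis.

```python
# Illustrative sketch (simulated data): a predictor nonlinearly related to
# the response understates the association in an untransformed linear fit;
# a log transformation recovers it.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 50.0, size=n)
y = 5.0 * np.log(x) + rng.normal(scale=0.5, size=n)

def r_squared(x, y):
    """Multiple R^2 of a simple least-squares fit of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

print(r_squared(x, y))          # raw predictor: modest R^2
print(r_squared(np.log(x), y))  # transformed predictor: near 1
```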
Most variable selection methods are constructed around forward linear regression models. Because ordinary least squares estimation does not yield satisfactory results when $p$ is large, it is often assumed that a large portion of the predictors is irrelevant in explaining the response $Y$. The corresponding coefficients of these predictors in a linear regression model are shrunk or even set to zero. This brings the concept of sparsity into regression modeling, with two induced consequences: parsimony of the model and accuracy in prediction. A flurry of research on algorithms and theory for variable selection involving sparsity constraints has appeared in recent years. These methods include soft thresholding (Donoho, 1995), the nonnegative garrote (Breiman, 1995), the lasso (Tibshirani, 1996), the smoothly clipped absolute deviation penalty (SCAD; Fan and Li, 2001), the elastic net (Zou and Hastie, 2005), and the Dantzig selector (Candès and Tao, 2007), among many others. These methods work exceptionally well when the model is accurate; however, they do not perform adequately when the predictors and the response have an arbitrary nonlinear relationship.
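As a point of reference, here is a minimal sketch of sparsity-inducing selection with the lasso; it assumes scikit-learn and simulated data in which only three of fifty coefficients are nonzero, and is not tied to any of the cited implementations.

```python
# Minimal lasso sketch (Tibshirani, 1996) via scikit-learn: the L1 penalty
# shrinks most coefficients exactly to zero, selecting a sparse model.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]          # only the first 3 predictors are active
y = X @ beta + rng.normal(scale=0.5, size=n)

fit = Lasso(alpha=0.1).fit(X, y)
print(np.flatnonzero(fit.coef_))      # indices of predictors kept in the model
```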
A recent methodology proposed by Cook (2007) brings significant openings to address the shortcomings of linear models in capturing information about high dimensional predictors nonlinearly related to the response. Cook (2007) proposed the concept of sufficient dimension reduction in regression and set up a new paradigm of dimension reduction through a likelihood-based approach called principal fitted components (PFC). A reduction $R: \mathbb{R}^p \to \mathbb{R}^d$, $d \leq p$, was defined to be sufficient if it satisfies one of the following three statements: (i) $Y \mid X \sim Y \mid R(X)$, (ii) $X \mid (Y, R(X)) \sim X \mid R(X)$, and (iii) $X \mathrel{\perp\!\!\!\perp} Y \mid R(X)$. The symbol $\perp\!\!\!\perp$ stands for statistical independence, and $U \sim V$ stands for $U$ and $V$ having identical distributions. Statement (i) holds in a forward regression while statement (ii) holds in an inverse regression setup. Under a joint distribution of $(X, Y)$, the three statements are equivalent.
Principal fitted components are a class of inverse regression models that yield a sufficient reduction of the predictors. Let $X_y$ denote the random vector $X \in \mathbb{R}^p$ conditioned on the response value $Y = y$, and assume that there is a vector-valued function $\nu_y \in \mathbb{R}^d$, with $d < p$, so that $X_y$ can be represented by the model $X_y = \mu + \Gamma \nu_y + \varepsilon$. The term $\Gamma \in \mathbb{R}^{p \times d}$ is a semi-orthogonal matrix ($\Gamma^T\Gamma = I_d$), and $\varepsilon \in \mathbb{R}^p$ is a mean-zero error term. The covariance $\Delta = \mathrm{var}(\varepsilon)$ is assumed to be independent of $y$. Under this model the translated conditional means $\mathrm{E}(X_y) - \mu$ fall in the $d$-dimensional subspace $\mathrm{span}(\Gamma)$, and thus $\Gamma$ captures the dependency between $X$ and $Y$. Once the response is observed, the term $\nu_y$, which is unobserved, can be approximated using a flexible set of basis functions as $\nu_y \approx \beta f_y$, where $f_y \in \mathbb{R}^r$ is a known vector-valued function of $y$ and $\beta \in \mathbb{R}^{d \times r}$ is an unconstrained matrix of rank $d$. The subsequent model $X_y = \mu + \Gamma \beta f_y + \varepsilon$ is called a PFC model, where $\varepsilon$ is assumed to be normally distributed with mean 0 and variance $\Delta$. Under this model, Cook (2007) showed that $\Gamma^T \Delta^{-1} X$ is a sufficient reduction of $X$. The choice of the basis function $f_y$ allows capturing predictors that are linearly and nonlinearly related to the response. The maximum likelihood estimators of the parameters in the model have been obtained (Cook, 2007; Cook and Forzani, 2008).
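The following minimal sketch illustrates a PFC-type fit under the simplifying assumption of an isotropic error covariance ($\Delta = \sigma^2 I_p$), for which, following Cook (2007), the estimated basis of $\mathrm{span}(\Gamma)$ is spanned by the top-$d$ eigenvectors of the sample covariance of the fitted values from regressing the centered predictors on $f_y$. The polynomial basis and all dimensions are illustrative choices; the ldr package of Adragni and Raim (2014) provides full likelihood-based implementations.

```python
# Minimal isotropic-PFC sketch: estimate span(Gamma) from the fitted values
# of the multivariate regression of centered X on the basis f_y.
import numpy as np

def pfc_isotropic(X, y, d=1, degree=3):
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Flexible basis f_y: centered polynomials in y (one possible choice).
    F = np.column_stack([y**k for k in range(1, degree + 1)])
    F = F - F.mean(axis=0)
    # Projection onto the column space of F yields the fitted values.
    P = F @ np.linalg.pinv(F.T @ F) @ F.T
    Sigma_fit = Xc.T @ P @ Xc / n
    # Top-d eigenvectors of Sigma_fit estimate a basis of span(Gamma).
    vals, vecs = np.linalg.eigh(Sigma_fit)
    Gamma_hat = vecs[:, ::-1][:, :d]
    return Gamma_hat, Xc @ Gamma_hat    # basis and reduced predictors

# Usage on simulated PFC data, X_y = Gamma * nu_y + eps with nu_y = y^2:
rng = np.random.default_rng(2)
n, p = 200, 10
y = rng.normal(size=n)
Gamma = np.zeros((p, 1)); Gamma[0, 0] = 1.0
X = y[:, None]**2 @ Gamma.T + 0.2 * rng.normal(size=(n, p))
Gamma_hat, R = pfc_isotropic(X, y, d=1)
print(np.abs(Gamma_hat[:, 0]).round(2))  # weight concentrates on predictor 1
```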
In high dimensional settings, irrelevant predictors, which often abound, can hinder the accuracy of the estimated sufficient reduction. Our goal is to obtain a "pruned" estimator of the sufficient reduction, which not only helps achieve accuracy, but also allows the identification of the relevant variables. By "pruning", we mean removing inactive predictors that do not contain any regression information about the response. The resulting estimator is often called a sparse estimator.
An estimation of the sparse reduction kernel has been proposed by Li (2007), who established a framework to obtain the sparse sufficient reduction using a regression-type formulation with the lasso (Tibshirani, 1996) and elastic net (Zou and Hastie, 2005) penalties. Chen et al. (2010) proposed coordinate-independent sparse sufficient dimension reduction, which shrinks row elements of $\Gamma$ while preserving its orthogonality constraint. Both methodologies are apt when $n > p$. We herein construct a sequential likelihood ratio test that is reminiscent of the idea of testing predictor contributions in sufficient dimension reduction of Cook (2004). It helps obtain the sparse reduction under structures of $\Delta$ that allow $p > n$. We show the performance of the procedure through simulations.
A sequential test for sparse PFC
We assume that the $p$-vector predictor $X$ can be partitioned as $X = (X_1^T, X_2^T)^T$, with $X_1 \in \mathbb{R}^{p_1}$, $X_2 \in \mathbb{R}^{p_2}$, and $p_1 + p_2 = p$, and let $(\Gamma_1, \Gamma_2)$ and $(\Delta_{ij})_{i,j=1,2}$ be the corresponding partitions of $\Gamma$ and $\Delta$ following the partition of $X$. Under model (1), the sufficient reduction can be written as $R(X) = \Gamma^T \Delta^{-1} X$. Let us suppose that $X_2$ represents the set of predictors with no regression information about $Y$, in the sense that $X_2 \mid Y$ has the same distribution as $X_2$. Consequently, we have $\Gamma_2 = 0$, so the hypothesis that the predictors in $X_2$ are inactive can be expressed as $H_0: \Gamma_2 = 0$.
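A rough illustration of one step of such a test is sketched below under the isotropic PFC model; the closed-form profile log-likelihood, the pooling of the residual variance under $H_0$, and the use of $d$ degrees of freedom are simplifying assumptions for exposition, not the exact statistic derived in this section.

```python
# Illustrative sketch of one step of a sequential likelihood ratio test
# under isotropic PFC (Delta = sigma^2 I): for a candidate predictor j,
# H0 forces its row of Gamma to zero, and twice the gap in maximized
# log-likelihoods is referred to a chi-squared law.
import numpy as np
from scipy import stats

def profile_loglik(X, F, d):
    """Maximized isotropic-PFC log-likelihood (up to an additive constant)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    P = F @ np.linalg.pinv(F.T @ F) @ F.T
    lam = np.linalg.eigvalsh(Xc.T @ P @ Xc / n)[::-1]   # eigenvalues, descending
    sigma2 = (np.trace(Xc.T @ Xc / n) - lam[:d].sum()) / p
    return -0.5 * n * p * np.log(sigma2), sigma2

def lrt_drop_one(X, F, d, j):
    """Test H0: row j of Gamma is zero (predictor j carries no information)."""
    n, p = X.shape
    ll1, _ = profile_loglik(X, F, d)
    # Under H0, predictor j contributes only noise: fit PFC on the rest
    # and pool its residual variance into the shared sigma^2.
    X1 = np.delete(X, j, axis=1)
    Xc_j = X[:, j] - X[:, j].mean()
    _, s2_part = profile_loglik(X1, F, d)
    s2_0 = ((p - 1) * s2_part + Xc_j @ Xc_j / n) / p
    ll0 = -0.5 * n * p * np.log(s2_0)
    stat = 2.0 * (ll1 - ll0)
    return stat, stats.chi2.sf(stat, df=d)   # df = d is a simplification

# Usage on simulated data where only predictor 0 is active:
rng = np.random.default_rng(3)
n, p, d = 200, 8, 1
y = rng.normal(size=n)
F = np.column_stack([y, y**2, y**3]); F = F - F.mean(axis=0)
X = np.outer(y**2, np.eye(p)[0]) + 0.3 * rng.normal(size=(n, p))
for j in (0, 5):
    print(j, lrt_drop_one(X, F, d, j))   # small p-value for j = 0 only
```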
Numerical studies
We illustrate the performance of the sequential likelihood ratio test for sparse sufficient reduction estimation and variable selection with PFC on two datasets and also through a simulation study. With the first dataset, the performance of the method is evaluated when the assumption of conditional independence is violated. The second dataset is a case where the sufficient reduction methodology leads to fitting a linear regression model and its related shrinkage methodologies for variable selection.
Discussions
We have presented a sequential likelihood ratio test to obtain a sparse estimate of the sufficient reduction of the data with PFC in a high dimensional setup where the relationship between the active predictors and the response is nonlinear. The sparse sufficient reduction also yields the active, or important, predictors relevant in explaining the response.
The sparse sufficient reduction can be readily carried into a forward model for prediction or classification. With the reduction of the predictors to a small number of linear combinations, standard forward models become straightforward to fit and check.
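As an illustration of this step, a minimal sketch follows in which the reduced predictors from the PFC sketch above are carried into an ordinary least squares forward fit; the choice of OLS is purely illustrative, and any low-dimensional regression or classification model could be used instead.

```python
# Minimal forward-model sketch: regress y on the reduced predictors
# R = Xc @ Gamma_hat produced by the PFC sketch above.
import numpy as np

def forward_fit(R, y):
    """OLS of y on the reduced predictors R (n x d), with an intercept."""
    Z = np.column_stack([np.ones(len(y)), R])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef

def forward_predict(coef, R_new):
    """Predict the response at new reduced predictors R_new."""
    return np.column_stack([np.ones(len(R_new)), R_new]) @ coef
```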
References (31)
- et al. (2013). Sparse sufficient dimension reduction using optimal scoring. Comput. Statist. Data Anal.
- et al. (2008). Discussion on the sure independence screening for ultrahigh dimensional feature space of Jianqing Fan and Jinchi Lv (2007). J. R. Stat. Soc. Ser. B.
- Adragni, K.P., Cook, R.D. (2009). Sufficient dimension reduction and prediction in regression. Phil. Trans. R. Soc. A.
- Adragni, K.P., Raim, A. (2014). ldr: Methods for likelihood-based dimension reduction in regression. R package version ...
- Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics.
- Candès, E., Tao, T. (2007). The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist.
- Chen, X., Zou, C., Cook, R.D. (2010). Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Statist.
- Chun, H., Keleş, S. (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J. R. Stat. Soc. Ser. B Stat. Methodol.
- Chung, D., Chun, H., Keles, S. (2012). spls: Sparse Partial Least Squares (SPLS) Regression and Classification. R package version ...
- Cook, R.D. (1998). Regression Graphics: Ideas for Studying Regressions Through Graphics. Wiley, New York.
- Cook, R.D. (2004). Testing predictor contributions in sufficient dimension reduction. Ann. Statist.
- Cook, R.D. (2007). Fisher lecture: Dimension reduction in regression. Statist. Sci.
- Cook, R.D., Forzani, L. (2008). Principal fitted components for dimension reduction in regression. Statist. Sci.
- Cook, R.D., Ni, L. (2005). Sufficient dimension reduction via inverse regression: a minimum discrepancy approach. J. Amer. Statist. Assoc.
- Cook, R.D., Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, New York.