Adaptive conditional feature screening

https://doi.org/10.1016/j.csda.2015.09.002

Abstract

When the correlation among the predictors is relatively strong and/or the model structure cannot be specified, constructing an adaptive feature screening procedure remains a challenging issue. A general technique for conditional feature screening is proposed by combining a model-free feature screening with a predetermined set of predictors. The proposed centralization technique removes the irrelevant part from the criterion of the model-free feature screening. Consequently, the new criterion measures the marginal utility of each predictor conditional on the predetermined set of predictors. The conditional information about these predetermined predictors helps reduce the correlation among covariates, and as a result the resulting method lowers both the false positive and the false negative rates in the variable selection procedure. Thus, our method is adaptive to both the correlation among the covariates and model misspecification. The new procedures are computationally simple and efficient, and can be extended to other related methods.

Introduction

In some contemporary applications, such as biomedical imaging, functional magnetic resonance imaging, tomography, tumor classification and finance, researchers are frequently confronted with high-dimensional variables and models whose structure cannot be completely specified. In such situations, the number p of variables or parameters in the model can be much larger than the sample size n, and only little information about the actual model structure is known in advance. When the correlation among the covariates is relatively strong and the model structure cannot be correctly specified, it is difficult to establish dimension reduction methodologies that are adaptive to both the correlation among covariates and model misspecification. In this paper, we address this issue.

When the dimension of the predictor vector is much larger than the sample size, ranking and screening have proved useful for dimension reduction in situations where the model is correctly specified and the true structure is relatively simple, such as a linear or generalized linear structure. Such approaches are known as feature screening or marginal utility screening. Fan and Lv (2008) first introduced sure independence screening (SIS) and iterated sure independence screening (ISIS) in the context of linear regression models; Fan et al. (2009) and Fan and Song (2010) extended SIS and ISIS to generalized linear models; Fan et al. (2011) developed nonparametric independence screening (NIS) for nonparametric models with additive structure. For more related methodologies see Xue and Zou (2011), Zhu et al. (2011), Li et al. (2012), Wang (2012), Zhao and Li (2012), Lin et al. (2013) and Chang et al. (2013), among others.
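To make the marginal-utility idea concrete, here is a minimal Python sketch of SIS-style screening by absolute marginal correlation. The function name and the toy data are our own illustration, not from the paper.

```python
import numpy as np

def sis_screen(X, y, d):
    """Rank predictors by absolute marginal correlation with the
    response and keep the top d (the SIS idea of Fan and Lv, 2008)."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
    yc = (y - y.mean()) / y.std()
    omega = np.abs(Xc.T @ yc) / len(y)          # |sample correlation| per predictor
    return np.argsort(omega)[::-1][:d]          # indices of the top-d predictors

# Toy example: p = 500 predictors, n = 100, only X_0 and X_1 are active.
rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.standard_normal(n)
selected = sis_screen(X, y, d=10)
```

With an independent design and strong signals, both active predictors land in the retained set; the discussion below concerns exactly when this marginal ranking breaks down.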

All the aforementioned feature screening methods rest on a common condition: the true model structure is specified accurately. Their performance therefore depends critically on the supposed model being equal to, or at least close to, the underlying one; when the supposed structure is far from the truth, they may behave poorly. To develop feature screening that is robust against model misspecification, Zhu et al. (2011) proposed a sure independent ranking and screening (SIRS). Their proposal applies to a wide range of commonly used parametric and semiparametric models and can thus be regarded as model-free. Lin et al. (2013) proposed a nonparametric ranking feature screening (NRS) based on local information flows of the predictors, which captures the function-correlation between response and predictors without any model structure assumption. Li et al. (2012) proposed a distance correlation-based sure independence screening (DC-SIS), which is also a model-free approach. Recently, He et al. (2013) introduced a quantile-adaptive model-free variable screening for high-dimensional heterogeneous data; this approach allows the set of active variables to vary across quantiles and thus makes variable selection flexible enough to accommodate heterogeneity.

Moreover, as noted in the existing literature, for example Fan and Lv (2008), Zhu et al. (2011) and Barut et al. (2012), the correlation among predictors heavily influences the marginal utility. When the correlation among the predictors is relatively high, simple feature screening may produce false positives (selected predictors that are actually inactive) and false negatives (truly active predictors that are regarded as inactive and removed from the model). Most existing feature screening methods therefore impose conditions restricting the correlation among predictors. However, Hall and Li (1993) and Fan and Lv (2008) proved that with growing dimensionality p, spurious correlations among predictors always arise; the correlation among predictors is thus an unavoidable problem in statistical inference for all high-dimensional models. To adapt to circumstances in which predictors may be relatively highly correlated, Cho and Fryzlewicz (2012) proposed, for linear models, a new criterion to measure the contribution of each predictor to the response. Their method accounts for the correlations among predictors by projecting correlated predictors onto orthogonal spaces and thereby eliminating the correlations between the transformed variables. However, the projection method is difficult to apply, or cannot be extended, to other models such as nonlinear and nonparametric models.
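The spurious-correlation phenomenon is easy to reproduce numerically. The following sketch is our own illustration, not from the paper: it shows that the largest absolute sample correlation among mutually independent predictors grows with the dimension p.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # small sample, as in the high-dimensional regime

def max_spurious_corr(p):
    """Largest |sample correlation| among p mutually independent
    N(0,1) predictors; the population correlations are all zero."""
    X = rng.standard_normal((n, p))
    R = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(R, 0.0)        # ignore the trivial self-correlations
    return np.abs(R).max()

small, large = max_spurious_corr(10), max_spurious_corr(2000)
```

Even though every pair of predictors is independent, with p = 2000 some pair typically exhibits a large sample correlation, which is why correlation restrictions on the design cannot be taken for granted.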

In many applications, researchers know from previous investigations and experience that certain predictors are responsible for the response. As Barut et al. (2012) observed, conditioning on such known active predictors can help reduce the correlation among the predictors. This is particularly the case when predictors share common factors, as in many biological studies (e.g., treatment effects) and financial studies (e.g., market risk factors). It can therefore be expected that conditioning helps improve the measure of marginal utility. However, the conditional sure independence screening of Barut et al. (2012) depends strongly on a model structure assumption, the generalized linear model, and requires estimating the corresponding parameters. It is difficult to extend that method to models with complex or unspecified structure.
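As a numerical caricature of why conditioning helps (our own construction, not the procedure of Barut et al. (2012)): when an active predictor's marginal correlation with the response is cancelled by a strongly correlated companion predictor, marginal screening misses it, while conditioning on the known companion recovers it.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 2000
X = rng.standard_normal((n, p))
# Make X_1 strongly correlated with X_0 and choose coefficients so that
# the population correlation between X_0 and Y is exactly zero.
X[:, 1] = 0.9 * X[:, 0] + np.sqrt(1 - 0.9 ** 2) * rng.standard_normal(n)
y = X[:, 0] - (1 / 0.9) * X[:, 1] + 0.3 * rng.standard_normal(n)

# Marginal screening: X_0 looks like pure noise.
marginal = np.abs([np.corrcoef(X[:, k], y)[0, 1] for k in range(p)])
rank_of_X0 = int(np.argsort(-marginal).tolist().index(0))

# Conditional screening: residualize X_0 and y on the known predictor X_1,
# then measure the correlation of the residuals (a partial correlation).
b_y = np.dot(X[:, 1], y) / np.dot(X[:, 1], X[:, 1])
b_0 = np.dot(X[:, 1], X[:, 0]) / np.dot(X[:, 1], X[:, 1])
partial = abs(np.corrcoef(X[:, 0] - b_0 * X[:, 1], y - b_y * X[:, 1])[0, 1])
```

In this construction `partial` is large (around 0.8) while X_0's marginal rank is buried among the noise predictors, the false-negative pattern described above.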

As stated above, strong correlation among the predictors seriously damages the quality of existing feature screening methods. Such correlation, however, can easily be predetermined: for example, the marginal correlation between any two predictors can be efficiently estimated by the sample correlation coefficient. It is thus an interesting issue to use this predetermined correlation to reduce the correlation among the predictors and thereby enhance the adaptability of feature screening methods.

In this paper, a general technique is proposed for reducing the correlation among the predictors and formulating a conditional feature ranking. The key idea is to centralize the criterion of an existing model-free screening, so that the irrelevant term, related only to the predetermined set of predictors, is removed. Consequently, the correlation between the centralized variable and the preselected variables is reduced significantly or eliminated completely, and the new criterion measures the marginal utility of a predictor conditional on the known set of predictors. As stated above, the conditional information about the predetermined predictors helps reduce the correlation among predictors, which implies that the new method can lower the false positive and the false negative rates in the variable selection process. This, together with the model-free property, ensures that our method is adaptive to both the correlation among the predictors and model misspecification, especially when the number of predetermined predictors is large. It is proved that, with the number of predictors growing at an exponential rate of the sample size, the proposed procedure possesses ranking consistency, which is useful in its own right and also leads to selection consistency. Moreover, unlike the conditional feature screening of Barut et al. (2012), the new criteria require no estimation of model parameters; the new procedures are computationally simple and efficient, and can be extended to other related methods.
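The centralization idea can be caricatured by a simple linear residualization: remove from the response and from each candidate predictor the part explained by the predetermined set C, then rank candidates by the absolute correlation of the residuals. This is only a schematic stand-in for the paper's actual criterion, and every name and the toy data below are ours.

```python
import numpy as np

def conditional_screen(X, y, cond_idx, d):
    """Schematic conditional screening: project y and each candidate
    predictor off the conditioning columns, then rank candidates by
    the absolute correlation of the residuals and keep the top d."""
    Xc = X[:, cond_idx]
    coef_y, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    ry = y - Xc @ coef_y                        # response residual
    cand = [k for k in range(X.shape[1]) if k not in set(cond_idx)]
    def utility(k):
        coef_k, *_ = np.linalg.lstsq(Xc, X[:, k], rcond=None)
        rk = X[:, k] - Xc @ coef_k              # predictor residual
        return abs(np.corrcoef(rk, ry)[0, 1])
    return sorted(cand, key=utility, reverse=True)[:d]

# Toy example: a common factor correlates all predictors; X_0 is known
# active, X_1 is the weaker active predictor to be recovered.
rng = np.random.default_rng(2)
n, p = 200, 300
f = rng.standard_normal(n)
X = 0.8 * f[:, None] + rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(n)
picked = conditional_screen(X, y, cond_idx=[0], d=10)
```

Because the conditioning step strips out the part of each candidate tied to the preselected predictors, the residual ranking behaves like a marginal utility computed conditionally on C, which is the effect the centralization technique formalizes.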

The remainder of the paper is organized as follows. In Section 2, the SIRS of Zhu et al. (2011) is first reviewed to motivate the methodological development. The SIRS is then centralized so that the irrelevant term is removed from the original criterion, which naturally defines the new conditional model-free feature screening; consistent estimators for the new criterion are also proposed. In Section 3, the theoretical properties of our method, including correlation reduction and ranking consistency, are investigated. Simulation studies, together with a two-stage procedure, are presented in Section 4, and the technical proofs are deferred to the Appendix.

Section snippets

Problems and motivations

Let x = (X_1, …, X_p)^τ be a p-dimensional vector of predictors and Y be the response variable. Denote by 𝒳_k and 𝒴 the supports of X_k and Y, respectively. Here the dimension p is large and may be much larger than the sample size n. Denote by A the index set of the active predictors, namely, A = {k : F(y|x) functionally depends on X_k for some y ∈ 𝒴}, where F(y|x) is the distribution function of Y conditional on x. If k ∈ A, X_k and F(y|x) are indeed functionally correlated for some y ∈ 𝒴. Denote by Ā the

Theoretical properties

We now investigate the theoretical properties of the marginal correlation function Ω_k(y, C, β_C) in (2.4), which is a foundation for our method.

Theorem 3.1

The marginal correlation function Ω_k(y, C, β_C) in (2.4) has the following properties:

  • (1)

    If X_k (k ∈ D) and Y are independent conditional on β_C^τ x_C for any β_C ∈ 𝒞, then Ω_k(y, C, β_C) = 0 uniformly for y ∈ 𝒴 and β_C ∈ 𝒞.

  • (2)

    Particularly, under model (2.1), suppose X_k (k ∈ D) and X_j (j ∈ D ∪ A, j ≠ k) are independent. If X_k (k ∈ D) and Y are functionally uncorrelated conditional on β_C^τ x_C for

Simulation studies

In this section we present several simulation examples, together with a two-stage procedure, to compare the finite-sample performance of the newly proposed CSIRS, in both Case 1 and Case 2, with existing competitors such as the unconditional SIRS (Zhu et al., 2011) and the CSIS (Barut et al., 2012). To obtain comprehensive comparisons, we investigate these feature screening methods in a variety of settings with p = 2000 predictors and sample size n = 200. Throughout, the number of

Acknowledgments

The authors thank the Associate Editor and the referees for their very constructive and thoughtful comments and suggestions. Lu Lin was supported by NNSF projects (11171188, 11571204 and 11231005) of China. Jing Sun was supported by NNSF project (11426126) of China, and NSF project (ZR2014AP007) of Shandong Province, China.

References (19)

  • L. Lin et al., Nonparametric feature screening, Comput. Statist. Data Anal. (2013)
  • E. Barut, J. Fan, A. Verhasselt, Conditional sure independence screening, Manuscript (2012)
  • J. Chang et al., Marginal empirical likelihood and sure independence feature screening, Ann. Statist. (2013)
  • H. Cho et al., High dimensional variable selection via tilting, J. R. Stat. Soc. Ser. B Stat. Methodol. (2012)
  • J. Fan et al., Nonparametric independence screening in sparse ultra-high-dimensional additive models, J. Amer. Statist. Assoc. (2011)
  • J. Fan et al., Sure independence screening for ultrahigh dimensional feature space (with discussion), J. R. Stat. Soc. Ser. B Stat. Methodol. (2008)
  • J. Fan et al., Ultrahigh dimensional feature selection: beyond the linear model, J. Mach. Learn. Res. (2009)
  • J. Fan et al., Sure independence screening in generalized linear models with NP-dimensionality, Ann. Statist. (2010)
  • P. Hall et al., On almost linearity of low dimensional projection from high dimensional data, Ann. Statist. (1993)
