Abstract
Feature screening is a popular and efficient statistical technique for processing ultrahigh-dimensional data. When a regression model contains both categorical and continuous predictors, a unified feature screening procedure is needed. We therefore propose a unified mean-variance sure independence screening (UMV-SIS) procedure for this setting. The mean-variance (MV), an effective utility for measuring the dependence between two random variables, is widely used in feature screening for discriminant analysis. In this paper, we advocate using the kernel smoothing method to estimate the MV between two continuous variables, thereby extending it to screen categorical and continuous predictors simultaneously. Besides its unified treatment of both types of predictors, UMV-SIS is model-free: it requires no specification of a regression model, which broadens the scope of its application. In theory, we show that the UMV-SIS procedure enjoys the sure screening and ranking consistency properties under mild conditions. To overcome some well-known difficulties of marginal feature screening for the linear model and to further enhance the screening performance of the proposed method, an iterative UMV-SIS procedure is developed. The promising performance of the new method is supported by extensive numerical examples.

References
Cui H, Li R, Zhong W (2015) Model-free feature screening for ultrahigh dimensional discriminant analysis. J Am Statist Assoc 110:630–641
Cui H, Zhong W (2018) A distribution-free test of independence and its application to variable selection. arXiv:1801.10559
Fan J, Fan Y (2008) High dimensional classification using features annealed independence rules. Ann Statist 36:2605–2637
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Statist Assoc 96:1348–1360
Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J R Statist Soc Ser B (Statist Methodol) 70:849–911
Fan J, Samworth R, Wu Y (2009) Ultrahigh dimensional feature selection: beyond the linear model. J Mach Learn Res 10:1829–1853
He X, Wang L, Hong HG et al (2013) Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann Statist 41:342–369
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Statist Assoc 58:13–30
Hosmer D, Lemeshow S (1989) Applied Logistic Regression. John Wiley, New York
Kong E, Xia Y, Zhong W (2019) Composite coefficient of determination and its application in ultrahigh dimensional variable screening. J Am Statist Assoc 114:1740–1751
Li G, Peng H, Zhang J, Zhu L (2012) Robust rank correlation based screening. Ann Statist 40:1846–1877
Li Q, Racine JS (2007) Nonparametric econometrics: theory and practice. Princeton University Press, Princeton
Li R, Zhong W, Zhu L (2012) Feature screening via distance correlation learning. J Am Statist Assoc 107:1129–1139
Li X, Cheng G, Wang L, Lai P, Song F (2017) Ultrahigh dimensional feature screening via projection. Comput Statist Data Anal 114:88–104
Li X, Li R, Xia Z, Xu C (2020) Distributed feature screening via componentwise debiasing. J Mach Learn Res 21:1–32
Liu J, Li R, Wu R (2014) Feature selection for varying coefficient models with ultrahigh-dimensional covariates. J Am Statist Assoc 109:266–274
Mai Q, Zou H (2012) The Kolmogorov filter for variable screening in high-dimensional binary classification. Biometrika 100:229–234
Mai Q, Zou H (2015) The fused Kolmogorov filter: a nonparametric model-free screening method. Ann Statist 43:1471–1497
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Statist Soc Ser B (Methodol) 58:267–288
Wang H (2009) Forward regression for ultra-high dimensional variable screening. J Am Statist Assoc 104:1512–1524
Xu C, Chen J (2014) The sparse MLE for ultrahigh-dimensional feature screening. J Am Statist Assoc 109:1257–1269
Yan X, Tang N, Xie J, Ding X, Wang Z (2018) Fused mean-variance filter for feature screening. Comput Statist Data Anal 122:18–32
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Statist Soc Ser B (Statist Methodol) 68:49–67
Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Statist 38:894–942
Zhao SD, Li Y (2012) Principled sure independence screening for Cox models with ultra-high-dimensional covariates. J Multivar Anal 105:397–411
Zhou Y, Zhu L (2018) Model-free feature screening for ultrahigh dimensional data through a modified Blum-Kiefer-Rosenblatt correlation. Statistica Sinica 28:1351–1370
Zhu L-P, Li L, Li R, Zhu L-X (2011) Model-free feature screening for ultrahigh-dimensional data. J Am Statist Assoc 106:1464–1475
Zou H (2006) The adaptive lasso and its oracle properties. J Am Statist Assoc 101:1418–1429
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Statist Soc Ser B (Statist Methodol) 67:301–320
Acknowledgements
This work is supported by the Philosophy and Social Science Research Fund of Jiangsu Province for Universities (Grant No. 2019SJA2093). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agency.
Appendix: Proof of the theorems
Leave-one-out least-squares cross-validation (LSCV). Given m candidate values \(\alpha _1,\ldots ,\alpha _m\), LSCV is used to determine the optimal \(\alpha \) for each \(X_j\) in Example 1 and 3.4. The objective function of LSCV is
where
is the leave-one-out kernel estimator of \(E(I(Y_i\le Y_m)|X_j=X_{ij})\), and \(W(\cdot )\) is a nonnegative weight function that excludes unreliable estimates \(\hat{E}_{-i}(I(Y_i\le Y_m)|X_j=X_{ij})\) caused by the so-called boundary effect or the random-denominator issue. With the observations \(X_{1j},\ldots ,X_{nj}\) ordered as \(X_{(1)j},\ldots ,X_{(n)j}\), we can simply set \(W(X_{ij})=I(X_{ij} \in \{X_{(r+1)j},\ldots ,X_{(n-r)j}\})\) for a given r. Given \(\{\alpha _1,\ldots ,\alpha _m\}\), we can determine the optimal \(\alpha \) for \(X_j\) by
Then, the screening index \(\hat{\omega }_j\) is computed with the selected \(\alpha ^{opt}_j\).
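To make this step concrete, the following is a minimal Python sketch, assuming a Gaussian kernel, treating the candidates \(\alpha _1,\ldots ,\alpha _m\) directly as bandwidths, and assuming that the LSCV objective is the \(W\)-weighted squared distance between \(I(Y_i\le Y_m)\) and its leave-one-out estimate; the function names (lscv_bandwidth, umv_index) and the plug-in form of \(\hat{\omega }_j\) are illustrative rather than the authors' implementation.

```python
import numpy as np

def gaussian_kernel(u):
    # A concrete kernel choice; the paper only requires K to satisfy Condition C4.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def lscv_bandwidth(xj, y, alphas, r=2):
    """Select a bandwidth for predictor X_j by leave-one-out LSCV (a sketch)."""
    n = len(xj)
    ind = (y[:, None] <= y[None, :]).astype(float)      # ind[l, m] = I(Y_l <= Y_m)
    order = np.argsort(xj)
    w = np.zeros(n)
    w[order[r:n - r]] = 1.0                              # W(X_ij): trim r boundary points per side

    best_alpha, best_cv = alphas[0], np.inf
    for alpha in alphas:
        K = gaussian_kernel((xj[None, :] - xj[:, None]) / alpha)
        np.fill_diagonal(K, 0.0)                         # leave observation i out
        denom = np.maximum(K.sum(axis=1, keepdims=True), 1e-12)
        est = (K @ ind) / denom                          # est[i, m] ~ E_{-i}(I(Y <= Y_m) | X_j = X_ij)
        cv = np.sum(w[:, None] * (ind - est) ** 2)       # assumed form of the LSCV criterion
        if cv < best_cv:
            best_alpha, best_cv = alpha, cv
    return best_alpha

def umv_index(xj, y, alpha):
    # Hypothetical plug-in estimate of omega_j, following the decomposition used in the
    # proof of Theorem 1: average of (F_hat(Y_i | X_j = X_mj) - F_hat(Y_i))^2 over (m, i).
    ind = (y[:, None] <= y[None, :]).astype(float)       # ind[l, i] = I(Y_l <= Y_i)
    K = gaussian_kernel((xj[None, :] - xj[:, None]) / alpha)
    cond_cdf = (K @ ind) / np.maximum(K.sum(axis=1, keepdims=True), 1e-12)
    marg_cdf = ind.mean(axis=0)                          # empirical F_hat(Y_i)
    return np.mean((cond_cdf - marg_cdf[None, :]) ** 2)
```

For each continuous predictor \(X_j\), one would call lscv_bandwidth over the candidate grid and then evaluate umv_index with the selected bandwidth to obtain \(\hat{\omega }_j\).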
Lemma 1
(Hoeffding’s inequality; Hoeffding 1963) Let \(X_1,\ldots ,X_n\) be independent random variables. Assume that \(P(X_i\in [a_i,b_i])=1\) for \(1\le i\le n\), where \(a_i\) and \(b_i\) are constants. Let \(\bar{X}=n^{-1}\sum _{i=1}^nX_i\). Then the following inequality holds:
\[ P\left( |\bar{X}-E(\bar{X})|\ge t\right) \le 2\exp \left\{ -\frac{2n^2t^2}{\sum _{i=1}^n(b_i-a_i)^2}\right\} , \]
where t is a positive constant and \(E(\bar{X})\) is the expected value of \(\bar{X}\).
Lemma 2
(Liu et al. 2014) For any random variable X, the following two statements are equivalent:
(A) There exists \(H>0\) such that \(Ee^{tX}<\infty \) for all \(|t|<H\).
(B) There exist \(\eta >0\) and \(\nu >0\) such that \(Ee^{\nu (X-EX)}\le e^{\eta \nu ^2}\).
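Only the implication (A) \(\Rightarrow \) (B), with \(\nu \) taken small, is invoked below. As a brief sketch (assuming \(\mathrm {Var}(X)>0\); the degenerate case is immediate), a second-order expansion of the moment generating function at 0 gives
\[ Ee^{\nu (X-EX)}=1+\tfrac{1}{2}\nu ^2\,\mathrm {Var}(X)+o(\nu ^2)\le 1+\nu ^2\,\mathrm {Var}(X)\le e^{\nu ^2\mathrm {Var}(X)} \]
for all sufficiently small \(\nu \), so that (B) holds with \(\eta =\mathrm {Var}(X)\).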
Lemma 3
(Liu et al. 2014) Suppose that a(x) and b(x) are two uniformly bounded functions; that is, there exist \(M_1>0\) and \(M_2>0\) such that \(\sup _{x\in \mathbb {X}}|a(x)|\le M_1\) and \(\sup _{x\in \mathbb {X}}|b(x)|\le M_2\).
For a given \(x\in \mathbb {X}\), let \(\hat{A}(x)\) and \(\hat{B}(x)\) be estimators of a(x) and b(x) based on a sample of size n. For any small \(\varepsilon \in (0,1)\), suppose that there exist positive constants \(c_1\), \(c_2\) and \(\nu \) such that
Moreover, suppose that b(x) is uniformly bounded away from 0; that is, there exists \(M_3>0\) such that \(\inf _{x\in \mathbb {X}}|b(x)|>M_3\). Then there exists a constant \(C'>0\) such that
Proof of Theorem 1
We first prove the exponential consistency of \(\hat{\omega }_{j}\) in (2.4), corresponding to a continuous predictor. According to the definition of \(\omega _{j}\) in (2.2), we have
The term \(H_{j,1}\) can be further decomposed as
The terms \(H_{j,2}\) and \(I_{j,2}\) can be handled by Lemma 1. Let \(G(X_{mj})=\int _{\mathbb {Y}}(F(y|X_j=X_{mj})-F(y))^2dF(y)\) and \(Q(Y_i,X_{mj})=(F(Y_i|X_j=X_{mj})-F(Y_i))^2\). Obviously, both \(G(X_{mj})\) and \(Q(Y_i,X_{mj})\) are bounded between 0 and 1. It follows that
Similarly,
We then deal with \(I_{j,1}\).
Next, \(D_{j,1}\) is analysed by using techniques similar to those in Liu et al. (2014). Specifically,
where \(m_j(x,y)=E\left( I(Y\le y)|X_j=x\right) \) and \(f_j(x)\) is the density function of \(X_j\). We define
We now work on \(P_{j,1}(x,y)\). For any \(\xi >0\), by Markov’s inequality,
Furthermore,
Set \(\xi =n\nu \) and define \(\psi (\nu )= E\{\exp \left( \nu I(Y_m\le y)K(\frac{X_{mj}-x}{h})\right) \}\). Then (6.8) can be expressed as
Let us now consider the factor \(\exp \{-\nu hf_j(x)m_j(x,y)\}\cdot \psi (\nu )\) in (6.9). It can be further decomposed as
where
When x is close to 0, by Taylor’s expansion, \(\exp (x)\) can be bounded by
Under Conditions C3-C5, we choose such a small \(\nu \) that (6.11) can be applied to bound \(L_{j,1}(x,y)\) as
Denote \(\delta _h(x,y)= E\left\{ I(Y_m\le y)K(\frac{X_{mj}-x}{h})\right\} -hf_j(x)m_j(x,y)\). Recalling that \(m_j(X_{j},y)=E\left( I(Y\le y)|X_{j}\right) \), we have \(\delta _h(x,y)=E\left\{ m_j(X_{j},y)K(\frac{X_{j}-x}{h})\right\} -hf_j(x)m_j(x,y)\). Since \(\int K(t)\, dt=1\), it follows that
By using Condition C4, \(\int tK(t)dt=0\) and \(\int t^2K(t) dt<\infty \). Therefore,
where \(m'_{j}(x,y)=\frac{\partial m_j(x,y)}{\partial x}\) and \(m''_{j}(x,y)=\frac{\partial ^2 m_j(x,y)}{\partial x^2}\). By the dominated convergence theorem, together with \(m''_j(x,y)f_j(x)+2m'_j(x,y)f'_j(x)+m_j(x,y)f''_j(x)\) being uniformly bounded under Conditions C3 and C5, \(h^{-3}\delta _h(x,y)\) is uniformly bounded by some constant C for all \(x\in \mathbb {X}_j\) and \(y\in \mathbb {Y}\). This implies that
Since \(h\rightarrow 0\) as \(n\rightarrow \infty \), it follows that \(\sup _{x\in \mathbb {X}_j, y\in \mathbb {Y}}L_{j,1}(x,y)\le 1+\varepsilon \nu /16\), for large enough n. Then we focus on \(L_{j,2}(x,y)\) in (6.10). According to Lemma 2, \(L_{j,2}(x,y)\) is uniformly bounded by \(\exp (\eta \nu ^2)\) for some constant \(\eta >0\). Using Taylor’s expansion, \(\exp (\eta \nu ^2)\le 1+2\eta \nu ^2<1+\varepsilon \nu /16\), as long as \(\nu \) is close to 0 and satisfies \(0<\nu <\varepsilon /(32\eta )\). Thus, for sufficiently small \(\nu >0\) and large n, (6.10) satisfies
By Taylor’s expansion, (6.9) can be bounded by
Similarly,
It follows that
By setting \(m_j(x,y)=1\), we have \(\sup _{x\in \mathbb {X}_j} P_{j,2}(x)\le (1-\varepsilon \nu /4)^n\) and
Furthermore, by Lemma 3, there exists some \(c_3 > 0\) such that
Thus, (6.7) can be further bounded by
To bound \(D_{j,2}\), we use Hoeffding’s inequality in Lemma 1 again:
Finally, by combining Eqs. (6.4), (6.5), (6.6), (6.12) and (6.13), we conclude that there exists a constant \(c_4\) such that
By a similar argument, for the estimates \(\hat{\omega }_j\) in (2.3) corresponding to categorical predictors, there exists a positive constant \(c_5\) such that
For the detailed proof, we refer the reader to Lemma A.4 in Cui et al. (2015).
The convergence properties of both the continuous and the categorical estimators have now been established in (6.14) and (6.15). Letting \(\varepsilon =cn^{-\tau }\) and recalling the assumption \(h=O(n^{-\theta })\) with \(\tau /3<\theta <1/3\), we have \(h^3=o(\varepsilon )\), so that \(L_{j,1}(x,y)\) can be easily bounded. Therefore, there exists a positive constant \(C_1\) such that
The theorem is proved. \(\square \)
Proof of Theorem 2
If \(\mathcal {M}\nsubseteq \widetilde{\mathcal {M}}\), there must exist some \(j\in \mathcal {M}\) such that \(\widetilde{\omega }_j<cn^{-\tau }\). Moreover, Condition C6 assumes that \( \min \limits _{j \in \mathcal {M}} \omega _j \ge 2cn^{-\tau }\). Thus, \(\mathcal {M}\nsubseteq \widetilde{\mathcal {M}}\) implies \(|\widetilde{\omega }_j-\omega _j|>cn^{-\tau }\) for some \(j\in \mathcal {M}\). Therefore, by (6.16), we have
where d is the cardinality of \(\mathcal {M}\). The sure screening property is proved.
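The display above follows from a union bound over the active set together with the exponential bound in (6.16); schematically, writing the right-hand side of (6.16) as \(\Delta _n\),
\[ P(\mathcal {M}\nsubseteq \widetilde{\mathcal {M}})\le \sum _{j\in \mathcal {M}}P\left( |\widetilde{\omega }_j-\omega _j|>cn^{-\tau }\right) \le d\max _{j\in \mathcal {M}}P\left( |\widetilde{\omega }_j-\omega _j|>cn^{-\tau }\right) \le d\,\Delta _n , \]
so that \(P(\mathcal {M}\subseteq \widetilde{\mathcal {M}})\ge 1-d\,\Delta _n\).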
By the assumption in Theorem 2 that \(\kappa =\min \limits _{j\in \mathcal {M}}\omega _j-\max \limits _{j\in \mathcal {M}^c}\omega _j>0\), there exists a positive constant \(C_2\) such that
The last inequality follows directly from Eq. (6.16), and it tends to 0 as \(n\rightarrow \infty \) since \(\log (p)=O(n^a)\) for some \(a<1\). The ranking consistency property is therefore proved. \(\square \)