Abstract
In this paper, we consider sure independence feature screening for ultrahigh dimensional discriminant analysis. We propose a new method named robust rank screening, based on the conditional expectation of the rank of a predictor's samples. We also establish the sure screening property for the proposed procedure under simple assumptions. The new procedure has several additional desirable characteristics. First, it is robust against heavy-tailed distributions, potential outliers and sample shortages in some categories. Second, it is model-free, requiring no specification of a regression model, and is directly applicable to settings with many categories. Third, its theoretical derivation is simple owing to the boundedness of the resulting statistics. Fourth, it is computationally inexpensive because of the simple structure of the screening index. Monte Carlo simulations and real data examples are used to demonstrate the finite sample performance.
References
Barrett, T., Suzek, T.O., Troup, D.B., Wilhite, S.E., Ngau, W.-C., Ledoux, P., Rudnev, D., Lash, A.E., Fujibuchi, W., Edgar, R.: NCBI GEO: mining millions of expression profiles–database and tools. Nucleic Acids Res. 33, D562–D566 (2005)
Bickel, P.J., Levina, E.: Some theory for Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 10, 989–1010 (2004)
Clemmensen, L., Hastie, T., Witten, D., Ersbøll, B.: Sparse discriminant analysis. Technometrics 53, 406–415 (2011)
Cui, H., Li, R., Zhong, W.: Model-free feature screening for ultrahigh dimensional discriminant analysis. J. Am. Stat. Assoc. (2014)
Fan, J., Fan, Y.: High dimensional classification using features annealed independence rules. Ann. Stat. 36, 2605–2637 (2008)
Fan, J., Feng, Y., Song, R.: Nonparametric independence screening in sparse ultra-high-dimensional additive models. J. Am. Stat. Assoc. 106, 544–557 (2011)
Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 70, 849–911 (2008)
Fan, J., Samworth, R., Wu, Y.: Ultrahigh dimensional feature selection: beyond the linear model. J. Mach. Learn. Res. 10, 1829–1853 (2009)
Fan, J., Song, R.: Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 38, 3567–3604 (2010)
Gordon, G.J., Jensen, R.V., Hsiao, L.-L., Gullans, S.R., Blumenstock, J.E., Ramaswamy, S., Richards, W.G., Sugarbaker, D.J., Bueno, R.: Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res. 62, 4963–4967 (2002)
He, X., Wang, L., Hong, H.G.: Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Stat. 41, 342–369 (2013)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)
Li, G., Peng, H., Zhang, J., Zhu, L.: Robust rank correlation based screening. Ann. Stat. 40, 1846–1877 (2012a)
Li, R., Zhong, W., Zhu, L.: Feature screening via distance correlation learning. J. Am. Stat. Assoc. 107, 1129–1139 (2012b)
Mai, Q., Zou, H.: The Kolmogorov filter for variable screening in high-dimensional binary classification. Biometrika 100, 229–234 (2012)
Mai, Q., Zou, H.: The fused Kolmogorov filter: a nonparametric model-free screening method. arXiv preprint arXiv:1403.7701 (2014)
Nakayama, R., Nemoto, T., Takahashi, H., Ohta, T., Kawai, A., Seki, K., Yoshida, T., Toyama, Y., Ichikawa, H., Hasegawa, T.: Gene expression analysis of soft tissue sarcomas: characterization and reclassification of malignant fibrous histiocytoma. Modern Pathol. 20, 749–759 (2007)
Pan, R., Wang, H., Li, R.: Ultrahigh dimensional multi-class linear discriminant analysis by pairwise sure independence screening. J. Am. Stat. Assoc. (2015)
Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. 99, 6567–6572 (2002)
Wang, H.: Forward regression for ultra-high dimensional variable screening. J. Am. Stat. Assoc. 104, 1512–1524 (2009)
Witten, D.M., Tibshirani, R.: Penalized classification using Fisher’s linear discriminant. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 73, 753–772 (2011)
Zhu, L.-P., Li, L., Li, R., Zhu, L.-X.: Model-free feature screening for ultrahigh-dimensional data. J. Am. Stat. Assoc. 106, 1464–1475 (2011)
Acknowledgments
The authors thank the editor and two referees for their valuable comments and suggestions. Peng Lai’s research was supported by National Natural Science Foundation of China (Grant No. 11301279). Fengli Song’s research was supported by Natural Science Foundation of Jiangsu Province for Youth (Grant No. BK20140983).
Appendix: proof of the theorems
Lemma 4.1
(Hoeffding’s inequality; Hoeffding 1963) Let \(X_1,\ldots ,X_n\) be independent random variables. Assume that \(P(X_i\in [a_i,b_i])=1\) for \(1\le i\le n\), where \(a_i\) and \(b_i\) are constants. Let \(\bar{X}=n^{-1}\sum _{i=1}^nX_i\). Then the following inequality holds:
$$P\left( \left| \bar{X}-E(\bar{X})\right| \ge t\right) \le 2\exp \left( -\frac{2n^2t^2}{\sum _{i=1}^n(b_i-a_i)^2}\right) ,$$
where \(t\) is a positive constant and \(E(\bar{X})\) is the expected value of \(\bar{X}\).
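Hoeffding's bound can be checked numerically. The sketch below is our illustration, not part of the original argument: it compares the Monte Carlo tail probability of \(\bar{X}\) for i.i.d. Uniform\([0,1]\) variables with the bound \(2\exp (-2nt^2)\) that Lemma 4.1 gives when every \([a_i,b_i]=[0,1]\).

```python
import numpy as np

def hoeffding_bound(n, t, a=0.0, b=1.0):
    # Two-sided Hoeffding bound for i.i.d. X_i in [a, b]:
    # P(|Xbar - E(Xbar)| >= t) <= 2 exp(-2 n t^2 / (b - a)^2)
    return 2.0 * np.exp(-2.0 * n * t ** 2 / (b - a) ** 2)

def empirical_tail(n, t, reps=20000, seed=0):
    # Monte Carlo estimate of P(|Xbar - 1/2| >= t) for Uniform[0,1] samples
    rng = np.random.default_rng(seed)
    xbar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    return np.mean(np.abs(xbar - 0.5) >= t)
```

For Uniform\([0,1]\) data the empirical tail is far below the bound, which is expected: Hoeffding's inequality is distribution-free over all bounded variables.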
Lemma 4.2
(Hoeffding’s inequality for U-statistics; Hoeffding 1963) Let \(h=h(x_1,\ldots ,x_m)\) be a symmetric kernel of the U-statistic \(U_n\), with \(a\le h(x_1,\ldots ,x_m)\le b\). Put \(\theta =Eh(x_1,\ldots ,x_m)\). Then, for \(t>0\) and \(m\le n\), we have
$$P\left( U_n-\theta \ge t\right) \le \exp \left( -\frac{2\lfloor n/m\rfloor t^2}{(b-a)^2}\right) .$$
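The exponential bound \(\exp (-2\lfloor n/m\rfloor t^2/(b-a)^2)\) for U-statistics can also be checked numerically. The sketch below is our illustration with an assumed kernel, not one from the paper: for Uniform\([0,1]\) data and the degree-2 kernel \(h(x_1,x_2)=I(x_1+x_2\le 1)\), we have \(h\in [0,1]\) and \(\theta =1/2\).

```python
import numpy as np
from itertools import combinations

def u_statistic(x, h):
    # U-statistic with a symmetric kernel h of degree m = 2:
    # average of h over all unordered pairs of sample points
    vals = [h(x[i], x[k]) for i, k in combinations(range(len(x)), 2)]
    return float(np.mean(vals))

def tail_vs_bound(n=30, t=0.15, reps=1000, seed=0):
    # Kernel h(x1, x2) = I(x1 + x2 <= 1) on Uniform[0,1] data, so h is
    # bounded in [0, 1] and theta = E h = 1/2.
    rng = np.random.default_rng(seed)
    h = lambda u, v: float(u + v <= 1.0)
    exceed = sum(u_statistic(rng.uniform(0.0, 1.0, n), h) - 0.5 >= t
                 for _ in range(reps))
    bound = np.exp(-2.0 * (n // 2) * t ** 2 / (1.0 - 0.0) ** 2)
    return exceed / reps, float(bound)
```

As with Lemma 4.1, the empirical one-sided tail probability sits well below the distribution-free bound.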
In order to prove Theorem 1, we first give the following inequality.
Lemma 4.3
Proof
Similarly, we have
\(\square \)
In order to apply Hoeffding’s inequality for U-statistics, we first give the following identity.
Lemma 4.4
Denote
then
Proof
The last equality follows from the fact that the conditional expectation of X given the event \(Y=y\) is \(E(X|Y=y)=\frac{E(XI(Y=y))}{P(Y=y)}\). \(\square \)
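The identity \(E(X|Y=y)=E(XI(Y=y))/P(Y=y)\) used above holds exactly under the empirical distribution of a sample, which the following sketch (our illustration, with names of our choosing) verifies numerically for a discrete \(Y\).

```python
import numpy as np

def cond_exp_identity(x, y, level):
    # For discrete Y: compare E(X | Y = y) with E(X I(Y = y)) / P(Y = y),
    # both computed under the empirical distribution of the sample
    ind = (y == level).astype(float)
    lhs = x[y == level].mean()              # within-class mean of X
    rhs = (x * ind).mean() / ind.mean()     # E(X I(Y=y)) / P(Y=y)
    return lhs, rhs
```

The two sides agree up to floating-point error on any sample in which the level occurs.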
Proof of Theorem 1
According to the definitions of \(\omega _j\) and \(\hat{\omega }_j\), we have
We first deal with the term \(E_{j1}\) by Lemma 4.3.
For the term \(I_{j1}\), under Condition (C1) we have, for any \(0<\epsilon <1/2\),
Here, in the first inequality, we have used \(\frac{E_1}{\hat{p}_{m}}\cdot \frac{2}{n+1}=\frac{\hat{E}(R(X_{j})|Y=y_m)}{(n+1)/2}\) and \(0 \le \frac{\hat{E}(R(X_{j})|Y=y_m)}{(n+1)/2}\le 2\) from the second inequality in the proof of Lemma 4.3. The last inequality is based on Hoeffding’s inequality in Lemma 4.1.
Now, for the term \(I_{j2}\), we have
Following the proof of Lemma 4.4, we have \(E(\frac{E_{11}}{p_m})=E(R(X_{j})-1|Y=y_m)\), \(E(\frac{\hat{p}_m}{p_m})=1\) and \(E(R(X_{j})|Y=y_m)=E(\frac{E_{11}}{p_m})+E(\frac{\hat{p}_m}{p_m})\). Therefore,
To study \(I_{j21}\) in \(I_{j2}\), we denote \(\varphi _j=\frac{E_{11}}{p_m}\cdot \frac{2}{n+1}\),
with
This means that \(\varphi _j\) is a U-statistic with the symmetric kernel \(h\left( X_{ij},Y_i;X_{kj},Y_k\right) \). Now we prove that \(E(h\left( X_{ij},Y_i;X_{kj},Y_k\right) )=E(\varphi _j)\). In fact, it suffices to show that
because \(h\left( X_{ij},Y_i;X_{kj},Y_k\right) \) is a symmetric kernel of the U-statistic. We have
Under Condition (C1), we have \(0\le h\left( X_{ij},Y_i;X_{kj},Y_k \right) \le \frac{1}{c_1}\), and hence \(0\le E(\varphi _j)\le \frac{1}{c_1}\). Applying Hoeffding’s inequality for U-statistics in Lemma 4.2, we obtain
Now, for \(I_{j22}\) in \(I_{j2}\), we apply Hoeffding’s inequality in Lemma 4.1:
Here, the first inequality is due to \(0\le \frac{2}{n+1}\le 1\), and the last inequality follows from Hoeffding’s inequality in Lemma 4.1.
Next, we deal with the term \(E_{j2}\) in (4.1).
Let \(f_{(i)}=\sum _{m=1}^rI(Y_i=y_m)\left( \frac{E(R(X_{j})|Y=y_m)}{E(R(X_{j}))}-1\right) ^2\), \(E_{j21}=\bar{f}_{(i)}=\frac{1}{n}\sum _{i=1}^nf_{(i)}\) and \(E_{j22}=E(\bar{f}_{(i)})\). By Lemma 4.3, we have \(0\le |f_{(i)}|\le 1\). Then we apply Hoeffding’s inequality in Lemma 4.1 to obtain that
According to \((5.2)\)–\((5.5)\), there exists a positive constant \(c_3\) such that
Therefore, there exists a positive constant \(c_4\) such that
Under Condition (C2) that \( \min \limits _{j \in {\mathcal {M}}} \omega _j \ge 2cn^{-\tau } \), if \({\mathcal {M}}\nsubseteq \hat{{\mathcal {M}}}\), there must exist some \(j\in {\mathcal {M}}\) such that \(\hat{\omega }_{j}<cn^{-\tau }\), and hence \(|\hat{\omega }_{j}-\omega _j|>cn^{-\tau }\) for that \(j\). This implies \(\{ {\mathcal {M}}\nsubseteq \hat{{\mathcal {M}}} \}\subseteq \{|\hat{\omega }_{j}-\omega _j|>cn^{-\tau }\text { for some } j\in {\mathcal {M}}\}\). Hence, denoting \(S_n=\{\max \limits _{j \in {\mathcal {M}}} |\hat{\omega }_{j}-\omega _j|\le cn^{-\tau }\}\), we have \(S_n \subseteq \{ {\mathcal {M}}\subseteq \hat{{\mathcal {M}}} \}\). Consequently,
where d is the cardinality of \({\mathcal {M}}\). This completes the proof of Theorem 1. \(\square \)
Cheng, G., Li, X., Lai, P. et al. Robust rank screening for ultrahigh dimensional discriminant analysis. Stat Comput 27, 535–545 (2017). https://doi.org/10.1007/s11222-016-9637-2