Abstract
In this paper, we consider sure independence feature screening for ultrahigh dimensional discriminant analysis. We propose a new method named robust rank screening, based on the conditional expectation of the rank of a predictor's samples. We also establish the sure screening property for the proposed procedure under simple assumptions. The new procedure has several additional desirable characteristics. First, it is robust against heavy-tailed distributions, potential outliers and sample shortages in some categories. Second, it is model-free, requiring no specification of a regression model, and is directly applicable to settings with many categories. Third, its theoretical derivation is simple owing to the boundedness of the resulting statistics. Fourth, it is computationally inexpensive because of the simple structure of the screening index. Monte Carlo simulations and real data examples are used to demonstrate the finite sample performance.
References
Barrett, T., Suzek, T.O., Troup, D.B., Wilhite, S.E., Ngau, W.-C., Ledoux, P., Rudnev, D., Lash, A.E., Fujibuchi, W., Edgar, R.: NCBI GEO: mining millions of expression profiles–database and tools. Nucleic Acids Res. 33, D562–D566 (2005)
Bickel, P.J., Levina, E.: Some theory for Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 10, 989–1010 (2004)
Clemmensen, L., Hastie, T., Witten, D., Ersbøll, B.: Sparse discriminant analysis. Technometrics 53, 406–415 (2011)
Cui, H., Li, R., Zhong, W.: Model-free feature screening for ultrahigh dimensional discriminant analysis. J. Am. Stat. Assoc. (2014)
Fan, J., Fan, Y.: High dimensional classification using features annealed independence rules. Ann. Stat. 36, 2605–2637 (2008)
Fan, J., Feng, Y., Song, R.: Nonparametric independence screening in sparse ultra-high-dimensional additive models. J. Am. Stat. Assoc. 106, 544–557 (2011)
Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 70, 849–911 (2008)
Fan, J., Samworth, R., Wu, Y.: Ultrahigh dimensional feature selection: beyond the linear model. J. Mach. Learn. Res. 10, 1829–1853 (2009)
Fan, J., Song, R.: Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 38, 3567–3604 (2010)
Gordon, G.J., Jensen, R.V., Hsiao, L.-L., Gullans, S.R., Blumenstock, J.E., Ramaswamy, S., Richards, W.G., Sugarbaker, D.J., Bueno, R.: Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res. 62, 4963–4967 (2002)
He, X., Wang, L., Hong, H.G.: Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Stat. 41, 342–369 (2013)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)
Li, G., Peng, H., Zhang, J., Zhu, L.: Robust rank correlation based screening. Ann. Stat. 40, 1846–1877 (2012a)
Li, R., Zhong, W., Zhu, L.: Feature screening via distance correlation learning. J. Am. Stat. Assoc. 107, 1129–1139 (2012b)
Mai, Q., Zou, H.: The Kolmogorov filter for variable screening in high-dimensional binary classification. Biometrika 100, 229–234 (2012)
Mai, Q., Zou, H.: The fused Kolmogorov filter: a nonparametric model-free screening method. arXiv preprint arXiv:1403.7701 (2014)
Nakayama, R., Nemoto, T., Takahashi, H., Ohta, T., Kawai, A., Seki, K., Yoshida, T., Toyama, Y., Ichikawa, H., Hasegawa, T.: Gene expression analysis of soft tissue sarcomas: characterization and reclassification of malignant fibrous histiocytoma. Modern Pathol. 20, 749–759 (2007)
Pan, R., Wang, H., Li, R.: Ultrahigh dimensional multi-class linear discriminant analysis by pairwise sure independence screening. J. Am. Stat. Assoc. (2015)
Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. 99, 6567–6572 (2002)
Wang, H.: Forward regression for ultra-high dimensional variable screening. J. Am. Stat. Assoc. 104, 1512–1524 (2009)
Witten, D.M., Tibshirani, R.: Penalized classification using Fisher’s linear discriminant. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 73, 753–772 (2011)
Zhu, L.-P., Li, L., Li, R., Zhu, L.-X.: Model-free feature screening for ultrahigh-dimensional data. J. Am. Stat. Assoc. 106, 1464–1475 (2011)
Acknowledgments
The authors thank the editor and two referees for their valuable comments and suggestions. Peng Lai’s research was supported by National Natural Science Foundation of China (Grant No. 11301279). Fengli Song’s research was supported by Natural Science Foundation of Jiangsu Province for Youth (Grant No. BK20140983).
Appendix: proof of the theorems
Lemma 4.1
(Hoeffding’s inequality; Hoeffding 1963) Let \(X_1,\ldots ,X_n\) be independent random variables. Assume that \(P(X_i\in [a_i,b_i])=1\) for \(1\le i\le n\), where \(a_i\) and \(b_i\) are constants. Let \(\bar{X}=n^{-1}\sum _{i=1}^nX_i\). Then the following inequality holds:
$$P\left( \left| \bar{X}-E(\bar{X})\right| \ge t\right) \le 2\exp \left( -\frac{2n^2t^2}{\sum _{i=1}^n(b_i-a_i)^2}\right) ,$$
where \(t\) is a positive constant and \(E(\bar{X})\) is the expected value of \(\bar{X}\).
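Hoeffding's bound can be checked numerically. The sketch below is our illustration, not part of the original argument: it compares the Monte Carlo tail probability of \(\bar{X}\) for i.i.d. Uniform\([0,1]\) variables with the bound \(2\exp (-2nt^2)\) that Lemma 4.1 gives when every \([a_i,b_i]=[0,1]\).

```python
import numpy as np

def hoeffding_bound(n, t, a=0.0, b=1.0):
    # Two-sided Hoeffding bound for i.i.d. X_i in [a, b]:
    # P(|Xbar - E(Xbar)| >= t) <= 2 exp(-2 n t^2 / (b - a)^2)
    return 2.0 * np.exp(-2.0 * n * t ** 2 / (b - a) ** 2)

def empirical_tail(n, t, reps=20000, seed=0):
    # Monte Carlo estimate of P(|Xbar - 1/2| >= t) for Uniform[0,1] samples
    rng = np.random.default_rng(seed)
    xbar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    return np.mean(np.abs(xbar - 0.5) >= t)
```

For Uniform\([0,1]\) data the empirical tail is far below the bound, which is expected: Hoeffding's inequality is distribution-free over all bounded variables.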
Lemma 4.2
(Hoeffding’s inequality for U-statistics; Hoeffding 1963) Let \(h=h(x_1,\ldots ,x_m)\) be a symmetric kernel of the U-statistic \(U_n\), with \(a\le h(x_1,\ldots ,x_m)\le b\). Put \(\theta =Eh(x_1,\ldots ,x_m)\). Then, for \(t>0\) and \(m\le n\), we have
$$P\left( U_n-\theta \ge t\right) \le \exp \left( -\frac{2\lfloor n/m\rfloor t^2}{(b-a)^2}\right) .$$
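The exponential bound \(\exp (-2\lfloor n/m\rfloor t^2/(b-a)^2)\) for U-statistics can also be checked numerically. The sketch below is our illustration with an assumed kernel, not one from the paper: for Uniform\([0,1]\) data and the degree-2 kernel \(h(x_1,x_2)=I(x_1+x_2\le 1)\), we have \(h\in [0,1]\) and \(\theta =1/2\).

```python
import numpy as np
from itertools import combinations

def u_statistic(x, h):
    # U-statistic with a symmetric kernel h of degree m = 2:
    # average of h over all unordered pairs of sample points
    vals = [h(x[i], x[k]) for i, k in combinations(range(len(x)), 2)]
    return float(np.mean(vals))

def tail_vs_bound(n=30, t=0.15, reps=1000, seed=0):
    # Kernel h(x1, x2) = I(x1 + x2 <= 1) on Uniform[0,1] data, so h is
    # bounded in [0, 1] and theta = E h = 1/2.
    rng = np.random.default_rng(seed)
    h = lambda u, v: float(u + v <= 1.0)
    exceed = sum(u_statistic(rng.uniform(0.0, 1.0, n), h) - 0.5 >= t
                 for _ in range(reps))
    bound = np.exp(-2.0 * (n // 2) * t ** 2 / (1.0 - 0.0) ** 2)
    return exceed / reps, float(bound)
```

As with Lemma 4.1, the empirical one-sided tail probability sits well below the distribution-free bound.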
In order to prove Theorem 1, we first give the following inequality.
Lemma 4.3
Proof
Similarly, we have
\(\square \)
In order to apply Hoeffding’s inequality for U-statistics, we first give the following identity.
Lemma 4.4
Denote
then
Proof
The last equality follows from the fact that the conditional expectation of X given the event \(Y=y\) is \(E(X|Y=y)=\frac{E(XI(Y=y))}{P(Y=y)}\). \(\square \)
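The identity \(E(X|Y=y)=E(XI(Y=y))/P(Y=y)\) used above holds exactly under the empirical distribution of a sample, which the following sketch (our illustration, with names of our choosing) verifies numerically for a discrete \(Y\).

```python
import numpy as np

def cond_exp_identity(x, y, level):
    # For discrete Y: compare E(X | Y = y) with E(X I(Y = y)) / P(Y = y),
    # both computed under the empirical distribution of the sample
    ind = (y == level).astype(float)
    lhs = x[y == level].mean()              # within-class mean of X
    rhs = (x * ind).mean() / ind.mean()     # E(X I(Y=y)) / P(Y=y)
    return lhs, rhs
```

The two sides agree up to floating-point error on any sample in which the level occurs.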
Proof of Theorem 1
According to the definitions of \(\omega _j\) and \(\hat{\omega }_j\), we have
We first deal with the term \(E_{j1}\) by Lemma 4.3.
For the term \(I_{j1}\), under Condition (C1) we have, for any \(0<\epsilon <1/2\),
Here, in the first inequality, we have used \(\frac{E_1}{\hat{p}_{m}}\cdot \frac{2}{n+1}=\frac{\hat{E}(R(X_{j})|Y=y_m)}{(n+1)/2}\) and \(0 \le \frac{\hat{E}(R(X_{j})|Y=y_m)}{(n+1)/2}\le 2\) from the second inequality in the proof of Lemma 4.3. The last inequality is based on Hoeffding’s inequality in Lemma 4.1.
Now, for the term \(I_{j2}\), we have
Following the proof of Lemma 4.4, we have \(E(\frac{E_{11}}{p_m})=E(R(X_{j})-1|Y=y_m)\), \(E(\frac{\hat{p}_m}{p_m})=1\) and \(E(R(X_{j})|Y=y_m)=E(\frac{E_{11}}{p_m})+E(\frac{\hat{p}_m}{p_m})\). Therefore,
To study \(I_{j21}\) in \(I_{j2}\), we denote \(\varphi _j=\frac{E_{11}}{p_m}\cdot \frac{2}{n+1}\),
with
This means that \(\varphi _j\) is a U-statistic with the symmetric kernel \(h\left( X_{ij},Y_i;X_{kj},Y_k\right) \). Now we prove that \(E(h\left( X_{ij},Y_i;X_{kj},Y_k\right) )=E(\varphi _j)\). In fact, it suffices to show that
because \(h\left( X_{ij},Y_i;X_{kj},Y_k\right) \) is a symmetric kernel of the U-statistic. We have
Under Condition (C1), we have \(0\le h\left( X_{ij},Y_i;X_{kj},Y_k \right) \le \frac{1}{c_1}\), and hence \(0\le E(\varphi _j)\le \frac{1}{c_1}\). Applying Hoeffding’s inequality for U-statistics in Lemma 4.2, we obtain
Now, for \(I_{j22}\) in \(I_{j2}\), we apply Hoeffding’s inequality in Lemma 4.1:
Here, the first inequality is due to \(0\le \frac{2}{n+1}\le 1\), and the last inequality follows from Hoeffding’s inequality in Lemma 4.1.
Next, we deal with the term \(E_{j2}\) in (4.1).
Let \(f_{(i)}=\sum _{m=1}^rI(Y_i=y_m)\left( \frac{E(R(X_{j})|Y=y_m)}{E(R(X_{j}))}-1\right) ^2\), \(E_{j21}=\bar{f}_{(i)}=\frac{1}{n}\sum _{i=1}^nf_{(i)}\) and \(E_{j22}=E(\bar{f}_{(i)})\). By Lemma 4.3, we have \(0\le |f_{(i)}|\le 1\). Then we apply Hoeffding’s inequality in Lemma 4.1 to obtain that
According to \((5.2)\)–\((5.5)\), there exists a positive constant \(c_3\) such that
Therefore, there exists a positive constant \(c_4\) such that
Under Condition (C2) that \( \min \limits _{j \in {\mathcal {M}}} \omega _j \ge 2cn^{-\tau } \), if \({\mathcal {M}}\nsubseteq \hat{{\mathcal {M}}}\), there must exist some \(j\in {\mathcal {M}}\) such that \(\hat{\omega }_{j}<cn^{-\tau }\), and hence \(|\hat{\omega }_{j}-\omega _j|>cn^{-\tau }\) for that \(j\). This implies \(\{ {\mathcal {M}}\nsubseteq \hat{{\mathcal {M}}} \}\subseteq \{|\hat{\omega }_{j}-\omega _j|>cn^{-\tau }\text { for some } j\in {\mathcal {M}}\}\). Hence, denoting \(S_n=\{\max \limits _{j \in {\mathcal {M}}} |\hat{\omega }_{j}-\omega _j|\le cn^{-\tau }\}\), we have \(S_n \subseteq \{ {\mathcal {M}}\subseteq \hat{{\mathcal {M}}} \}\). Consequently,
where d is the cardinality of \({\mathcal {M}}\). This completes the proof of Theorem 1. \(\square \)
Cheng, G., Li, X., Lai, P. et al. Robust rank screening for ultrahigh dimensional discriminant analysis. Stat Comput 27, 535–545 (2017). https://doi.org/10.1007/s11222-016-9637-2