Interaction Identification and Clique Screening for Classification with Ultra-high Dimensional Discrete Features

Journal of Classification

Abstract

Interactions have greatly influenced recent scientific discoveries, but the identification of interactions is challenging in ultra-high dimensions. In this study, we propose an interaction identification method for classification with ultra-high dimensional discrete features. We utilize clique sets to capture interactions among features, where features in a common clique have interactions that can be used for classification. The number of features related to the interaction is the size of the clique. Hence, our method can consider interactions caused by more than two feature variables. We propose a Kullback-Leibler divergence-based approach to correctly identify the clique sets with a probability that tends to 1 as the sample size tends to infinity. A clique screening method is then proposed to filter out clique sets that are useless for classification, and the strong sure screening property can be guaranteed. Finally, a clique naïve Bayes classifier is proposed for classification. Numerical studies demonstrate that our proposed approach performs very well.

Fig. 1, Fig. 2 [figures omitted]


Funding

The research of Baiguo An is partially supported by the National Natural Science Foundation of China (No. 12071308, No. 11601349) and a scientific research planned project of the National Bureau of Statistics of China (No. 2017LZ15). The research of Guozhong Feng is supported by the National Natural Science Foundation of China (No. 11501095). The research of Jianhua Guo is supported by the National Key Research and Development Program of China (No. 2020YFA0714102) and the National Natural Science Foundation of China (No. 11631003, No. 11690012).

Author information


Corresponding author

Correspondence to Jianhua Guo.


Appendices

Appendix A: Proof of Theorem 1

We prove the claim of Theorem 1 by the following five steps.

Step 1. For j, k = 1, … , p and y ∈ {0, 1}, define \(\pi _{y}^{jk}=\left (\pi _{00y}^{jk},\pi _{01y}^{jk},\pi _{10y}^{jk},\pi _{11y}^{jk}\right )^{\top }\), and \(\hat {\pi }_{y}^{jk}=\left (\hat {\pi }_{00y}^{jk},\hat {\pi }_{01y}^{jk},\hat {\pi }_{10y}^{jk},\hat {\pi }_{11y}^{jk}\right )^{\top }\). Let \(\alpha =\min \limits \{\alpha _{0},\alpha _{1}\}\). In this step, we will prove that \(P\left (\|{\widehat {\pi }}_{y}^{jk}-\pi _{y}^{jk}\|_{1}\geq t\right )\leq 8\exp \left \{-\alpha nt^{2}/8\right \}\) for arbitrary t > 0.

One can see that \(P\left(\|\widehat{\pi}_{y}^{jk}-\pi_{y}^{jk}\|_{1}\geq t\right)=P\left(|\widehat{\pi}_{00y}^{jk}-\pi_{00y}^{jk}|+|\widehat{\pi}_{01y}^{jk}-\pi_{01y}^{jk}|+|\widehat{\pi}_{10y}^{jk}-\pi_{10y}^{jk}|+|\widehat{\pi}_{11y}^{jk}-\pi_{11y}^{jk}|\geq t\right)\leq{\sum}_{l,s\in\{0, 1\}}P\left(|\widehat{\pi}_{lsy}^{jk}-\pi_{lsy}^{jk}|>t/4\right)\). By Hoeffding's inequality, we have that \(P\left(|\widehat{\pi}_{lsy}^{jk}-\pi_{lsy}^{jk}|>t/4\right)\leq 2\exp\left\{-n_{y}t^{2}/8\right\}\leq 2\exp\left\{-\alpha nt^{2}/8\right\}\). Hence, \(P\left(\|\widehat{\pi}_{y}^{jk}-\pi_{y}^{jk}\|_{1}\geq t\right)\leq 8\exp\left\{-\alpha nt^{2}/8\right\}\).
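As an illustrative aside (not part of the proof), the Step 1 bound can be compared against a Monte Carlo estimate of the left-hand side; the cell probabilities, sample sizes, and threshold below are hypothetical values of our own choosing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true cell probabilities pi_{lsy}^{jk} of a 2x2 table within class y.
pi = np.array([0.4, 0.2, 0.3, 0.1])
n_y, alpha, t = 500, 0.5, 0.3      # class-y sample size, min class proportion, threshold
n = int(n_y / alpha)               # total sample size, so that n_y >= alpha * n

# Monte Carlo estimate of P(||pi_hat - pi||_1 >= t).
reps = 20_000
counts = rng.multinomial(n_y, pi, size=reps)        # cell counts over `reps` replications
l1_err = np.abs(counts / n_y - pi).sum(axis=1)      # ||pi_hat_y^{jk} - pi_y^{jk}||_1
emp_prob = (l1_err >= t).mean()

bound = 8 * np.exp(-alpha * n * t ** 2 / 8)         # the Step 1 bound
print(emp_prob, bound)   # the empirical probability should not exceed the bound
```

As expected from the union-of-Hoeffding argument, the bound is conservative: the simulated exceedance probability sits well below it.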

Step 2. Let \(r_{n}=\left(|\hat{\pi}_{0}-\pi_{0}|+|\hat{\pi}_{1}-\pi_{1}|\right)\log\left(1/\left(4{\pi_{L}^{2}}\right)\right)\). Define \({\mathcal{E}}=\left\{|\hat{\pi}_{lsy}^{jk}-\pi_{lsy}^{jk}|<\pi_{L}/2 \text{ for all } j,k=1,\ldots,p, \text{ and } l,s,y\in\{0, 1\}\right\}\), and \({\mathcal{E}}_{r} = \{r_{n}\leq \nu_{n}/2\}\). In this step, we will prove that \(P({\mathcal{E}})\rightarrow 1\) and \(P({\mathcal{E}}_{r})\rightarrow 1\) as \(n\rightarrow\infty\).

Specifically, \(P(\mathcal {E})\geq 1-\sum \nolimits _{y\in \{0, 1\}}\sum \nolimits _{j\neq k}\sum \nolimits _{l,s\in \{0, 1\}}P\left (|\hat {\pi }_{lsy}^{jk}-\pi _{lsy}^{jk}|\geq \pi _{L}/2\right )\geq 1- \sum \nolimits _{y\in \{0, 1\}}\sum \nolimits _{j\neq k}\sum \nolimits _{l,s\in \{0, 1\}}2\exp \left \{-{\alpha \pi _{L}^{2}}n/2\right \}\geq 1-16p^{2}\exp \left \{-{\alpha \pi _{L}^{2}}n/2\right \}= 1-16\exp \left \{-{\alpha \pi _{L}^{2}}n/2+2Cn^{\xi }\right \} \rightarrow 1\).

By Assumption A4, we know that \(n^{\kappa}\nu_{n}=O(1)\) with \(\kappa\in(0,(1-\xi)/2)\). On the other hand, one can see that \(n^{1/2}r_{n}=O_{p}(1)\). Since \(\kappa<1/2\), we have \(r_{n}=O_{p}(n^{-1/2})=o_{p}(\nu_{n})\), and hence \(P({\mathcal{E}}_{r})\rightarrow 1\). This completes the proof of Step 2.

Step 3. In this step, we will show that on the events \(\mathcal {E}\) and \(\mathcal {E}_{r}\), \(|{\widehat {\text {kl}}}(j,k)-{\text {kl}}(j,k)|\leq M\left (\|\hat {\pi }_{0}^{jk}-\pi _{0}^{jk}\|_{1}+\|\hat {\pi }_{1}^{jk}-\pi _{1}^{jk}\|_{1}\right )+\nu _{n}/2\) holds with some positive constant M for all j, k.

Recall that \(\widehat {\text {kl}}(j,k)={\sum }_{y}\hat {\pi }_{y}\widehat {\text {kl}}(j,k;y)\), where

$${\widehat{\text{kl}}}(j,k;y)=\sum\limits_{l,s}\hat{\pi}_{lsy}^{jk}\log\frac{\hat{\pi}_{lsy}^{jk}}{\hat{\pi}_{ly}^{j}\hat{\pi}_{sy}^{k}}=\sum\limits_{l,s}\hat{\pi}_{lsy}^{jk}\log\frac{\hat{\pi}_{lsy}^{jk}}{\left( \hat{\pi}^{jk}_{l0y}+\hat{\pi}^{jk}_{l1y}\right)\left( \hat{\pi}^{jk}_{0sy}+\hat{\pi}^{jk}_{1sy}\right)}.$$
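For concreteness, the empirical quantity just displayed can be sketched in a few lines (an illustration of our own, not the authors' code; the helper name `kl_hat_y` is hypothetical).

```python
import numpy as np

def kl_hat_y(pi_hat):
    """Empirical KL term for one class y: divergence between the 2x2 joint
    pi_hat[l, s] of (X_j, X_k) given Y=y and the product of its marginals."""
    pi_hat = np.asarray(pi_hat, dtype=float)
    row = pi_hat.sum(axis=1, keepdims=True)   # pi_hat_{ly}^{j} = pi_{l0y} + pi_{l1y}
    col = pi_hat.sum(axis=0, keepdims=True)   # pi_hat_{sy}^{k} = pi_{0sy} + pi_{1sy}
    return float((pi_hat * np.log(pi_hat / (row * col))).sum())

# If the joint factorizes, the statistic is (numerically) zero ...
indep = np.outer([0.6, 0.4], [0.7, 0.3])
print(kl_hat_y(indep))
# ... while dependence between X_j and X_k makes it strictly positive.
dep = np.array([[0.45, 0.05], [0.05, 0.45]])
print(kl_hat_y(dep))
```

The full statistic \(\widehat{\text{kl}}(j,k)\) then weights these class-wise terms by \(\hat{\pi}_{y}\).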

Then, we have that

$$\widehat{\text{kl}}(j,k;y)-\text{kl}(j,k;y)=\left( \frac{\partial \text{kl}(j,k;y)}{\partial \pi_{y}^{jk}}|_{\pi_{y}^{jk*}}\right)^{\top}\left( \hat{\pi}_{y}^{jk}-\pi_{y}^{jk}\right),$$

where

$$\frac{\partial \text{kl}(j,k;y)}{\partial \pi_{y}^{jk}}=\left( \frac{\partial \text{kl}(j,k;y)}{\partial \pi_{00y}^{jk}},\frac{\partial {\text{kl}}(j,k;y)}{\partial \pi_{01y}^{jk}},\frac{\partial {\text{kl}}(j,k;y)}{\partial \pi_{10y}^{jk}},\frac{\partial {\text{kl}}(j,k;y)}{\partial \pi_{11y}^{jk}}\right)^{\top},$$

and \(\pi _{y}^{jk*}=\left (\pi _{00y}^{jk*},\pi _{01y}^{jk*},\pi _{10y}^{jk*},\pi _{11y}^{jk*}\right )^{\top }=\pi _{y}^{jk}+\zeta \left (\hat {\pi }_{y}^{jk}-\pi _{y}^{jk}\right )\) with some ζ ∈ (0, 1). One can see that for every l, s ∈ {0, 1},

$$\frac{\partial \text{kl}(j,k;y)}{\partial \pi_{lsy}^{jk}}=\log\pi_{lsy}^{jk}-\log\left( \pi_{l0y}^{jk}+\pi_{l1y}^{jk}\right)-\log\left( \pi_{0sy}^{jk}+\pi_{1sy}^{jk}\right)-1.$$

On event \({\mathcal{E}}\), we have that for every \(l,s,y\in\{0, 1\}\), \(\pi_{lsy}^{jk*}=(1-\zeta)\pi_{lsy}^{jk}+\zeta\hat{\pi}_{lsy}^{jk}\geq(1-\zeta)\pi_{L}+\zeta\pi_{L}/2>\pi_{L}/2\). Consequently,

$$ \begin{array}{@{}rcl@{}} & & \left.\left|\frac{\partial \text{kl}(j,k;y)}{\partial \pi_{lsy}^{jk}}\right|_{\pi_{y}^{jk*}}\right|\\ &\leq&\left|\log\pi_{lsy}^{jk*}\right|+\left|\log\left( \pi_{l0y}^{jk*}+\pi_{l1y}^{jk*}\right)\right|+\left|\log\left( \pi_{0sy}^{jk*}+\pi_{1sy}^{jk*}\right)\right|+1\\ &\leq& 1+2\log2+\left|\log\pi_{lsy}^{jk*}\right|+\left|\log\left( \left( \pi_{l0y}^{jk*}+\pi_{l1y}^{jk*}\right)/2\right)\right|+\left|\log\left( \left( \pi_{0sy}^{jk*}+\pi_{1sy}^{jk*}\right)/2\right)\right|\\ &\leq& 1+2\log2-3\log(\pi_{L}/2). \end{array} $$

Denote \(1+2\log 2-3\log (\pi _{L}/2)\) by M/2, then we have that

$$|\widehat{\text{kl}}(j,k;y)-\text{kl}(j,k;y)|=\left|\left( \frac{\partial \text{kl}(j,k;y)}{\partial \pi_{y}^{jk}}|_{\pi_{y}^{jk*}}\right)^{\top}\left( \hat{\pi}_{y}^{jk}-\pi_{y}^{jk}\right)\right|\leq M\|\hat{\pi}_{y}^{jk}-\pi_{y}^{jk}\|_{1}.$$

Moreover,

$${\text{kl}}(j,k;y)=\sum\limits_{l\in\{0, 1\}}\sum\limits_{s\in\{0, 1\}}\pi_{lsy}^{jk}\log\frac{\pi_{lsy}^{jk}}{\pi_{ly}^{j}\pi_{sy}^{k}}\leq \sum\limits_{l\in\{0, 1\}}\sum\limits_{s\in\{0, 1\}}\pi_{lsy}^{jk}\log\frac{1}{4{\pi_{L}^{2}}}=\log\frac{1}{4{\pi_{L}^{2}}},$$

which means that kl(j, k; y) is uniformly bounded for all j, k and y ∈ {0, 1}. Consequently,

$$|\widehat{\text{kl}}(j,k)-\text{kl}(j,k)| \leq \hat{\pi}_{0}|\widehat{\text{kl}}(j,k;0)-\text{kl}(j,k;0)|+\hat{\pi}_{1}|\widehat{\text{kl}}(j,k;1)-\text{kl}(j,k;1)|+|\hat{\pi}_{0}-\pi_{0}|\text{kl}(j,k;0)+|\hat{\pi}_{1}-\pi_{1}|\text{kl}(j,k;1)\leq M\left(\|\hat{\pi}_{0}^{jk}-\pi_{0}^{jk}\|_{1}+\|\hat{\pi}_{1}^{jk}-\pi_{1}^{jk}\|_{1}\right)+r_{n},$$

where \(r_{n}=\left(|\hat{\pi}_{0}-\pi_{0}|+|\hat{\pi}_{1}-\pi_{1}|\right)\log\frac{1}{4{\pi_{L}^{2}}}\).

Moreover, on the event \(\mathcal {E}_{r}\), rnνn/2. Consequently, on the events \(\mathcal {E}\) and \(\mathcal {E}_{r}\), we have that \(|\widehat {\text {kl}}(j,k)-\text {kl}(j,k)|\leq M\left (\|\hat {\pi }_{0}^{jk}-\pi _{0}^{jk}\|_{1}+\|\hat {\pi }_{1}^{jk}-\pi _{1}^{jk}\|_{1}\right )+\nu _{n}/2\).

Step 4. In this step, we will prove that \({\widehat{\mathcal{G}}}_{\nu_{n}}\subseteq{\mathcal{G}}\) holds with probability tending to 1. Specifically,

$$ \begin{array}{@{}rcl@{}} & & P\left( \widehat{\mathcal{G}}_{\nu_{n}}\subseteq\mathcal{G}\right) \\ &=& 1 - P\left( \sum\limits_{(j,k)}I(\widehat{\text{kl}}(j,k)>\nu_{n},\text{kl}(j,k)=0)>0\right)\\ &\geq& 1-P\left( \bigcup\limits_{(j,k)}\left\{|\widehat{\text{kl}}(j,k)-\text{kl}(j,k)|>\nu_{n}\right\}\right)\\ &\geq& 1-P\left( \bigcup\limits_{(j,k)}\left\{|\widehat{\text{kl}}(j,k)-\text{kl}(j,k)|>\nu_{n},\right.\right.\\ &&\left.\left.|\widehat{\text{kl}}(j,k)-\text{kl}(j,k)|\leq M\left( \|\hat{\pi}_{0}^{jk}-\pi_{0}^{jk}\|_{1}+\|\hat{\pi}_{1}^{jk}-\pi_{1}^{jk}\|_{1}\right)+\nu_{n}/2\right\}\right)\\ &&- P\left( \bigcup\limits_{(j,k)}\left\{|\widehat{\text{kl}}(j,k)-\text{kl}(j,k)|> M\left( \|\hat{\pi}_{0}^{jk}-\pi_{0}^{jk}\|_{1}+\|\hat{\pi}_{1}^{jk}-\pi_{1}^{jk}\|_{1}\right)+\nu_{n}/2\right\}\right)\\ &\geq& 1- P\left( \bigcup\limits_{(j,k)}\left\{\|\hat{\pi}_{0}^{jk}-\pi_{0}^{jk}\|_{1}+\|\hat{\pi}_{1}^{jk}-\pi_{1}^{jk}\|_{1}\geq (2M)^{-1}\nu_{n}\right\}\right)\\ &&-P\left( {\mathcal{E}}^{c}\right)-P\left( {{\mathcal{E}}_{r}^{c}}\right)\\ &\geq& 1- \sum\limits_{(j,k)}P\left( \|\hat{\pi}_{0}^{jk}-\pi_{0}^{jk}\|_{1}+\|\hat{\pi}_{1}^{jk}-\pi_{1}^{jk}\|_{1}\geq (2M)^{-1}\nu_{n}\right)-P\left( {\mathcal{E}}^{c}\right)-P\left( {{\mathcal{E}}_{r}^{c}}\right)\\ &\geq& 1-\sum\limits_{(j,k)}P\left( \|\hat{\pi}_{0}^{jk}-\pi_{0}^{jk}\|_{1}\geq (4M)^{-1}\nu_{n}\right)\\ & & -\sum\limits_{(j,k)}P\left( \|\hat{\pi}_{1}^{jk}-\pi_{1}^{jk}\|_{1}\geq (4M)^{-1}\nu_{n}\right)-P\left( \mathcal{E}^{c}\right)-P\left( {\mathcal{E}_{r}^{c}}\right)\\ &\geq& 1-16 p^{2}\exp\left\{-\frac{\alpha n{\nu_{n}^{2}}}{128M^{2}}\right\} -P\left( \mathcal{E}^{c}\right)-P\left( {\mathcal{E}_{r}^{c}}\right)\\ &=& 1-16\exp\left\{2Cn^{\xi}-\frac{\alpha n{\nu_{n}^{2}}}{128M^{2}}\right\}-P\left( \mathcal{E}^{c}\right)-P\left( {\mathcal{E}_{r}^{c}}\right)\rightarrow 1. \end{array} $$

Step 5. In this step, we will prove that \(P\left (\widehat {\mathcal {G}}_{\nu _{n}}\supseteq \mathcal {G}\right )\rightarrow 1\). Specifically,

$$ \begin{array}{@{}rcl@{}} & & P\left( \widehat{\mathcal{G}}_{\nu_{n}}\supseteq\mathcal{G}\right) \\ &=& P\left( \bigcap\limits_{(j,k)\in\mathcal{G}}\left\{\widehat{\text{kl}}_{\nu_{n}}(j,k)>0\right\}\right)\\ & = & 1-P\left( \bigcup\limits_{(j,k)\in\mathcal{G}}\left\{\widehat{\text{kl}}(j,k)\leq \nu_{n}\right\}\right)\\ &\geq& 1-P\left( \bigcup\limits_{(j,k)}\left\{|\widehat{\text{kl}}(j,k)-\text{kl}(j,k)|\geq \tau_{n}-\nu_{n}\right\}\right)\\ &\geq& 1-\sum\limits_{j,k}P\left( \|\hat{\pi}_{0}^{jk}-\pi_{0}^{jk}\|_{1} \geq (2M)^{-1}(\tau_{n}-3\nu_{n}/2)\right)\\ & & -\sum\limits_{j,k}P\left( \|\hat{\pi}_{1}^{jk}-\pi_{1}^{jk}\|_{1} \geq (2M)^{-1}(\tau_{n}-3\nu_{n}/2)\right)-P\left( {\mathcal{E}}^{c}\right)-P\left( {{\mathcal{E}}_{r}^{c}}\right)\\ &\geq& 1 - 16p^{2}\exp\left\{-\alpha n\frac{(\tau_{n}-3/2\nu_{n})^{2}}{32M^{2}}\right\}-P\left( \mathcal{E}^{c}\right)-P\left( {\mathcal{E}_{r}^{c}}\right)\\ &\geq& 1-16\exp\left\{2Cn^{\xi}-\alpha n\frac{(\tau_{n}-3/2\nu_{n})^{2}}{32M^{2}}\right\}-P\left( {\mathcal{E}}^{c}\right)-P\left( {{\mathcal{E}}_{r}^{c}}\right)\rightarrow 1. \end{array} $$

Combining the above results, one can see that \(P\left (\widehat {\mathcal {G}}_{\nu _{n}}=\mathcal {G}\right )\rightarrow 1\). This completes the whole proof of Theorem 1.
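The consistency established in Theorem 1 can be illustrated on simulated binary data (a toy setup of our own, not the paper's numerical study): a pair of features that interact given the class receives a clearly positive \(\widehat{\text{kl}}\) value, while independent pairs concentrate near zero.

```python
import numpy as np

rng = np.random.default_rng(2)

def kl_hat(x, y_lab, j, k, eps=1e-12):
    """Empirical kl(j,k): sum over classes y of pi_hat_y times the KL divergence
    between the empirical joint of (X_j, X_k) given Y=y and the product of its
    empirical marginals, for binary features."""
    total = 0.0
    for y in (0, 1):
        xy = x[y_lab == y]
        tab = np.array([[np.mean((xy[:, j] == l) & (xy[:, k] == s))
                         for s in (0, 1)] for l in (0, 1)])
        tab = np.clip(tab, eps, None)                  # guard against empty cells
        row = tab.sum(axis=1, keepdims=True)
        col = tab.sum(axis=0, keepdims=True)
        total += (len(xy) / len(x)) * (tab * np.log(tab / (row * col))).sum()
    return total

# Toy data: features 0 and 1 interact given the class; features 2..5 are pure noise.
n, p = 4000, 6
y_lab = rng.integers(0, 2, n)
x = rng.integers(0, 2, (n, p))
x[:, 1] = np.where(rng.random(n) < 0.8, x[:, 0] ^ y_lab, 1 - (x[:, 0] ^ y_lab))

kl_01 = kl_hat(x, y_lab, 0, 1)     # interacting pair: clearly positive
kl_23 = kl_hat(x, y_lab, 2, 3)     # independent pair: near zero
print(kl_01, kl_23)
```

Thresholding these values at a level \(\nu_{n}\) between the two groups recovers the interacting pair, which is exactly the event \(\widehat{\mathcal{G}}_{\nu_{n}}=\mathcal{G}\) analyzed above.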

Appendix B: Proof of Theorem 2

We prove Theorem 2 by the following two steps.

Step 1. In this step, we will prove that \( P\left (\mathcal {C}_{T}\subset \widehat {\mathcal {C}}(\gamma _{n})\right ) \rightarrow 1\).

One can see that \(P\left(\mathcal{C}_{T}\subset\widehat{\mathcal{C}}(\gamma_{n})\right)=P\left(\bigcap_{m\in{\mathcal{C}}_{T}}\left\{K_{m}^{-1/2}\|\left(\text{diag}\left(\widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left(\widehat{\Pi}_{0}^{m}-\widehat{\Pi}_{1}^{m}\right)\|_{2}\geq\gamma_{n}\right\}\right)\geq 1-P\left(\bigcup_{m\in{\mathcal{C}}_{T}}\left\{K_{m}^{-1/2}\|\left(\text{diag}\left(\widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left(\widehat{\Pi}_{0}^{m}-\widehat{\Pi}_{1}^{m}\right)\|_{2}\leq\gamma_{n}\right\}\right)\). Under Assumption A7 and \(\gamma_{n}=\frac{2}{3}C_{2}n^{-\vartheta}\), we have

$$ \begin{array}{@{}rcl@{}} &&P\left( K_{m}^{-1/2}\|\left( \text{diag}\left( \widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{0}^{m}}-{\widehat{\Pi}_{1}^{m}}\right)\|_{2}\leq\gamma_{n}\right)\\ &\leq&P\left( K_{m}^{-1}\|\left( \text{diag}\left( \widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{0}^{m}}-{\widehat{\Pi}_{1}^{m}}\right)\|_{1}\leq\gamma_{n}\right)\\ &\leq& P\left( K_{m}^{-1}\left|\|\left( \text{diag}\left( {\widehat{\Sigma}}^{(m)}\right)\right)^{-1/2}\left( {{\widehat{\Pi}}_{0}^{m}}-{{\widehat{\Pi}}_{1}^{m}}\right)\|_{1}\right.\right.\\ &&\left.\left.-\|\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\left( {{\Pi}_{0}^{m}}-{{\Pi}_{1}^{m}}\right)\|_{1}\right|\geq 1/3C_{2}n^{-\vartheta}\right)\\ &\leq& P\left( \left\| \left( \text{diag}\left( \widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{0}^{m}}-{\widehat{\Pi}_{1}^{m}}\right)\right.\right. \end{array} $$
$$ \begin{array}{@{}rcl@{}} &&\left.\left.-\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\left( {{\Pi}_{0}^{m}}-{{\Pi}_{1}^{m}}\right)\right\|_{1}\geq 1/3K_{m}C_{2}n^{-\vartheta}\right)\\ &\leq& P\left( \left\|\left( \left( \text{diag}\left( {\widehat{\Sigma}}^{(m)}\right)\right)^{-1/2}-\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\right)\left( {{\widehat{\Pi}}_{0}^{m}}-{{\widehat{\Pi}}_{1}^{m}}\right)\right\|_{1}\geq 1/6K_{m}C_{2}n^{-\vartheta}\right)\\ &&+P\left( \left\|\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{0}^{m}}-{\widehat{\Pi}_{1}^{m}}-\left( {{\Pi}_{0}^{m}}-{{\Pi}_{1}^{m}}\right)\right)\right\|_{1}\geq 1/6K_{m}C_{2}n^{-\vartheta}\right)\\ &\leq& P\left( \left\|\left( \left( \text{diag}\left( {\widehat{\Sigma}}^{(m)}\right)\right)^{-1/2}-\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\right)\left( {{\widehat{\Pi}}_{0}^{m}}-{{\widehat{\Pi}}_{1}^{m}}\right)\right\|_{1}\geq 1/6K_{m}C_{2}n^{-\vartheta}\right)\\ &&+P\left( \left\|\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{0}^{m}}-{{\Pi}_{0}^{m}}\right)\right\|_{1}\geq 1/12K_{m}C_{2}n^{-\vartheta}\right)\\ &&+P\left( \left\|\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{1}^{m}}-{{\Pi}_{1}^{m}}\right)\right\|_{1}\geq 1/12K_{m}C_{2}n^{-\vartheta}\right). \end{array} $$

We first focus on \(P\left(\left\|\left(\left(\text{diag}\left(\widehat{\Sigma}^{(m)}\right)\right)^{-1/2}-\left(\text{diag}\left(\Sigma^{(m)}\right)\right)^{-1/2}\right)\left(\widehat{\Pi}_{0}^{m}-\widehat{\Pi}_{1}^{m}\right)\right\|_{1}\geq \frac{1}{6}K_{m}C_{2}n^{-\vartheta}\right)\). Define \(\hat{W}^{(m)}=\left(\hat{w}_{1}^{(m)},\ldots,\hat{w}_{K_{m}}^{(m)}\right)^{\top}\) and \(W^{(m)}=\left(w_{1}^{(m)},\ldots,w_{K_{m}}^{(m)}\right)^{\top}\), where, for \(k=1,\ldots,K_{m}\), \(\hat{w}_{k}^{(m)}=\left(\alpha_{0}^{-1}\widehat{\Pi}_{k0}^{m}\left(1-\widehat{\Pi}_{k0}^{m}\right)+\alpha_{1}^{-1}\widehat{\Pi}_{k1}^{m}\left(1-\widehat{\Pi}_{k1}^{m}\right)\right)^{-1/2}\) and \(w_{k}^{(m)}=\left(\alpha_{0}^{-1}\Pi_{k0}^{m}\left(1-\Pi_{k0}^{m}\right)+\alpha_{1}^{-1}\Pi_{k1}^{m}\left(1-\Pi_{k1}^{m}\right)\right)^{-1/2}\).

Then, one can see that

$$ \begin{array}{@{}rcl@{}} && \left\|\left( \left( \text{diag}\left( {\widehat{\Sigma}}^{(m)}\right)\right)^{-1/2}-\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\right)\cdot\left( {{\widehat{\Pi}}_{0}^{m}}-{{\widehat{\Pi}}_{1}^{m}}\right)\right\|_{1}\\ &\leq&2\|\hat{W}^{(m)}-W^{(m)}\|_{1}\\ &=& 2\sum\limits_{k=1}^{K_{m}} |\hat{w}_{k}^{(m)}-w_{k}^{(m)}|\\ &=& 2\sum\limits_{k=1}^{K_{m}}|\left( \frac{\partial w_{k}^{(m)}}{\partial{\Pi}_{k0}^{m}},\frac{\partial w_{k}^{(m)}}{\partial{\Pi}_{k1}^{m}}\right)|_{({\Pi}_{k0\tau}^{m},{\Pi}_{k1\tau}^{m})}\left( {\widehat{\Pi}}_{k0}^{m}-{\Pi}_{k0}^{m},{\widehat{\Pi}}_{k1}^{m}-{\Pi}_{k1}^{m}\right)^{\top}|\\ &\leq& 2\sum\limits_{k=1}^{K_{m}}\sum\limits_{l=0}^{1} \left|\frac{\partial w_{k}^{(m)}}{\partial{\Pi}_{kl}^{m}}|_{({\Pi}_{k0\tau}^{m},{\Pi}_{k1\tau}^{m})}\right|\cdot|{\widehat{\Pi}}_{kl}^{m}-{\Pi}_{kl}^{m}|, \end{array} $$

where \({\Pi}_{ky\tau}^{m}=\tau\widehat{\Pi}_{ky}^{m}+(1-\tau){\Pi}_{ky}^{m}\) for some \(\tau\in(0, 1)\) and \(y\in\{0, 1\}\). One can verify that \(\frac{\partial w_{k}^{(m)}}{\partial{\Pi}_{kl}^{m}}=-\frac{1}{2}\left[\alpha_{0}^{-1}{\Pi}_{k0}^{m}\left(1-{\Pi}_{k0}^{m}\right)+\alpha_{1}^{-1}{\Pi}_{k1}^{m}\left(1-{\Pi}_{k1}^{m}\right)\right]^{-3/2}\alpha_{l}^{-1}\left(1-2{\Pi}_{kl}^{m}\right)\).
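This closed-form derivative (with \(\alpha_{1}^{-1}\) inside the bracket, matching the definition of \(w_{k}^{(m)}\)) is easy to sanity-check by central finite differences; the class proportions and evaluation point below are hypothetical.

```python
alpha0, alpha1 = 0.4, 0.6     # hypothetical class proportions, alpha0 + alpha1 = 1

def w(p0, p1):
    """w_k^{(m)} as defined above, for one coordinate k of clique m."""
    return (p0 * (1 - p0) / alpha0 + p1 * (1 - p1) / alpha1) ** -0.5

def dw(p0, p1, l):
    """Closed-form partial derivative of w_k^{(m)} with respect to Pi_{kl}^m."""
    f = p0 * (1 - p0) / alpha0 + p1 * (1 - p1) / alpha1
    p_l, a_l = ((p0, alpha0), (p1, alpha1))[l]
    return -0.5 * f ** -1.5 * (1 - 2 * p_l) / a_l

# Central finite differences agree with the closed form at a generic point.
p0, p1, h = 0.3, 0.55, 1e-6
num0 = (w(p0 + h, p1) - w(p0 - h, p1)) / (2 * h)
num1 = (w(p0, p1 + h) - w(p0, p1 - h)) / (2 * h)
print(num0 - dw(p0, p1, 0), num1 - dw(p0, p1, 1))   # both differences near zero
```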

Define \(\mathcal {E}=\left \{|\hat {\Pi }_{ky}^{m}-{\Pi }_{ky}^{m}|<{\Pi }_{L}/2 \text {~for~all~} k, m,\text {~and~} y\in \{0, 1\}\right \}\). Then, by the similar proof of Step 2 in the proof for Theorem 1, one can verify that \(P({\mathcal {E}})\rightarrow 1\). On the event \({\mathcal {E}}\), it is easy to show that \({\Pi }_{L}/2\leq {\Pi }_{ky\tau }^{m}\leq (1-{\Pi }_{L}/2)\) for k = 1, … , Km, y = 0, 1 and all m. Consequently, on the event \({\mathcal {E}}\), one can obtain that

$$ \begin{array}{@{}rcl@{}} && \left|\frac{\partial w_{k}^{(m)}}{\partial{\Pi}_{kl}^{m}}|_{({\Pi}_{k0\tau}^{m},{\Pi}_{k1\tau}^{m})}\right|\\ &\leq&\frac{1}{2}\left[\alpha_{0}^{-1}{\Pi}_{k0\tau}^{m}\left( 1-{\Pi}_{k0\tau}^{m}\right)+\alpha_{1}^{-1}{\Pi}_{k1\tau}^{m}\left( 1-{\Pi}_{k1\tau}^{m}\right)\right]^{-3/2}\alpha_{l}^{-1}\left|1-2{\Pi}_{kl\tau}^{m}\right|\\ &\leq& \frac{1}{2}\left[\alpha_{0}^{-1}{{\Pi}_{L}^{2}}/4+\alpha_{1}^{-1}{{\Pi}_{L}^{2}}/4\right]^{-3/2}\alpha^{-1}\left( 1-{\Pi}_{L}\right)\\ &\leq& 4(\alpha_{0}\alpha_{1})^{3/2}\alpha^{-1}\frac{1-{\Pi}_{L}}{{{\Pi}_{L}^{3}}}\\ &\leq& 4\alpha^{-1}\frac{1-{\Pi}_{L}}{{{\Pi}_{L}^{3}}}. \end{array} $$

As a result, on the event \(\mathcal {E}\) we have that

$$ \begin{array}{@{}rcl@{}} & &P\left( \left\|\left( \left( \text{diag}\left( {\widehat{\Sigma}}^{(m)}\right)\right)^{-1/2}-\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\right)\left( {{\widehat{\Pi}}_{0}^{m}}-{{\widehat{\Pi}}_{1}^{m}}\right)\right\|_{1}\geq 1/6K_{m}C_{2}n^{-\vartheta}\right)\\ &\leq& P\left( 2\sum\limits_{k=1}^{K_{m}}\sum\limits_{l=0}^{1} \left|\frac{\partial w_{k}^{(m)}}{\partial{\Pi}_{kl}^{m}}|_{({\Pi}_{k0\tau}^{m},{\Pi}_{k1\tau}^{m})}\right|\cdot|{\widehat{\Pi}}_{kl}^{m}-{\Pi}_{kl}^{m}|\geq1/6K_{m}C_{2}n^{-\vartheta}\right)\\ &=& P\left( \sum\limits_{k=1}^{K_{m}}\sum\limits_{l=0}^{1}|\widehat {\Pi}_{kl}^{m}-{\Pi}_{kl}^{m}|\geq\frac{1}{48}\frac{{{\Pi}_{L}^{3}}}{1-{\Pi}_{L}}\alpha K_{m}C_{2}n^{-\vartheta}\right)\\ &\leq& P\left( \|{\widehat{\Pi}_{0}^{m}}-{{\Pi}_{0}^{m}}\|_{1}\geq\frac{1}{96}\frac{{{\Pi}_{L}^{3}}}{1-{\Pi}_{L}}\alpha K_{m}C_{2}n^{-\vartheta}\right)\\ &&+P\left( \|{\widehat{\Pi}_{1}^{m}}-{{\Pi}_{1}^{m}}\|_{1}\geq\frac{1}{96}\frac{{{\Pi}_{L}^{3}}}{1-{\Pi}_{L}}\alpha K_{m}C_{2}n^{-\vartheta}\right). \end{array} $$

Next, we consider \(P\left(\left\|\left(\text{diag}\left(\Sigma^{(m)}\right)\right)^{-1/2}\left(\widehat{\Pi}_{y}^{m}-\Pi_{y}^{m}\right)\right\|_{1}\geq \frac{1}{12}K_{m}C_{2}n^{-\vartheta}\right)\) with \(y=0,1\). Since \(w_{k}^{(m)}=\left(\alpha_{0}^{-1}{\Pi}_{k0}^{m}\left(1-{\Pi}_{k0}^{m}\right)+\alpha_{1}^{-1}{\Pi}_{k1}^{m}\left(1-{\Pi}_{k1}^{m}\right)\right)^{-1/2}\leq{\Pi}_{L}^{-1}\), we have that \(P\left(\left\|\left(\text{diag}\left(\Sigma^{(m)}\right)\right)^{-1/2}\left(\widehat{\Pi}_{y}^{m}-\Pi_{y}^{m}\right)\right\|_{1}\geq \frac{1}{12}K_{m}C_{2}n^{-\vartheta}\right)\leq P\left(\max_{k}\left(w_{k}^{(m)}\right)\|\widehat{\Pi}_{y}^{m}-\Pi_{y}^{m}\|_{1}\geq \frac{1}{12}K_{m}C_{2}n^{-\vartheta}\right)\leq P\left(\|\widehat{\Pi}_{y}^{m}-\Pi_{y}^{m}\|_{1}\geq \frac{1}{12}{\Pi}_{L}K_{m}C_{2}n^{-\vartheta}\right)\).

Denote \(\min\left\{\frac{1}{96}\frac{{\Pi}_{L}^{3}}{1-{\Pi}_{L}}\alpha,\frac{1}{12}{\Pi}_{L}\right\}\) by \(M\). Then, based on the above analysis, we have

$$ \begin{array}{@{}rcl@{}} &&P\left( \bigcup\limits_{m\in\mathcal{C}_{T}}\left\{K_{m}^{-1/2}\|\left( \text{diag}\left( \widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{0}^{m}}-{\widehat{\Pi}_{1}^{m}}\right)\|_{2}\leq\gamma_{n}\right\}\right)\\ &\leq& P\left( \bigcup_{m\in{\mathcal{C}}_{T}}\left\{K_{m}^{-1/2}\|\left( \text{diag}\left( {\widehat{\Sigma}}^{(m)}\right)\right)^{-1/2}\left( {{\widehat{\Pi}}_{0}^{m}}-{{\widehat{\Pi}}_{1}^{m}}\right)\|_{2}\leq\gamma_{n},{\mathcal{E}}\right\}\right)+P\left( {\mathcal{E}}^{C}\right) \end{array} $$
$$ \begin{array}{@{}rcl@{}} &\leq&\sum\limits_{m\in\mathcal{C}_{T}}P\left( K_{m}^{-1}\|\left( \text{diag}\left( \widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{0}^{m}}-{\widehat{\Pi}_{1}^{m}}\right)\|_{1}\leq\gamma_{n},\mathcal{E}\right)+P\left( \mathcal{E}^{C}\right)\\ &\leq&\sum\limits_{m\in\mathcal{C}_{T}}\left\{P\left( \left\|\left( \left( \text{diag}\left( {\widehat{\Sigma}}^{(m)}\right)\right)^{-1/2}-\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\right)\left( {{\widehat{\Pi}}_{0}^{m}}-{{\widehat{\Pi}}_{1}^{m}}\right)\right\|_{1}\geq 1/6K_{m}C_{2}n^{-\vartheta},{\mathcal{E}}\right)\right.\\ &&+P\left( \left\|\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{0}^{m}}-{{\Pi}_{0}^{m}}\right)\right\|_{1}\geq 1/12K_{m}C_{2}n^{-\vartheta},{\mathcal{E}}\right)\\ &&\left.+P\left( \left\|\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{1}^{m}}-{{\Pi}_{1}^{m}}\right)\right\|_{1}\geq 1/12K_{m}C_{2}n^{-\vartheta},{\mathcal{E}}\right)\right\} +P(\mathcal{E}^{C})\\ &\leq& 2\sum\limits_{m\in\mathcal{C}_{T}} \left\{P\left( \|\left( {\widehat{\Pi}_{0}^{m}}-{{\Pi}_{0}^{m}}\right)\|_{1}\geq MK_{m} C_{2}n^{-\vartheta}\right)+P\left( \|({{\widehat{\Pi}}_{1}^{m}}-{{\Pi}_{1}^{m}})\|_{1}\geq MK_{m} C_{2}n^{-\vartheta}\right)\right\}+P\left( \mathcal{E}^{C}\right)\\ &\leq& 2\sum\limits_{m\in\mathcal{C}_{T}} \left\{\sum\limits_{k=1}^{K_{m}}P\left( |(\widehat{\Pi}_{k0}^{m}-{\Pi}_{k0}^{m})|\geq M C_{2}n^{-\vartheta}\right)+\sum\limits_{k=1}^{K_{m}}P\left( |(\widehat{\Pi}_{k1}^{m}-{\Pi}_{k1}^{m})|\geq M C_{2}n^{-\vartheta}\right)\right\}+P\left( \mathcal{E}^{C}\right)\\ &\leq& 8Kp\exp\left\{-2\alpha M^{2}{C_{2}^{2}}n^{1-2\vartheta}\right\}+P\left( \mathcal{E}^{C}\right). \end{array} $$

Moreover, by Assumptions A2 and A7, we see that \(p=e^{Cn^{\xi}}\) and \(\vartheta\in(0,(1-\xi)/2)\); hence, we can obtain that \(p\exp\left\{-2\alpha M^{2}{C_{2}^{2}}n^{1-2\vartheta}\right\}\rightarrow 0\). Combining this with the fact that \(P({\mathcal{E}}^{C})\rightarrow 0\), we have that \(P\left(\bigcup_{m\in\mathcal{C}_{T}}\left\{K_{m}^{-1/2}\|\left(\text{diag}\left(\widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left(\widehat{\Pi}_{0}^{m}-\widehat{\Pi}_{1}^{m}\right)\|_{2}\leq\gamma_{n}\right\}\right)\rightarrow 0\). Furthermore, \(P\left(\mathcal{C}_{T}\subset\widehat{\mathcal{C}}(\gamma_{n})\right)\geq 1-P\left(\bigcup_{m\in{\mathcal{C}}_{T}}\left\{K_{m}^{-1/2}\|\left(\text{diag}\left(\widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left(\widehat{\Pi}_{0}^{m}-\widehat{\Pi}_{1}^{m}\right)\|_{2}\leq\gamma_{n}\right\}\right)\rightarrow 1\).
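The screening statistic analyzed in this step can be sketched as follows (our own illustration, not the authors' code; the class proportions and probability vectors are hypothetical, and \(\Pi_{ky}^{m}\) is read as the class-\(y\) probability of the \(k\)-th cell of clique \(m\)).

```python
import numpy as np

alpha0, alpha1 = 0.5, 0.5       # hypothetical class proportions

def screening_stat(Pi0, Pi1):
    """K_m^{-1/2} * || diag(Sigma^(m))^{-1/2} (Pi_0^m - Pi_1^m) ||_2, where the
    k-th diagonal entry of Sigma^(m) is
    alpha0^{-1} Pi_{k0}(1 - Pi_{k0}) + alpha1^{-1} Pi_{k1}(1 - Pi_{k1})."""
    Pi0, Pi1 = np.asarray(Pi0, float), np.asarray(Pi1, float)
    var = Pi0 * (1 - Pi0) / alpha0 + Pi1 * (1 - Pi1) / alpha1
    return float(np.linalg.norm((Pi0 - Pi1) / np.sqrt(var)) / np.sqrt(len(Pi0)))

# A clique whose cell probabilities differ across the two classes survives screening ...
s_signal = screening_stat([0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25])
# ... while identical class-conditional probabilities give a statistic of exactly 0.
s_null = screening_stat([0.4, 0.3, 0.2, 0.1], [0.4, 0.3, 0.2, 0.1])
print(s_signal, s_null)
```

\(\widehat{\mathcal{C}}(\gamma_{n})\) keeps exactly the cliques whose statistic exceeds the threshold \(\gamma_{n}\).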

Step 2. In this step, we will prove that \( P\left (\mathcal {C}_{T}\supset \widehat {\mathcal {C}}(\gamma _{n})\right ) \rightarrow 1\).

One can see that

$$ \begin{array}{@{}rcl@{}} & & P\left( \mathcal{C}_{T}\supset\widehat{\mathcal{C}}(\gamma_{n})\right) \\ &=&1-P\left( \bigcup\limits_{m{\in\mathcal{C}_{T}^{C}}}\left\{K_{m}^{-1/2}\|\left( \text{diag}\left( \widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left( {\widehat{\Pi}_{0}^{m}}-{\widehat{\Pi}_{1}^{m}}\right)\|_{2}\leq\gamma_{n}\right\}\right)\\ &\geq& 1-P\left( \bigcup\limits_{m{\in{\mathcal{C}}_{T}^{C}}}\left\{K_{m}^{-1}\|\left( \text{diag}\left( {\widehat{\Sigma}}^{(m)}\right)\right)^{-1/2}\left( {{\widehat{\Pi}}_{0}^{m}}-{{\widehat{\Pi}}_{1}^{m}}\right)\|_{1}\leq\gamma_{n}\right\}\right)\\ &\geq& 1-\sum\limits_{m{\in{\mathcal{C}}_{T}^{C}}}P\left( K_{m}^{-1}\left|\|\left( \text{diag}\left( {\widehat{\Sigma}}^{(m)}\right)\right)^{-1/2}\left( {{\widehat{\Pi}}_{0}^{m}}-{{\widehat{\Pi}}_{1}^{m}}\right)\|_{1}\right.\right.\\ &&\left.\left.-\|\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\left( {{\Pi}_{0}^{m}}-{{\Pi}_{1}^{m}}\right)\|_{1}\right|\geq 1/3C_{2}n^{-\vartheta}\right)\\ &\geq&1-\sum\limits_{m}P\left( K_{m}^{-1}\left|\|\left( \text{diag}\left( {\widehat{\Sigma}}^{(m)}\right)\right)^{-1/2}\left( {{\widehat{\Pi}}_{0}^{m}}-{{\widehat{\Pi}}_{1}^{m}}\right)\|_{1}\right.\right.\\ &&\left.\left.-\|\left( \text{diag}\left( {\Sigma}^{(m)}\right)\right)^{-1/2}\left( {{\Pi}_{0}^{m}}-{{\Pi}_{1}^{m}}\right)\|_{1}\right|\geq 1/3C_{2}n^{-\vartheta}\right). \end{array} $$

By a similar argument to Step 1, one can see that \({\sum}_{m}P\left(K_{m}^{-1}\left|\|\left(\text{diag}\left(\widehat{\Sigma}^{(m)}\right)\right)^{-1/2}\left(\widehat{\Pi}_{0}^{m}-\widehat{\Pi}_{1}^{m}\right)\|_{1}-\|\left(\text{diag}\left(\Sigma^{(m)}\right)\right)^{-1/2}\left(\Pi_{0}^{m}-\Pi_{1}^{m}\right)\|_{1}\right|\geq \frac{1}{3}C_{2}n^{-\vartheta}\right)\rightarrow 0\). Hence, \(P\left({\mathcal{C}}_{T}\supset{\widehat{\mathcal{C}}}(\gamma_{n})\right)\rightarrow 1\).

Combining the results of the above two steps, we have that \(P\left(\mathcal{C}_{T}=\widehat{\mathcal{C}}(\gamma_{n})\right)\rightarrow 1\).

This completes the whole proof of Theorem 2.


Cite this article

An, B., Feng, G. & Guo, J. Interaction Identification and Clique Screening for Classification with Ultra-high Dimensional Discrete Features. J Classif 39, 122–146 (2022). https://doi.org/10.1007/s00357-021-09399-0
