Abstract
Interactions have greatly influenced recent scientific discoveries, but the identification of interactions is challenging in ultra-high dimensions. In this study, we propose an interaction identification method for classification with ultra-high dimensional discrete features. We utilize clique sets to capture interactions among features, where features in a common clique have interactions that can be used for classification. The number of features related to the interaction is the size of the clique. Hence, our method can consider interactions caused by more than two feature variables. We propose a Kullback-Leibler divergence-based approach to correctly identify the clique sets with a probability that tends to 1 as the sample size tends to infinity. A clique screening method is then proposed to filter out clique sets that are useless for classification, and the strong sure screening property can be guaranteed. Finally, a clique naïve Bayes classifier is proposed for classification. Numerical studies demonstrate that our proposed approach performs very well.
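As a concrete illustration of the kind of statistic the method builds on, the sketch below estimates a class-weighted KL-divergence measure of dependence between two binary features (the joint distribution against the product of marginals within each class). The function name and the exact form of the statistic are ours and may differ in detail from the paper's definition of kl(j, k).

```python
import numpy as np

def pairwise_kl(xj, xk, y):
    """Plug-in estimate of a class-weighted KL dependence statistic for two
    binary features: sum over classes of pi_y * KL(joint || product of
    marginals), computed within class y. A sketch only; the paper's
    kl(j, k) may differ in detail."""
    xj, xk, y = np.asarray(xj), np.asarray(xk), np.asarray(y)
    stat = 0.0
    for cls in (0, 1):
        idx = (y == cls)
        pi_y = idx.mean()                    # estimated class proportion
        a, b = xj[idx], xk[idx]
        kl = 0.0
        for l in (0, 1):
            for s in (0, 1):
                p_joint = ((a == l) & (b == s)).mean()
                p_prod = (a == l).mean() * (b == s).mean()
                if p_joint > 0:              # 0 * log 0 = 0 convention
                    kl += p_joint * np.log(p_joint / p_prod)
        stat += pi_y * kl
    return stat
```

Independent feature pairs give values near zero, while dependent pairs give strictly positive values; this separation is what makes thresholding such a statistic meaningful for identifying cliques of interacting features.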


Funding
The research of Baiguo An is partially supported by the National Natural Science Foundation of China (No. 12071308, No. 11601349) and the Scientific Research Planned Project of the National Bureau of Statistics of China (No. 2017LZ15). The research of Guozhong Feng is supported by the National Natural Science Foundation of China (No. 11501095). The research of Jianhua Guo is supported by the National Key Research and Development Program of China (No. 2020YFA0714102) and the National Natural Science Foundation of China (No. 11631003, No. 11690012).
Appendices
Appendix A: Proof of Theorem 1
We prove the claim of Theorem 1 by the following five steps.
Step 1. For j, k = 1, … , p and y ∈ {0, 1}, define \(\pi _{y}^{jk}=\left (\pi _{00y}^{jk},\pi _{01y}^{jk},\pi _{10y}^{jk},\pi _{11y}^{jk}\right )^{\top }\), and \(\hat {\pi }_{y}^{jk}=\left (\hat {\pi }_{00y}^{jk},\hat {\pi }_{01y}^{jk},\hat {\pi }_{10y}^{jk},\hat {\pi }_{11y}^{jk}\right )^{\top }\). Let \(\alpha =\min \limits \{\alpha _{0},\alpha _{1}\}\). In this step, we will prove that \(P\left (\|{\widehat {\pi }}_{y}^{jk}-\pi _{y}^{jk}\|_{1}\geq t\right )\leq 8\exp \left \{-\alpha nt^{2}/8\right \}\) for arbitrary t > 0.
One can see that \(P\left (\|\widehat \pi _{y}^{jk}-\pi _{y}^{jk}\|_{1}\geq t\right )=P\left (|{\widehat {\pi }}_{00y}^{jk}-\pi _{00y}^{jk}|+|{\widehat {\pi }}_{01y}^{jk}-\pi _{01y}^{jk}|+|{\widehat {\pi }}_{10y}^{jk}-\pi _{10y}^{jk}|+|{\widehat {\pi }}_{11y}^{jk}-\pi _{11y}^{jk}|\geq t\right )\leq {\sum }_{l,s\in \{0, 1\}}P\left (|{\widehat {\pi }}_{lsy}^{jk}-\pi _{lsy}^{jk}|\geq t/4\right )\). By Hoeffding's inequality, we have that \(P\left (|{\widehat {\pi }}_{lsy}^{jk}-\pi _{lsy}^{jk}|\geq t/4\right )\leq 2\exp \left \{-n_{y}t^{2}/8\right \}\leq 2\exp \left \{-\alpha nt^{2}/8\right \}\). Hence, \(P\left (\|\widehat \pi _{y}^{jk}-\pi _{y}^{jk}\|_{1}\geq t\right )\leq 8\exp \left \{-\alpha nt^{2}/8\right \}\).
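The concentration bound of Step 1 can be checked numerically. The sketch below uses a toy setup of our own choosing (four equiprobable cells, \(\alpha = n_{y}/n = 0.5\)) and compares the empirical exceedance frequency of the L1 deviation with the stated bound \(8\exp \left \{-\alpha nt^{2}/8\right \}\).

```python
import numpy as np

# Monte Carlo check of the Step 1 concentration bound
#   P(||pi_hat - pi||_1 >= t) <= 8 exp(-alpha n t^2 / 8)
# for a single cell-probability vector pi (toy setup: four
# equiprobable cells, alpha = n_y / n = 0.5).
rng = np.random.default_rng(0)
pi = np.array([0.25, 0.25, 0.25, 0.25])
alpha, n, t = 0.5, 400, 0.4
n_y = int(alpha * n)          # class-y sample size

reps = 2000
exceed = 0
for _ in range(reps):
    pi_hat = rng.multinomial(n_y, pi) / n_y
    exceed += np.abs(pi_hat - pi).sum() >= t
empirical = exceed / reps
bound = 8 * np.exp(-alpha * n * t ** 2 / 8)   # = 8 exp(-4), about 0.147
print(empirical <= bound)  # True
```

The empirical frequency sits far below the bound here, as expected: Hoeffding-type bounds are conservative, and the L1 deviation of a multinomial with 200 draws rarely approaches 0.4.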
Step 2. Let \(r_{n}=\left (|\hat {\pi }_{0}-\pi _{0}|+|\hat {\pi }_{1}-\pi _{1}|\right )\log \left (1/\left (4{\pi _{L}^{2}}\right )\right )\). Define \({\mathcal {E}}=\left \{|\hat {\pi }_{lsy}^{jk}-\pi _{lsy}^{jk}|<\pi _{L}/2 \text { for all } j,k=1,\ldots ,p,\text { and } l,s,y\in \{0, 1\}\right \}\), and \({\mathcal {E}}_{r} = \{r_{n}\leq \nu _{n}/2\}\). In this step, we will prove that \(P({\mathcal {E}})\rightarrow 1\) and \(P({\mathcal {E}}_{r})\rightarrow 1\) as n tends to \(\infty \).
Specifically, \(P(\mathcal {E})\geq 1-\sum \nolimits _{y\in \{0, 1\}}\sum \nolimits _{j\neq k}\sum \nolimits _{l,s\in \{0, 1\}}P\left (|\hat {\pi }_{lsy}^{jk}-\pi _{lsy}^{jk}|\geq \pi _{L}/2\right )\geq 1- \sum \nolimits _{y\in \{0, 1\}}\sum \nolimits _{j\neq k}\sum \nolimits _{l,s\in \{0, 1\}}2\exp \left \{-{\alpha \pi _{L}^{2}}n/2\right \}\geq 1-16p^{2}\exp \left \{-{\alpha \pi _{L}^{2}}n/2\right \}= 1-16\exp \left \{-{\alpha \pi _{L}^{2}}n/2+2Cn^{\xi }\right \} \rightarrow 1\).
By assumption A4, we know that \(n^{\kappa }\nu _{n}=O(1)\) with κ ∈ (0, (1 − ξ)/2). On the other hand, one can see that \(n^{1/2}r_{n}=O_{p}(1)\). Hence, it is easy to show that \(P({\mathcal {E}}_{r})\rightarrow 1\). This completes the proof of Step 2.
Step 3. In this step, we will show that on the events \(\mathcal {E}\) and \(\mathcal {E}_{r}\), \(|{\widehat {\text {kl}}}(j,k)-{\text {kl}}(j,k)|\leq M\left (\|\hat {\pi }_{0}^{jk}-\pi _{0}^{jk}\|_{1}+\|\hat {\pi }_{1}^{jk}-\pi _{1}^{jk}\|_{1}\right )+\nu _{n}/2\) holds with some positive constant M for all j, k.
Recall that \(\widehat {\text {kl}}(j,k)={\sum }_{y}\hat {\pi }_{y}\widehat {\text {kl}}(j,k;y)\), where
Then, we have that
where
and \(\pi _{y}^{jk*}=\left (\pi _{00y}^{jk*},\pi _{01y}^{jk*},\pi _{10y}^{jk*},\pi _{11y}^{jk*}\right )^{\top }=\pi _{y}^{jk}+\zeta \left (\hat {\pi }_{y}^{jk}-\pi _{y}^{jk}\right )\) with some ζ ∈ (0, 1). One can see that for every l, s ∈ {0, 1},
On event \({\mathcal {E}}\), we have that for every l, s, y ∈ {0, 1}, \(\pi _{lsy}^{jk*}=(1-\zeta )\pi _{lsy}^{jk}+\zeta \hat {\pi }_{lsy}^{jk}\geq (1-\zeta )\pi _{L}+\zeta \pi _{L}/2>\pi _{L}/2\). Consequently,
Denote \(1+2\log 2-3\log (\pi _{L}/2)\) by M/2; then we have that
Moreover,
which means that kl(j, k; y) is uniformly bounded for all j, k, y ∈ {0, 1}. Consequently, \(|\widehat {\text {kl}}(j,k)-\text {kl}(j,k)| \leq \hat {\pi }_{0}|{\widehat {\text {kl}}}(j,k;0)-{\text {kl}}\)\((j,k;0)|+\hat {\pi }_{1}|{\widehat {\text {kl}}}(j,k;1)-{\text {kl}}(j,k;1)|+|\hat {\pi }_{0}-\pi _{0}|{\text {kl}}(j,k;0)+|\hat {\pi }_{1}-\pi _{1}|{\text {kl}}(j,k;1)\leq M\left (\|\hat {\pi }_{0}^{jk}-\pi _{0}^{jk}\|_{1}+\|\hat {\pi }_{1}^{jk}-\pi _{1}^{jk}\|_{1}\right )+r_{n},\) where \(r_{n}=\left (|\hat {\pi }_{0}-\pi _{0}|+|\hat {\pi }_{1}-\pi _{1}|\right )\log \frac {1}{4{\pi _{L}^{2}}}\).
Moreover, on the event \(\mathcal {E}_{r}\), rn ≤ νn/2. Consequently, on the events \(\mathcal {E}\) and \(\mathcal {E}_{r}\), we have that \(|\widehat {\text {kl}}(j,k)-\text {kl}(j,k)|\leq M\left (\|\hat {\pi }_{0}^{jk}-\pi _{0}^{jk}\|_{1}+\|\hat {\pi }_{1}^{jk}-\pi _{1}^{jk}\|_{1}\right )+\nu _{n}/2\).
Step 4. In this step, we will prove that with probability tending to 1, \({\widehat {\mathcal {G}}}_{\nu _{n}}\subseteq {\mathcal {G}}\) is true. Specifically,
Step 5. In this step, we will prove that \(P\left (\widehat {\mathcal {G}}_{\nu _{n}}\supseteq \mathcal {G}\right )\rightarrow 1\). Specifically,
Combining the above results, one can see that \(P\left (\widehat {\mathcal {G}}_{\nu _{n}}=\mathcal {G}\right )\rightarrow 1\). This completes the whole proof of Theorem 1.
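The estimator analyzed in Theorem 1 amounts to hard-thresholding the matrix of pairwise statistics at \(\nu _{n}\). The following sketch (our names; kl_hat stands for a symmetric matrix of estimated pairwise statistics) illustrates the thresholding rule.

```python
import numpy as np

def select_pairs(kl_hat, nu_n):
    """Estimated dependence set: all pairs (j, k), j < k, whose
    statistic reaches the threshold nu_n. A sketch (our names) of
    the hard-thresholding rule behind the estimator in Theorem 1."""
    kl_hat = np.asarray(kl_hat)
    p = kl_hat.shape[0]
    return {(j, k) for j in range(p) for k in range(j + 1, p)
            if kl_hat[j, k] >= nu_n}

# Toy symmetric matrix of pairwise statistics: only the pair (0, 1)
# survives the threshold.
kl_hat = np.array([[0.00, 0.30, 0.01],
                   [0.30, 0.00, 0.02],
                   [0.01, 0.02, 0.00]])
print(select_pairs(kl_hat, 0.1))  # {(0, 1)}
```

Theorem 1 says that, with a threshold of the stated order, this selected set equals the true dependence set with probability tending to one.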
Appendix B: Proof of Theorem 2
We prove Theorem 2 by the following two steps.
Step 1. In this step, we will prove that \( P\left (\mathcal {C}_{T}\subset \widehat {\mathcal {C}}(\gamma _{n})\right ) \rightarrow 1\).
One can see that \(P\left (\mathcal {C}_{T}\subset \widehat {\mathcal {C}}(\gamma _{n})\right )= P\left (\bigcap _{m\in {\mathcal {C}}_{T}}\left \{K_{m}^{-1/2}\|\left (\text {diag}\left ({\widehat {\Sigma }}^{(m)}\right )\right )^{-1/2}\left ({{\widehat {\Pi }}_{0}^{m}}-{{\widehat {\Pi }}_{1}^{m}}\right )\|_{2}\geq \gamma _{n}\right \}\right )\geq 1-P\left (\bigcup _{m\in {\mathcal {C}}_{T}}\left \{K_{m}^{-1/2}\|\left (\text {diag}\left ({\widehat {\Sigma }}^{(m)}\right )\right )^{-1/2}\left ({{\widehat {\Pi }}_{0}^{m}}-{{\widehat {\Pi }}_{1}^{m}}\right )\|_{2}\leq \gamma _{n}\right \}\right )\). Under assumption A7 and \(\gamma _{n}=\frac {2}{3}C_{2}n^{-\vartheta }\), we have
We first focus on \(P\left (\left \|\left (\left (\text {diag}\left ({\widehat {\Sigma }}^{(m)}\right )\right )^{-1/2}-\left (\text {diag}\left ({\Sigma }^{(m)}\right )\right )^{-1/2}\right )\cdot \left ({{\widehat {\Pi }}_{0}^{m}}-{{\widehat {\Pi }}_{1}^{m}}\right )\right \|_{1}\right .\) \(\left .\geq 1/6K_{m}C_{2}n^{-\vartheta }\right )\). Define \(\hat {W}^{(m)}=\left (\hat {w}_{1}^{(m)},\ldots ,\hat {w}_{K_{m}}^{(m)}\right )^{\top }\) and \(W^{(m)}=\left (w_{1}^{(m)},\ldots ,w_{K_{m}}^{(m)}\right )^{\top }\) with \(\hat {w}_{k}^{(m)}=\left (\alpha _{0}^{-1}{\widehat {\Pi }}_{k0}^{m}\left (1-{\widehat {\Pi }}_{k0}^{m}\right )+\alpha _{1}^{-1}{\widehat {\Pi }}_{k1}^{m}\left (1-{\widehat {\Pi }}_{k1}^{m}\right )\right )^{-1/2}\), \(w_{k}^{(m)}=\left (\alpha _{0}^{-1}{\Pi }_{k0}^{m}\right .\) \(\left .\left (1-{\Pi }_{k0}^{m}\right )+\alpha _{1}^{-1}{\Pi }_{k1}^{m}\left (1-{\Pi }_{k1}^{m}\right )\right )^{-1/2}\) for k = 1, … , Km.
Then, one can see that
where \({\Pi }_{ky\tau }^{m}=\tau \widehat {\Pi }_{ky}^{m}+(1-\tau ){\Pi }_{ky}^{m}\) for some τ ∈ (0, 1) and y ∈ {0, 1}. One can verify that \(\frac {\partial w_{k}^{(m)}}{\partial {\Pi }_{kl}^{m}}=-\frac {1}{2}\left [\alpha _{0}^{-1}{\Pi }_{k0}^{m}\left (1-{\Pi }_{k0}^{m}\right )+\alpha _{1}^{-1}{\Pi }_{k1}^{m}\left (1-{\Pi }_{k1}^{m}\right )\right ]^{-3/2}\alpha _{l}^{-1}\left (1-2{\Pi }_{kl}^{m}\right )\).
Define \(\mathcal {E}=\left \{|\hat {\Pi }_{ky}^{m}-{\Pi }_{ky}^{m}|<{\Pi }_{L}/2 \text {~for~all~} k, m,\text {~and~} y\in \{0, 1\}\right \}\). Then, by an argument similar to Step 2 in the proof of Theorem 1, one can verify that \(P({\mathcal {E}})\rightarrow 1\). On the event \({\mathcal {E}}\), it is easy to show that \({\Pi }_{L}/2\leq {\Pi }_{ky\tau }^{m}\leq (1-{\Pi }_{L}/2)\) for k = 1, … , Km, y = 0, 1 and all m. Consequently, on the event \({\mathcal {E}}\), one can obtain that
As a result, on the event \(\mathcal {E}\) we have that
Next, we consider \(P\left (\left \|\left (\text {diag}\left ({\Sigma }^{(m)}\right )\right )^{-1/2}\left ({{\widehat {\Pi }}_{y}^{m}}-{{\Pi }_{y}^{m}}\right )\right \|_{1}\geq 1/12K_{m}C_{2}n^{-\vartheta }\right )\) with y = 0, 1. Since \({w_{k}^{m}}=\left (\alpha _{0}^{-1}{\Pi }_{k0}^{m}\left (1-{\Pi }_{k0}^{m}\right )+\alpha _{1}^{-1}{\Pi }_{k1}^{m}\left (1-{\Pi }_{k1}^{m}\right )\right )^{-1/2}\leq {\Pi }_{L}^{-1}\), we have that \(P\left (\left \|\left (\text {diag}\left ({\Sigma }^{(m)}\right )\right )^{-1/2}\left (\widehat {\Pi }_{y}^{m}-{{\Pi }_{y}^{m}}\right )\right \|_{1}\geq 1/12K_{m}C_{2}n^{-\vartheta }\right ) \leq P\left (\max \limits _{k}({w_{k}^{m}})\|\widehat {\Pi }_{y}^{m}-{{\Pi }_{y}^{m}}\|_{1}\geq 1/12K_{m}C_{2}n^{-\vartheta }\right ) \leq P\left (\|\widehat {\Pi }_{y}^{m}-{{\Pi }_{y}^{m}}\|_{1}\geq 1/12{\Pi }_{L}K_{m}C_{2}n^{-\vartheta }\right )\).
Denote \(\min \limits \left \{\frac {1}{96}\frac {{{\Pi }_{L}^{3}}}{1-{\Pi }_{L}}\alpha ,\frac {1}{12}{\Pi }_{L}\right \}\) by M; then, based on the above analysis, we have
Moreover, by the assumptions A2 and A7, we see that \(p=e^{Cn^{\xi }}\) and 𝜗 ∈ (0, (1 − ξ)/2); hence, we can obtain that \(p\exp \left \{-C_{4}n^{1-2\vartheta }\right \}\rightarrow 0\). Combining the fact that \(P({\mathcal {E}}^{C})\rightarrow 0\), we have that \(P\left (\bigcup _{m\in \mathcal {C}_{T}}\left \{K_{m}^{-1/2}\|\left (\text {diag}\left (\widehat {\Sigma }^{(m)}\right )\right )^{-1/2}\left ({\widehat {\Pi }_{0}^{m}}-{\widehat {\Pi }_{1}^{m}}\right )\|_{2}\leq \gamma _{n}\right \}\right )\) \(\rightarrow 0\). Furthermore, \( P\left (\mathcal {C}_{T}\subset \widehat {\mathcal {C}}(\gamma _{n})\right ) \geq 1-P\left (\bigcup _{m\in {\mathcal {C}}_{T}}\{K_{m}^{-1/2}\|\left (\text {diag}\left ({\widehat {\Sigma }}^{(m)}\right )\right )^{-1/2}\right .\) \(\left .\left ({{\widehat {\Pi }}_{0}^{m}}-{{\widehat {\Pi }}_{1}^{m}}\right )\|_{2}\leq \gamma _{n}\}\right )\rightarrow 1\).
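The screening statistic analyzed in this step can be computed directly from the class-conditional cell probabilities of a clique, using the diagonal weights \(\hat {w}_{k}^{(m)}\) defined above. The sketch below is our own minimal implementation; variable names are ours.

```python
import numpy as np

def clique_screening_stat(pi0, pi1, alpha0, alpha1):
    """Screening statistic K_m^{-1/2} ||diag(Sigma)^{-1/2} (Pi0 - Pi1)||_2
    for one clique, with the diagonal weights w_k from the proof of
    Theorem 2. pi0, pi1 hold the K_m class-conditional cell probabilities;
    alpha0, alpha1 are the class proportions. A sketch with our names."""
    pi0 = np.asarray(pi0, dtype=float)
    pi1 = np.asarray(pi1, dtype=float)
    K_m = pi0.size
    # w_k = (alpha0^{-1} pi_k0 (1 - pi_k0) + alpha1^{-1} pi_k1 (1 - pi_k1))^{-1/2}
    w = (pi0 * (1 - pi0) / alpha0 + pi1 * (1 - pi1) / alpha1) ** -0.5
    return np.linalg.norm(w * (pi0 - pi1)) / np.sqrt(K_m)
```

A clique whose two class-conditional cell distributions coincide yields a statistic of exactly zero and is screened out; well-separated distributions yield a large value that survives the threshold \(\gamma _{n}\).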
Step 2. In this step, we will prove that \( P\left (\mathcal {C}_{T}\supset \widehat {\mathcal {C}}(\gamma _{n})\right ) \rightarrow 1\).
One can see that
By a proof similar to Step 1, one can see that \({\sum }_{m}P\left (K_{m}^{-1}\left |\|\left (\text {diag}\left (\widehat {\Sigma }^{(m)}\right )\right )^{-1/2}\left (\widehat {\Pi }_{0}^{m}-\widehat {\Pi }_{1}^{m}\right )\|_{1}-\|\left (\text {diag}\left ({\Sigma }^{(m)}\right )\right )^{-1/2}\left ({{\Pi }_{0}^{m}}-{{\Pi }_{1}^{m}}\right )\|_{1}\right |\geq 1/3C_{2}n^{-\vartheta }\right )\rightarrow 0\). Hence, \(P\left ({\mathcal {C}}_{T}\supset {\widehat {\mathcal {C}}}(\gamma _{n})\right )\rightarrow 1\).
Combining the results of the above two steps, we have that \(P\left (\mathcal {C}_{T}=\widehat {\mathcal {C}}(\gamma _{n})\right )\rightarrow 1\).
This completes the whole proof of Theorem 2.
An, B., Feng, G. & Guo, J. Interaction Identification and Clique Screening for Classification with Ultra-high Dimensional Discrete Features. J Classif 39, 122–146 (2022). https://doi.org/10.1007/s00357-021-09399-0