Abstract
To deal with semi-supervised feature selection tasks, this paper presents a recursive feature retention (RFR) method that builds on the neighborhood discriminant index (NDI) method (a supervised feature selection method) and a forward iterative Laplacian score (FILS) method (an unsupervised method), where FILS is designed specifically for RFR. The goal of RFR is to determine an optimal feature subset that has both a high discriminant ability and a strong ability to preserve the local structure of the data. The discriminant ability of a feature is measured by NDI, and its ability to preserve the local structure of the data is described by FILS. RFR combines these two scores into a balanced score for each feature, then iteratively selects the feature with the smallest balanced score and moves it into the current optimal feature subset. This paper also provides a theoretical analysis that speeds up the iterative procedure. Extensive experiments are conducted on toy and real-world data sets, and the results confirm that RFR achieves better performance than state-of-the-art semi-supervised methods.
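For intuition only, the following is a minimal sketch of the retention loop described above. The callables ndi_score and fils_score, the weighting parameter alpha, and the simple weighted sum used to balance the two criteria are illustrative placeholders, not the exact formulas of the paper.

def rfr_select(n_features, n_select, ndi_score, fils_score, alpha=0.5):
    # Greedy sketch of recursive feature retention: at every step, retain the
    # remaining feature whose balanced score is smallest and enlarge the
    # current optimal subset with it.
    selected = []                        # current optimal feature subset
    remaining = set(range(n_features))   # candidate features not yet retained
    for _ in range(min(n_select, n_features)):
        # balanced score: a compromise between the supervised (NDI-style)
        # discriminant score and the unsupervised (FILS-style) locality score
        best = min(remaining,
                   key=lambda f: alpha * ndi_score(f, selected)
                   + (1.0 - alpha) * fils_score(f, selected))
        selected.append(best)            # feature with the smallest balanced score
        remaining.remove(best)
    return selected

In practice, ndi_score would be computed from the labeled samples and fils_score from the local graph structure over all samples; both are assumed here to return values for which smaller means better, matching the "smallest balanced score" selection rule above.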
Acknowledgements
This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Proof of Theorem 1
Proof
For a given non-empty set A, we need to prove that \(J(A)\ge 0\) holds for \(1\le t\le n\). When \(A \ne \emptyset\), its Laplacian score is described as
The numerator of J(A) can be rewritten as
where \(\widetilde{{\mathbf {z}}}_{f_m}\) is a column of \(\widetilde{{\mathbf {Z}}}_{A}\). Because the Laplacian matrix \({\mathbf {L}}\) is symmetric and positive semi-definite, we have
which indicates that the numerator (20) of \(J(A^*(t))\) is nonnegative, or
Similarly, the denominator of J(A) can be rewritten as
Moreover, the matrix \({\mathbf {D}}\) is a diagonal matrix that is positive definite. Thus we have
By (22) and (24), when \(A\ne \emptyset\) we can draw the conclusion:
which completes the proof of Theorem 1. \(\square\)
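For readability, the argument can be condensed into a single chain. The ratio form of \(J(A)\) below is inferred from the numerator form (20) and from the column-wise notation used in this proof and in the proof of Theorem 2, so it should be read as a reconstruction rather than a reproduction of the displayed equations:
\[
J(A)=\frac{\sum_{f_m\in A}\widetilde{\mathbf{z}}_{f_m}^{T}\mathbf{L}\,\widetilde{\mathbf{z}}_{f_m}}{\sum_{f_m\in A}\widetilde{\mathbf{z}}_{f_m}^{T}\mathbf{D}\,\widetilde{\mathbf{z}}_{f_m}}\;\ge\;0,
\]
since every term \(\widetilde{\mathbf{z}}_{f_m}^{T}\mathbf{L}\,\widetilde{\mathbf{z}}_{f_m}\) is nonnegative (\(\mathbf{L}\) is positive semi-definite) and every term \(\widetilde{\mathbf{z}}_{f_m}^{T}\mathbf{D}\,\widetilde{\mathbf{z}}_{f_m}\) is positive (\(\mathbf{D}\) is positive definite, assuming \(\widetilde{\mathbf{z}}_{f_m}\ne \mathbf{0}\)).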
Proof of Theorem 2
Proof
To prove the inequalities (13) in Theorem 2, we use mathematical induction.
When \(t=1\), \(A^*(0)=\emptyset\) and \(A^*(1)\ne \emptyset\). Since \(J(A^*(0))=-\infty\) and \(J(A^*(1))\ge 0\) according to (11), \(J(A^*(0))\le J(A^*(1))\) is true. For simplification, let
When \(t=2\), the Laplacian score of a feature subset A(1) with \(|A(1)|=1\) can be reduced to:
where \(a_1^*\) and \(b_1^*\) are the corresponding \(a_{f_m}\) and \(b_{f_m}\) under the optimal solution, respectively.
where \(a_2^*=a_1^*+a_{p_1}\) and \(b_2^*=b_1^*+b_{p_1}\), and \({p_1}\) corresponds to the optimal feature index in the second iteration. Combining (26) and (27), we have
According to (26), we know
Thus, we have
Considering \(b_1^*(b_1^*+b_{p_1})>0\) and (30), we conclude that \(J(A^*(1))\le J(A^*(2))\) is true.
Assume that \(J(A^*(N-1))\le J(A^*(N))\) is true for \(t=N<n\). Without loss of generality, let
and
where \({p_{N-1}}\) is the optimal feature index in the N-th iteration. According to \(J(A^*(N-1))\le J(A^*(N))\), (31) and (32), for \(\forall f_k \in \overline{A^*(N-1)}\) we have
Further, we have
where \(f_k \in \overline{A^*(N-1)}\).
When \(t=N+1 \le n\), we want to prove that \(J(A^*(N))\le J(A^*(N+1))\) is true. Let
where \({p_{N}}\) is the optimal feature index in the \((N+1)\)-th iteration. We compute \(J(A^*(N))-J(A^*(N+1))\), and have
Substituting (36) into the last equation in (38) and replacing \(f_k\) with \(p_{N}\) owing to the arbitrariness of \(f_k\), we have
which shows that \(J(A^*(N))\le J(A^*(N+1))\) is true when \(t=N+1\).
Consequently, by the principle of mathematical induction, \(J(A^*(t-1))\le J(A^*(t))\) holds for \(1 \le t\le n\). \(\square\)
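Each induction step above ultimately rests on a mediant-type inequality. The following side note, stated with \(a^{*},b^{*}\) playing the role of the current subset's terms and \(a_{p},b_{p}\) the newly added feature's terms, is a standard fact added here for readability, not one of the paper's numbered equations: if \(b^{*}>0\), \(b_{p}>0\) and \(a^{*}/b^{*}\le a_{p}/b_{p}\), then
\[
\frac{a^{*}}{b^{*}}\;\le\;\frac{a^{*}+a_{p}}{b^{*}+b_{p}}\;\le\;\frac{a_{p}}{b_{p}},
\]
because
\[
\frac{a^{*}+a_{p}}{b^{*}+b_{p}}-\frac{a^{*}}{b^{*}}=\frac{a_{p}b^{*}-a^{*}b_{p}}{b^{*}\left(b^{*}+b_{p}\right)}\ge 0
\quad\text{and}\quad
\frac{a_{p}}{b_{p}}-\frac{a^{*}+a_{p}}{b^{*}+b_{p}}=\frac{a_{p}b^{*}-a^{*}b_{p}}{b_{p}\left(b^{*}+b_{p}\right)}\ge 0.
\]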
Proof of Theorem 3
Proof
Note that the set B satisfies \(|B|=1\). We follow the notation used in the proof of Theorem 2. In the t-th iteration, assume that
For the \((t+1)\)-th iteration, we have
where \({p_{t}}\) is the optimal feature index in the \((t+1)\)-th iteration, that is, \(A^*(t+1)-A^*(t)=B=\{f_{p_t}\}\).
According to Theorem 2, we know that \(J(A^*(t))-J(A^*(t+1))\le 0\), and have
Thus, according to \(a_{t}^*b_{p_{t}}-b_{t}^*a_{p_{t}}\le 0\), we have
Since \(a_{p_{t}}=\widetilde{{\mathbf {z}}}_{f_{p_{t}}}^T{\mathbf {L}} \widetilde{{\mathbf {z}}}_{f_{p_{t}}}=\widetilde{{\mathbf {Z}}}^T_B{\mathbf {L}} \widetilde{{\mathbf {Z}}}_B\) and \(b_{p_{t}}=\widetilde{{\mathbf {z}}}_{f_{p_{t}}}^T{\mathbf {D}} \widetilde{{\mathbf {z}}}_{f_{p_{t}}}=\widetilde{{\mathbf {Z}}}^T_B{\mathbf {D}} \widetilde{{\mathbf {Z}}}_B\), (43) can be rewritten as
which completes the proof of Theorem 3. \(\square\)
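Put together, the steps above give the following chain of inequalities; this is a reconstruction in the notation of this proof (the numbered display equations themselves are not reproduced):
\[
J(A^*(t))=\frac{a_t^{*}}{b_t^{*}}\;\le\;\frac{a_{p_t}}{b_{p_t}}
=\frac{\widetilde{\mathbf{Z}}_B^{T}\mathbf{L}\,\widetilde{\mathbf{Z}}_B}{\widetilde{\mathbf{Z}}_B^{T}\mathbf{D}\,\widetilde{\mathbf{Z}}_B},
\]
which follows from \(a_t^{*}b_{p_t}-b_t^{*}a_{p_t}\le 0\) together with \(b_t^{*}>0\) and \(b_{p_t}>0\).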
Proof of Theorem 5
Proof
According to the definition of neighborhood relation, we have
and
On the basis of \(\Delta ^{A}\left( \mathbf{x }_i,\mathbf{x }_j\right)\), the distance function \(\Delta ^{A^k}\left( \mathbf{x }_i,\mathbf{x }_j\right)\) can be rewritten as
For any \(\left( {\mathbf {x}}_i,{\mathbf {x}}_j\right) \in R_{A^k}^{\varepsilon }\), we have \(\Delta ^{A^k}\left( \mathbf{x }_i,\mathbf{x }_j\right) \le \varepsilon\). According to (47), \(\Delta ^{A}\left( \mathbf{x }_i,\mathbf{x }_j\right) \le \varepsilon\) also holds true, and thus \(\left( {\mathbf {x}}_i,{\mathbf {x}}_j\right) \in R_{A}^{\varepsilon }\) by (45). In other words, \(R_{A^k}^{\varepsilon } \subseteq R_{A}^{\varepsilon }\), which completes the proof of Theorem 5. \(\square\)
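To make the key step concrete, here is a minimal illustration under the assumption that \(A^k\) denotes \(A\) augmented with one additional feature \(f_k\) and that \(\Delta\) is a Euclidean-type distance computed over the listed features (equation (47) itself is not reproduced here):
\[
\Delta ^{A^k}\left( \mathbf{x }_i,\mathbf{x }_j\right)
=\sqrt{\Delta ^{A}\left( \mathbf{x }_i,\mathbf{x }_j\right)^2+\left( x_{ik}-x_{jk}\right)^2}
\;\ge\;\Delta ^{A}\left( \mathbf{x }_i,\mathbf{x }_j\right),
\]
so \(\Delta ^{A^k}\left( \mathbf{x }_i,\mathbf{x }_j\right) \le \varepsilon\) immediately forces \(\Delta ^{A}\left( \mathbf{x }_i,\mathbf{x }_j\right) \le \varepsilon\), which is exactly the inclusion \(R_{A^k}^{\varepsilon } \subseteq R_{A}^{\varepsilon }\).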
Proof of Theorem 6
Proof
To prove the rules in Theorem 6, we can use the relationship between \(\Delta ^{A^k}\left( \mathbf{x }_i,\mathbf{x }_j\right)\) and \(\Delta ^{A}\left( \mathbf{x }_i,\mathbf{x }_j\right)\):
For the rule (1), we have
For the rule (2), we have
For the rule (3), we have
In summary, the three rules hold true. This completes the proof of Theorem 6. \(\square\)