
Constrained feature weighting for semi-supervised learning


Abstract

Semi-supervised feature selection plays a crucial role in semi-supervised classification tasks by identifying the most informative and relevant features while discarding irrelevant or redundant ones. Many semi-supervised feature selection approaches take advantage of pairwise constraints. However, these methods either struggle to automatically determine the appropriate number of features or cannot make full use of the given pairwise constraints. Thus, we propose a constrained feature weighting (CFW) approach for semi-supervised feature selection. CFW pursues two goals: maximizing a modified hypothesis margin related to cannot-link constraints and minimizing a must-link preserving regularization term related to must-link constraints. The former makes the selected features strongly discriminative, and the latter makes similar samples even more similar in the weighted feature space. In addition, L1-norm regularization is incorporated into the objective function of CFW to automatically determine the number of features. Extensive experiments on real-world datasets demonstrate that CFW is more effective than existing popular supervised and semi-supervised feature selection methods.
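For illustration only, the following minimal Python sketch (toy weights; not the authors' code) shows how an L1-regularized, non-negative weight vector determines the number of selected features automatically: features whose learned weights are exactly zero are discarded, and the remaining support is the selected feature subset.

```python
import numpy as np

# Hypothetical example: a feature-weight vector learned under an L1 penalty.
# The sparsity induced by the L1 term drives irrelevant weights to exactly zero,
# so the number of selected features is read off the support of w rather than
# being chosen in advance.
w = np.array([0.00, 0.73, 0.00, 0.12, 0.00, 0.41])  # learned weights (toy values)

selected = np.flatnonzero(w > 0)              # indices of features with non-zero weight
print("selected features:", selected)         # -> [1 3 5]
print("number of features:", selected.size)   # determined automatically (3 here)
```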


Data availability and material

The data are openly available in public repositories: http://archive.ics.uci.edu/ml/index.php; https://cam-orl.co.uk/facedatabase.html.


Funding

This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, and by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Author information


Contributions

Xinyi Chen: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing - original draft; Li Zhang: Writing - review & editing, Supervision, Project administration; Lei Zhao: Supervision, Project administration; Xiaofang Zhang: Supervision, Project administration.

Corresponding author

Correspondence to Li Zhang.

Ethics declarations

Conflicts of interest

We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

All authors agreed to participate.

Consent for publication

All authors have consented to publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Theorem 1

It is well known that a twice-differentiable function defined on an open convex set is convex if and only if its Hessian matrix is positive semi-definite on that set. Therefore, to prove the convexity of the function \(J_{R}\left( \textbf{w} \right) \) in (19), we must demonstrate that its Hessian matrix is positive semi-definite.

Because the Laplacian matrix \(\textbf{L}^{\mathcal {M}}\) is symmetric and positive semi-definite, \(\textbf{Q}=\textbf{F}^T \textbf{L}^{\mathcal {M}} \textbf{F}\) is also symmetric and positive semi-definite. Therefore, we can express the must-link preserving regularization as \(J_{R}\left( \textbf{w} \right) =\text {trace} \left( 2\textbf{M}^T \textbf{Q} \textbf{M} \right) \), as shown in (21).

Then, the first partial derivative of \(J_{R}\left( \textbf{w} \right) \) with respect to \(w_r\) can be calculated as follows:

$$\begin{aligned} \frac{\partial J_{R}\left( \textbf{w} \right) }{\partial w_r} = 4 w_r Q_{rr}, ~~~~r=1,...,d \end{aligned}$$
(A1)

where \(Q_{rr}\) is the element in the r-th row and r-th column of \(\textbf{Q}\). The second-order partial derivatives of \(J_{R}\left( \textbf{w} \right) \) with respect to \(w_r\) and \(w_s\) can be expressed as

$$\begin{aligned} \frac{\partial ^{2} J_{R}\left( \textbf{w}\right) }{\partial w_{r} \partial w_{s}}=\left\{ \begin{array}{ll} 4 Q_{rr}, & \text{ if } r=s \\ 0, & \text{ otherwise } \end{array}\right. \end{aligned}$$
(A2)

Therefore, the Hessian matrix \(\textbf{H}\) of \(J_{R}\left( \textbf{w}\right) \) is a diagonal matrix, where the diagonal elements are \(H_{rr}=4Q_{rr}\).

Since \(\textbf{Q}\) is positive semi-definite, we have that \(Q_{rr} \ge 0\). As a result, the Hessian matrix \(\textbf{H}\) of \(J_{R}\left( \textbf{w}\right) \) is positive semi-definite. In other words, \(J_{R}\left( \textbf{w} \right) \) is a convex function of \(\textbf{w}\) when \(\textbf{w} \ge 0 \). This concludes the proof.
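To make the argument concrete, the following minimal Python sketch (toy sizes and data; it assumes \(\textbf{M}=\text {diag}(\textbf{w})\), which is consistent with the gradient in (A1)) builds \(\textbf{Q}=\textbf{F}^T \textbf{L}^{\mathcal {M}} \textbf{F}\) from a small must-link graph, evaluates \(J_{R}\left( \textbf{w} \right) \), and numerically checks the gradient (A1) and the non-negativity of the Hessian diagonal (A2).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes and values are hypothetical): n samples, d features.
n, d = 8, 5
F = rng.normal(size=(n, d))                      # data matrix

# Must-link Laplacian L^M built from a few toy must-link pairs (i, j).
must_link = [(0, 1), (2, 3), (4, 5)]
A = np.zeros((n, n))
for i, j in must_link:
    A[i, j] = A[j, i] = 1.0
L_M = np.diag(A.sum(axis=1)) - A                 # graph Laplacian, PSD by construction

Q = F.T @ L_M @ F                                # symmetric and positive semi-definite
w = rng.uniform(size=d)

# J_R(w) = 2 * trace(M^T Q M) with M = diag(w), i.e. 2 * sum_r w_r^2 * Q_rr.
M = np.diag(w)
J_R = 2.0 * np.trace(M.T @ Q @ M)
assert np.isclose(J_R, 2.0 * np.sum(w**2 * np.diag(Q)))

# Gradient (A1): 4 * w_r * Q_rr, checked against forward finite differences.
grad = 4.0 * w * np.diag(Q)
eps = 1e-6
for r in range(d):
    wp = w.copy(); wp[r] += eps
    num = (2.0 * np.sum(wp**2 * np.diag(Q)) - J_R) / eps
    assert np.isclose(num, grad[r], atol=1e-4)

# Hessian (A2) is diagonal with entries 4 * Q_rr >= 0, hence positive semi-definite.
assert np.all(4.0 * np.diag(Q) >= -1e-12)
print("J_R, its gradient, and the PSD Hessian check out on this toy example.")
```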

Appendix B: Proof of Theorem 2

Let \(J_{1}\left( \textbf{w}\right) =\log \left( 1+\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) \right) \) and \(J_{2}\left( \textbf{w}\right) =\Vert \textbf{w}\Vert _{1}\); then the objective function (24) can be rewritten as:

$$\begin{aligned} J\left( \textbf{w}\right) =J_{1}\left( \textbf{w}\right) + \lambda _{1}J_{2}\left( \textbf{w}\right) + \lambda _{2}J_{R}\left( \textbf{w} \right) \end{aligned}$$
(B3)

Since a non-negative weighted sum of convex functions is convex, \(J\left( \textbf{w}\right) \) is a convex function if \(J_{1}\left( \textbf{w}\right) \), \(J_{2}\left( \textbf{w}\right) \), and \(J_{R}\left( \textbf{w} \right) \) are all convex. Theorem 1 states that \(J_{R}\left( \textbf{w} \right) \) is a convex function, so we only need to prove that the other two functions are also convex. Following the approach used to prove Theorem 1, we simply need to demonstrate that the Hessian matrices of both \(J_{1}\left( \textbf{w}\right) \) and \(J_{2}\left( \textbf{w}\right) \) are positive semi-definite.

We start by calculating the first and second partial derivatives of \(J_{1}\left( \textbf{w}\right) \) with respect to \(\textbf{w}\), as shown below

$$\begin{aligned} \frac{\partial J_{1}\left( \textbf{w}\right) }{\partial \textbf{w}}= -\frac{\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) }{1+\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) } {\textbf{z}} \end{aligned}$$
(B4)

and

$$\begin{aligned} \frac{\partial ^{2} J_{1}\left( \textbf{w}\right) }{\partial ^{2} \textbf{w}}= \frac{\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) }{\left( 1+\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) \right) ^{2}} {\textbf{z}} {\textbf{z}}^{T} \end{aligned}$$
(B5)

For convenience, let \(c=\sqrt{\frac{\exp \left( -\textbf{w}^{T} \textbf{z}\right) }{\left( 1+\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) \right) ^{2}}}\). Substituting c into (B5), we have

$$\begin{aligned} \frac{\partial ^{2} J_{1}\left( \textbf{w}\right) }{\partial ^{2} \textbf{w}} =\left( c {\textbf{z}}\right) \left( c {\textbf{z}}\right) ^{T} =\textbf{H}_1 \end{aligned}$$
(B6)

where \(\textbf{H}_1\) is the Hessian matrix of \(J_{1}(\textbf{w})\). Because \(\textbf{H}_1\) is the outer product of the column vector \(c{\textbf{z}}\) with its own transpose \(\left( c {\textbf{z}}\right) ^{T}\), it is a rank-one matrix with a single non-zero eigenvalue, namely \(c^2\left\| \textbf{z}\right\| ^2 > 0\). Hence, the Hessian matrix in (B6) is positive semi-definite, and therefore \(J_{1}\left( \textbf{w}\right) \) is a convex function.
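As a quick numerical sanity check (toy values of \(\textbf{w}\) and \(\textbf{z}\); not from the paper), the sketch below forms \(\textbf{H}_1=(c\textbf{z})(c\textbf{z})^T\) and confirms that its eigenvalues are non-negative, with the single non-zero eigenvalue equal to \(c^2\Vert \textbf{z}\Vert ^2\).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
w = rng.uniform(size=d)
z = rng.normal(size=d)

# c as defined above; H_1 = (c z)(c z)^T is the Hessian of J_1(w) = log(1 + exp(-w^T z)).
s = np.exp(-w @ z)
c = np.sqrt(s) / (1.0 + s)
H1 = np.outer(c * z, c * z)

eigvals = np.linalg.eigvalsh(H1)
assert np.all(eigvals >= -1e-10)                  # positive semi-definite
assert np.isclose(eigvals.max(), c**2 * (z @ z))  # single non-zero eigenvalue c^2 ||z||^2
print("H_1 is rank-one and PSD; largest eigenvalue:", eigvals.max())
```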

As for \(J_{2}\left( \textbf{w}\right) \), since \(\textbf{w} \ge \textbf{0}\), we have

$$\begin{aligned} \frac{\partial J_{2}\left( \textbf{w} \right) }{\partial w_r} = 1, ~~~~ r=1,...,d \end{aligned}$$
(B7)

and

$$\begin{aligned} \frac{\partial ^{2} J_{2}\left( \textbf{w}\right) }{\partial w_{r} \partial w_{s}}=0, ~~~~ r,s=1,...,d. \end{aligned}$$
(B8)

Thus, the Hessian matrix of \(J_{2}\left( \textbf{w} \right) \) is the zero matrix, which is trivially positive semi-definite. Hence, \(J_{2}\left( \textbf{w} \right) \) is also a convex function.

In summary, \(J_{1}\left( \textbf{w} \right) \), \(J_{2}\left( \textbf{w} \right) \) and \(J_{R}\left( \textbf{w} \right) \) are convex functions. Thus, \(J\left( \textbf{w} \right) \) is a convex function. This completes the proof.
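As a complementary check of Theorem 2 (toy data; \(\lambda _1\), \(\lambda _2\), and the diagonal of \(\textbf{Q}\) are arbitrary, and \(\textbf{M}=\text {diag}(\textbf{w})\) is assumed as in Appendix A), the sketch below verifies the defining convexity inequality \(J(\alpha \textbf{u}+(1-\alpha )\textbf{v}) \le \alpha J(\textbf{u})+(1-\alpha )J(\textbf{v})\) on random segments in the non-negative orthant.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
z = rng.normal(size=d)
Q_diag = rng.uniform(size=d)          # stands in for the diagonal of a PSD matrix Q
lam1, lam2 = 0.5, 0.3                 # arbitrary positive regularization parameters

def J(w):
    # J(w) = J_1(w) + lam1 * J_2(w) + lam2 * J_R(w), with w >= 0 so ||w||_1 = sum(w)
    J1 = np.log1p(np.exp(-w @ z))
    J2 = np.sum(w)
    JR = 2.0 * np.sum(w**2 * Q_diag)
    return J1 + lam1 * J2 + lam2 * JR

# Convexity inequality along random segments in the non-negative orthant.
for _ in range(1000):
    u, v = rng.uniform(size=d), rng.uniform(size=d)
    a = rng.uniform()
    assert J(a * u + (1 - a) * v) <= a * J(u) + (1 - a) * J(v) + 1e-10
print("Convexity inequality holds on all sampled segments.")
```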

Appendix C: Proof of Theorem 3

Note that \(\textbf{z}(t)\) is computed from \(\textbf{w}(t-1)\) in the t-th iteration. When \(\textbf{z}(t)\) and \(\textbf{w}(t-1)\) are given, \(\textbf{w}(t)\) is updated using the gradient descent scheme in (27) and the truncation rule in (28). Consequently, for fixed \(\textbf{z}(t)\), the update does not increase the objective function value \(J(\textbf{w} \mid \textbf{z}(t))\). In other words,

$$\begin{aligned} J(\textbf{w}(t) \mid \textbf{z}(t)) \le J(\textbf{w}(t-1) \mid \textbf{z}(t)) \end{aligned}$$
(C9)

which completes the proof of Theorem 3.
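A minimal Python sketch of this descent step is given below. It is illustrative only: \(\textbf{z}(t)\) is held fixed, a conservative constant step size is used in place of the scheme in (27), the truncation to the non-negative orthant plays the role of (28), and the toy objective takes the form used in Appendix B with \(\textbf{M}=\text {diag}(\textbf{w})\).

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
z = rng.normal(size=d)                 # plays the role of z(t), held fixed here
Q_diag = rng.uniform(size=d)           # diagonal of a PSD matrix Q (toy values)
lam1, lam2 = 0.5, 0.3                  # arbitrary regularization parameters

def J(w):
    return np.log1p(np.exp(-w @ z)) + lam1 * np.sum(w) + 2.0 * lam2 * np.sum(w**2 * Q_diag)

def grad_J(w):
    s = np.exp(-w @ z)
    return -(s / (1.0 + s)) * z + lam1 + 4.0 * lam2 * w * Q_diag

# Step size from an upper bound on the Hessian's largest eigenvalue; this guarantees
# monotone descent of projected gradient steps on a convex, smooth objective
# (the L1 term is linear on the non-negative orthant).
step = 1.0 / (0.25 * (z @ z) + 4.0 * lam2 * Q_diag.max())

w = rng.uniform(size=d)
values = [J(w)]
for t in range(50):
    w = np.maximum(w - step * grad_J(w), 0.0)   # gradient step + truncation (cf. (27), (28))
    values.append(J(w))

# Objective never increases: J(w(t) | z(t)) <= J(w(t-1) | z(t)).
assert all(values[k + 1] <= values[k] + 1e-12 for k in range(len(values) - 1))
print("Objective is non-increasing across iterations:", np.round(values[:5], 4), "...")
```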

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, X., Zhang, L., Zhao, L. et al. Constrained feature weighting for semi-supervised learning. Appl Intell 54, 9987–10006 (2024). https://doi.org/10.1007/s10489-024-05691-9
