
Constrained feature weighting for semi-supervised learning


Abstract

Semi-supervised feature selection plays a crucial role in semi-supervised classification tasks by identifying the most informative and relevant features while discarding irrelevant or redundant ones. Many semi-supervised feature selection approaches take advantage of pairwise constraints. However, these methods either struggle to automatically determine the appropriate number of features or cannot make full use of the given pairwise constraints. Thus, we propose a constrained feature weighting (CFW) approach for semi-supervised feature selection. CFW pursues two goals: maximizing a modified hypothesis margin related to cannot-link constraints and minimizing a must-link preserving regularization term related to must-link constraints. The former makes the selected features strongly discriminative, and the latter makes similar samples even more similar in the weighted feature space. In addition, L1-norm regularization is incorporated into the objective function of CFW to automatically determine the number of features. Extensive experiments on real-world datasets demonstrate that CFW is more effective than existing popular supervised and semi-supervised feature selection methods.
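For illustration only, the following minimal Python sketch (toy weights; not the authors' code) shows how an L1-regularized, non-negative weight vector determines the number of selected features automatically: features whose learned weights are exactly zero are discarded, and the remaining support is the selected feature subset.

```python
import numpy as np

# Hypothetical example: a feature-weight vector learned under an L1 penalty.
# The sparsity induced by the L1 term drives irrelevant weights to exactly zero,
# so the number of selected features is read off the support of w rather than
# being chosen in advance.
w = np.array([0.00, 0.73, 0.00, 0.12, 0.00, 0.41])  # learned weights (toy values)

selected = np.flatnonzero(w > 0)              # indices of features with non-zero weight
print("selected features:", selected)         # -> [1 3 5]
print("number of features:", selected.size)   # determined automatically (3 here)
```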


Data availability and material

The data are openly available in public repositories: http://archive.ics.uci.edu/ml/index.php; https://cam-orl.co.uk/facedatabase.html.


Funding

This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, and by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Author information


Contributions

Xinyi Chen: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing - original draft; Li Zhang: Writing - review & editing, Supervision, Project administration; Lei Zhao: Supervision, Project administration; Xiaofang Zhang: Supervision, Project administration.

Corresponding author

Correspondence to Li Zhang.

Ethics declarations

Conflicts of interest

We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

All authors agreed to participate.

Consent for publication

All authors have consented to publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Theorem 1

It is well known that a twice-differentiable function defined on an open convex set is convex if and only if its Hessian matrix is positive semi-definite on that set. Therefore, to prove the convexity of the function \(J_{R}\left( \textbf{w} \right) \) in (19), we must demonstrate that its Hessian matrix is positive semi-definite.

Because the Laplacian matrix \(\textbf{L}^{\mathcal {M}}\) is symmetric and positive semi-definite, \(\textbf{Q}=\textbf{F}^T \textbf{L}^{\mathcal {M}} \textbf{F}\) is also symmetric and positive semi-definite. Therefore, we can express the must-link preserving regularization as \(J_{R}\left( \textbf{w} \right) =\text {trace} \left( 2\textbf{M}^T \textbf{Q} \textbf{M} \right) \), as shown in (21).

Then, the first partial derivative of \(J_{R}\left( \textbf{w} \right) \) with respect to \(w_r\) can be calculated as follows:

$$\begin{aligned} \frac{\partial J_{R}\left( \textbf{w} \right) }{\partial w_r} = 4 w_r Q_{rr}, ~~~~r=1,...,d \end{aligned}$$
(A1)

where \(Q_{rr}\) is the element in the r-th row and r-th column of \(\textbf{Q}\). The second-order partial derivatives of \(J_{R}\left( \textbf{w} \right) \) with respect to \(w_r\) and \(w_s\) can be expressed as

$$\begin{aligned} \frac{\partial ^{2} J_{R}\left( \textbf{w}\right) }{\partial w_{r} \partial w_{s}}=\left\{ \begin{array}{ll} 4 Q_{rr}, & \text{ if } r=s \\ 0, & \text{ otherwise } \end{array}\right. \end{aligned}$$
(A2)

Therefore, the Hessian matrix \(\textbf{H}\) of \(J_{R}\left( \textbf{w}\right) \) is a diagonal matrix, where the diagonal elements are \(H_{rr}=4Q_{rr}\).

Since \(\textbf{Q}\) is positive semi-definite, we have that \(Q_{rr} \ge 0\). As a result, the Hessian matrix \(\textbf{H}\) of \(J_{R}\left( \textbf{w}\right) \) is positive semi-definite. In other words, \(J_{R}\left( \textbf{w} \right) \) is a convex function of \(\textbf{w}\) when \(\textbf{w} \ge 0 \). This concludes the proof.
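To make the argument concrete, the following minimal Python sketch (toy sizes and data; it assumes \(\textbf{M}=\text {diag}(\textbf{w})\), which is consistent with the gradient in (A1)) builds \(\textbf{Q}=\textbf{F}^T \textbf{L}^{\mathcal {M}} \textbf{F}\) from a small must-link graph, evaluates \(J_{R}\left( \textbf{w} \right) \), and numerically checks the gradient (A1) and the non-negativity of the Hessian diagonal (A2).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes and values are hypothetical): n samples, d features.
n, d = 8, 5
F = rng.normal(size=(n, d))                      # data matrix

# Must-link Laplacian L^M built from a few toy must-link pairs (i, j).
must_link = [(0, 1), (2, 3), (4, 5)]
A = np.zeros((n, n))
for i, j in must_link:
    A[i, j] = A[j, i] = 1.0
L_M = np.diag(A.sum(axis=1)) - A                 # graph Laplacian, PSD by construction

Q = F.T @ L_M @ F                                # symmetric and positive semi-definite
w = rng.uniform(size=d)

# J_R(w) = 2 * trace(M^T Q M) with M = diag(w), i.e. 2 * sum_r w_r^2 * Q_rr.
M = np.diag(w)
J_R = 2.0 * np.trace(M.T @ Q @ M)
assert np.isclose(J_R, 2.0 * np.sum(w**2 * np.diag(Q)))

# Gradient (A1): 4 * w_r * Q_rr, checked against forward finite differences.
grad = 4.0 * w * np.diag(Q)
eps = 1e-6
for r in range(d):
    wp = w.copy(); wp[r] += eps
    num = (2.0 * np.sum(wp**2 * np.diag(Q)) - J_R) / eps
    assert np.isclose(num, grad[r], atol=1e-4)

# Hessian (A2) is diagonal with entries 4 * Q_rr >= 0, hence positive semi-definite.
assert np.all(4.0 * np.diag(Q) >= -1e-12)
print("J_R, its gradient, and the PSD Hessian check out on this toy example.")
```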

Appendix B: Proof of Theorem 2

Let \(J_{1}\left( \textbf{w}\right) =\log \left( 1+\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) \right) \) and \(J_{2}\left( \textbf{w}\right) =\Vert \textbf{w}\Vert _{1}\); then the objective function (24) can be rewritten as:

$$\begin{aligned} J\left( \textbf{w}\right) =J_{1}\left( \textbf{w}\right) + \lambda _{1}J_{2}\left( \textbf{w}\right) + \lambda _{2}J_{R}\left( \textbf{w} \right) \end{aligned}$$
(B3)

Since a non-negative weighted sum of convex functions is convex, \(J\left( \textbf{w}\right) \) is a convex function if \(J_{1}\left( \textbf{w}\right) \), \(J_{2}\left( \textbf{w}\right) \), and \(J_{R}\left( \textbf{w} \right) \) are all convex. Theorem 1 states that \(J_{R}\left( \textbf{w} \right) \) is a convex function, so we only need to prove that the other two functions are also convex. Following the approach used to prove Theorem 1, we simply need to demonstrate that the Hessian matrices of both \(J_{1}\left( \textbf{w}\right) \) and \(J_{2}\left( \textbf{w}\right) \) are positive semi-definite.

We start by calculating the first and second partial derivatives of \(J_{1}\left( \textbf{w}\right) \) with respect to \(\textbf{w}\), as shown below

$$\begin{aligned} \frac{\partial J_{1}\left( \textbf{w}\right) }{\partial \textbf{w}}= -\frac{\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) }{1+\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) } {\textbf{z}} \end{aligned}$$
(B4)

and

$$\begin{aligned} \frac{\partial ^{2} J_{1}\left( \textbf{w}\right) }{\partial ^{2} \textbf{w}}= \frac{\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) }{\left( 1+\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) \right) ^{2}} {\textbf{z}} {\textbf{z}}^{T} \end{aligned}$$
(B5)

For convenience, let \(c=\sqrt{\frac{\exp \left( -\textbf{w}^{T} \textbf{z}\right) }{\left( 1+\exp \left( -\textbf{w}^{T} {\textbf{z}}\right) \right) ^{2}}}\). Substituting c into (B5), we have

$$\begin{aligned} \frac{\partial ^{2} J_{1}\left( \textbf{w}\right) }{\partial ^{2} \textbf{w}} =\left( c {\textbf{z}}\right) \left( c {\textbf{z}}\right) ^{T} =\textbf{H}_1 \end{aligned}$$
(B6)

where \(\textbf{H}_1\) is the Hessian matrix of \(J_{1}(\textbf{w})\). Because \(\textbf{H}_1\) is the outer product of the column vector \(c{\textbf{z}}\) with its own transpose \(\left( c {\textbf{z}}\right) ^{T}\), it is a rank-one matrix with a single non-zero eigenvalue, namely \(c^2\left\| \textbf{z}\right\| ^2 > 0\). Hence, the Hessian matrix in (B6) is positive semi-definite, and therefore \(J_{1}\left( \textbf{w}\right) \) is a convex function.
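As a quick numerical sanity check (toy values of \(\textbf{w}\) and \(\textbf{z}\); not from the paper), the sketch below forms \(\textbf{H}_1=(c\textbf{z})(c\textbf{z})^T\) and confirms that its eigenvalues are non-negative, with the single non-zero eigenvalue equal to \(c^2\Vert \textbf{z}\Vert ^2\).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
w = rng.uniform(size=d)
z = rng.normal(size=d)

# c as defined above; H_1 = (c z)(c z)^T is the Hessian of J_1(w) = log(1 + exp(-w^T z)).
s = np.exp(-w @ z)
c = np.sqrt(s) / (1.0 + s)
H1 = np.outer(c * z, c * z)

eigvals = np.linalg.eigvalsh(H1)
assert np.all(eigvals >= -1e-10)                  # positive semi-definite
assert np.isclose(eigvals.max(), c**2 * (z @ z))  # single non-zero eigenvalue c^2 ||z||^2
print("H_1 is rank-one and PSD; largest eigenvalue:", eigvals.max())
```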

As for \(J_{2}\left( \textbf{w}\right) \), since \(\textbf{w} \ge \textbf{0}\), we have

$$\begin{aligned} \frac{\partial J_{2}\left( \textbf{w} \right) }{\partial w_r} = 1, ~~~~ r=1,...,d \end{aligned}$$
(B7)

and

$$\begin{aligned} \frac{\partial ^{2} J_{2}\left( \textbf{w}\right) }{\partial w_{r} \partial w_{s}}=0, ~~~~ r,s=1,...,d. \end{aligned}$$
(B8)

Thus, the Hessian matrix of \(J_{2}\left( \textbf{w} \right) \) is the zero matrix, which is trivially positive semi-definite. Hence, \(J_{2}\left( \textbf{w} \right) \) is also a convex function.

In summary, \(J_{1}\left( \textbf{w} \right) \), \(J_{2}\left( \textbf{w} \right) \) and \(J_{R}\left( \textbf{w} \right) \) are convex functions. Thus, \(J\left( \textbf{w} \right) \) is a convex function. This completes the proof.
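As a complementary check of Theorem 2 (toy data; \(\lambda _1\), \(\lambda _2\), and the diagonal of \(\textbf{Q}\) are arbitrary, and \(\textbf{M}=\text {diag}(\textbf{w})\) is assumed as in Appendix A), the sketch below verifies the defining convexity inequality \(J(\alpha \textbf{u}+(1-\alpha )\textbf{v}) \le \alpha J(\textbf{u})+(1-\alpha )J(\textbf{v})\) on random segments in the non-negative orthant.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
z = rng.normal(size=d)
Q_diag = rng.uniform(size=d)          # stands in for the diagonal of a PSD matrix Q
lam1, lam2 = 0.5, 0.3                 # arbitrary positive regularization parameters

def J(w):
    # J(w) = J_1(w) + lam1 * J_2(w) + lam2 * J_R(w), with w >= 0 so ||w||_1 = sum(w)
    J1 = np.log1p(np.exp(-w @ z))
    J2 = np.sum(w)
    JR = 2.0 * np.sum(w**2 * Q_diag)
    return J1 + lam1 * J2 + lam2 * JR

# Convexity inequality along random segments in the non-negative orthant.
for _ in range(1000):
    u, v = rng.uniform(size=d), rng.uniform(size=d)
    a = rng.uniform()
    assert J(a * u + (1 - a) * v) <= a * J(u) + (1 - a) * J(v) + 1e-10
print("Convexity inequality holds on all sampled segments.")
```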

Appendix C: Proof of Theorem 3

Note that \(\textbf{z}(t)\) is computed from \(\textbf{w}(t-1)\) in the t-th iteration. When \(\textbf{z}(t)\) and \(\textbf{w}(t-1)\) are given, \(\textbf{w}(t)\) is updated using the gradient descent scheme in (27) and the truncation rule in (28). Consequently, for fixed \(\textbf{z}(t)\), the update does not increase the objective function value \(J(\textbf{w} \mid \textbf{z}(t))\). In other words,

$$\begin{aligned} J(\textbf{w}(t) \mid \textbf{z}(t)) \le J(\textbf{w}(t-1) \mid \textbf{z}(t)) \end{aligned}$$
(C9)

which completes the proof of Theorem 3.
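A minimal Python sketch of this descent step is given below. It is illustrative only: \(\textbf{z}(t)\) is held fixed, a conservative constant step size is used in place of the scheme in (27), the truncation to the non-negative orthant plays the role of (28), and the toy objective takes the form used in Appendix B with \(\textbf{M}=\text {diag}(\textbf{w})\).

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
z = rng.normal(size=d)                 # plays the role of z(t), held fixed here
Q_diag = rng.uniform(size=d)           # diagonal of a PSD matrix Q (toy values)
lam1, lam2 = 0.5, 0.3                  # arbitrary regularization parameters

def J(w):
    return np.log1p(np.exp(-w @ z)) + lam1 * np.sum(w) + 2.0 * lam2 * np.sum(w**2 * Q_diag)

def grad_J(w):
    s = np.exp(-w @ z)
    return -(s / (1.0 + s)) * z + lam1 + 4.0 * lam2 * w * Q_diag

# Step size from an upper bound on the Hessian's largest eigenvalue; this guarantees
# monotone descent of projected gradient steps on a convex, smooth objective
# (the L1 term is linear on the non-negative orthant).
step = 1.0 / (0.25 * (z @ z) + 4.0 * lam2 * Q_diag.max())

w = rng.uniform(size=d)
values = [J(w)]
for t in range(50):
    w = np.maximum(w - step * grad_J(w), 0.0)   # gradient step + truncation (cf. (27), (28))
    values.append(J(w))

# Objective never increases: J(w(t) | z(t)) <= J(w(t-1) | z(t)).
assert all(values[k + 1] <= values[k] + 1e-12 for k in range(len(values) - 1))
print("Objective is non-increasing across iterations:", np.round(values[:5], 4), "...")
```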

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, X., Zhang, L., Zhao, L. et al. Constrained feature weighting for semi-supervised learning. Appl Intell 54, 9987–10006 (2024). https://doi.org/10.1007/s10489-024-05691-9
