Robust unsupervised feature selection based on matrix factorization with adaptive loss via bi-stochastic graph regularization

Applied Intelligence

Abstract

Unsupervised feature selection (UFS) has attracted increasing research interest in domains such as machine learning and data mining. Recently, numerous matrix factorization-based methods have been adopted for UFS, but two issues remain. First, most matrix factorization-based methods measure the loss term with the squared Frobenius norm, which makes them sensitive to outliers. Using an \( \ell _{2,1} \)-norm-based loss improves robustness, but it is sensitive to small losses. Second, most existing matrix factorization-based methods rely on a fixed graph to preserve the local structure of the data, so they fail to capture the intrinsic local structure accurately. To address these drawbacks, we present a novel robust UFS model named matrix factorization with adaptive loss via bi-stochastic graph regularization (MFALBS). Specifically, MFALBS decomposes the projected data points into two orthogonal matrices to facilitate discriminative feature selection. Meanwhile, MFALBS measures the loss term with an adaptive loss function to enhance robustness. Moreover, MFALBS adopts an adaptive learning strategy that learns an optimal bi-stochastic graph to better preserve local structure. Finally, an optimization algorithm is proposed to solve the model. Experimental results on real-world datasets show the effectiveness and superiority of MFALBS.
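The trade-off between the three loss measures named in the abstract can be illustrated numerically. The sketch below is not taken from the article: it assumes the adaptive loss has the form proposed in [40], \( \Vert \textbf{E}\Vert _{\sigma }=\sum _i (1+\sigma )\Vert \textbf{e}_i\Vert _2^2/(\Vert \textbf{e}_i\Vert _2+\sigma ) \), and the residual matrix `E`, the function names, and the parameter `sigma` are hypothetical choices made for illustration only.

```python
import numpy as np


def squared_frobenius(E):
    """Sum of squared row norms: grows quadratically with an outlier row."""
    return float(np.sum(E ** 2))


def l21_norm(E):
    """Sum of row norms: grows only linearly with an outlier row."""
    return float(np.sum(np.linalg.norm(E, axis=1)))


def adaptive_loss(E, sigma):
    """Assumed adaptive loss of [40]; sigma > 0 interpolates between the
    l2,1-norm (sigma -> 0) and the squared Frobenius norm (sigma -> inf)."""
    row_norms = np.linalg.norm(E, axis=1)
    return float(np.sum((1.0 + sigma) * row_norms ** 2 / (row_norms + sigma)))


# Hypothetical residuals: two small rows and one outlier row of norm 50.
E = np.array([[0.1, 0.0],
              [0.0, 0.2],
              [30.0, 40.0]])

print("squared Frobenius:", squared_frobenius(E))
print("l2,1 norm        :", l21_norm(E))
print("adaptive, sigma=1:", adaptive_loss(E, 1.0))
```

Under this assumed form, each row's contribution always lies between its norm and its squared norm, so a large residual is penalized less than under the squared Frobenius norm, while a small residual is penalized less than under the \( \ell _{2,1} \)-norm.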

Data Availability and Access

Data will be made available on request.

Notes

  1. https://jundongl.github.io/scikit-feature/datasets.html.

  2. The code and datasets will be made available at: https://github.com/xiangfasong/MFALBS.

References

  1. Georgiou T, Liu Y, Chen W, Lew MS (2020) A survey of traditional and deep learning-based feature descriptors for high dimensional data in computer vision. Int J Multim Inf Retr 9(3):135–170. https://doi.org/10.1007/S13735-019-00183-W

  2. Bommert A, Welchowski T, Schmid M, Rahnenführer J (2022) Benchmark of filter methods for feature selection in high-dimensional gene expression survival data. Briefings Bioinform 23(1). https://doi.org/10.1093/BIB/BBAB354

  3. Mostafa RR, Khedr AM, Aghbari ZA, Afyouni I, Kamel I, Ahmed N (2024) An adaptive hybrid mutated differential evolution feature selection method for low and high-dimensional medical datasets. Knowl Based Syst 283:111218. https://doi.org/10.1016/J.KNOSYS.2023.111218

  4. Omuya EO, Okeyo GO, Kimwele MW (2021) Feature selection for classification using principal component analysis and information gain. Expert Syst Appl 174:114765. https://doi.org/10.1016/J.ESWA.2021.114765

  5. Liu T, Lu Y, Zhu B, Zhao H (2023) Clustering high-dimensional data via feature selection. Biometrics 79(2):940–950. https://doi.org/10.1111/biom.13665

  6. Lu X, Long J, Wen J, Fei L, Zhang B, Xu Y (2022) Locality preserving projection with symmetric graph embedding for unsupervised dimensionality reduction. Pattern Recognit 131:108844. https://doi.org/10.1016/J.PATCOG.2022.108844

  7. Zhou J, Zhang Q, Zeng S, Zhang B, Fang L (2024) Latent linear discriminant analysis for feature extraction via isometric structural learning. Pattern Recognit 149:110218. https://doi.org/10.1016/J.PATCOG.2023.110218

  8. He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. In: International conference on neural information processing systems, pp 507–514

  9. Miao J, Zhao J, Yang T, Fan C, Tian Y, Shi Y, Xu M (2024) Explicit unsupervised feature selection based on structured graph and locally linear embedding. Expert Syst Appl 255:124568. https://doi.org/10.1016/J.ESWA.2024.124568

  10. Xie X, Cao Z, Sun F (2023) Joint learning of graph and latent representation for unsupervised feature selection. Appl Intell 53(21):25282–25295. https://doi.org/10.1007/S10489-023-04893-X

  11. Mozafari M, Seyedi SA, Mohammadiani RP, Tab FA (2024) Unsupervised feature selection using orthogonal encoder-decoder factorization. Inf Sci 663:120277. https://doi.org/10.1016/J.INS.2024.120277

  12. Lin X, Guan J, Chen B, Zeng Y (2022) Unsupervised feature selection via orthogonal basis clustering and local structure preserving. IEEE Trans Neural Networks Learn Syst 33(11):6881–6892. https://doi.org/10.1109/TNNLS.2021.3083763

  13. Moslemi A, Ahmadian A (2023) Dual regularized subspace learning using adaptive graph learning and rank constraint: Unsupervised feature selection on gene expression microarray datasets. Comput Biol Medicine 167:107659. https://doi.org/10.1016/J.COMPBIOMED.2023.107659

  14. Wang X, Wu P, Xu Q, Zeng Z, Xie Y (2021) Joint image clustering and feature selection with auto-adjoined learning for high-dimensional data. Knowl Based Syst 232:107443. https://doi.org/10.1016/J.KNOSYS.2021.107443

  15. Dhal P, Azad C (2024) A fine-tuning deep learning with multi-objective-based feature selection approach for the classification of text. Neural Comput Appl 36(7):3525–3553. https://doi.org/10.1007/S00521-023-09225-1

  16. Wu X, Xu X, Liu J, Wang H, Hu B, Nie F (2021) Supervised feature selection with orthogonal regression and feature weighting. IEEE Trans Neural Networks Learn Syst 32(5):1831–1838. https://doi.org/10.1109/TNNLS.2020.2991336

  17. Li Z, Nie F, Wu D, Wang Z, Li X (2024) Sparse trace ratio LDA for supervised feature selection. IEEE Trans Cybern 54(4):2420–2433. https://doi.org/10.1109/TCYB.2023.3264907

  18. Lai J, Chen H, Li T, Yang X (2022) Adaptive graph learning for semi-supervised feature selection with redundancy minimization. Inf Sci 609:465–488. https://doi.org/10.1016/J.INS.2022.07.102

  19. Sheikhpour R, Berahmand K, Forouzandeh S (2023) Hessian-based semi-supervised feature selection using generalized uncorrelated constraint. Knowl Based Syst 269:110521. https://doi.org/10.1016/J.KNOSYS.2023.110521

  20. Li Z, Yang Y, Liu J, Zhou X, Lu H (2012) Unsupervised feature selection using nonnegative spectral analysis. In: International conference on artificial intelligence, pp 1026–1032. https://doi.org/10.1609/AAAI.V26I1.8289

  21. Wang C, Wang J, Gu Z, Wei J, Liu J (2024) Unsupervised feature selection by learning exponential weights. Pattern Recognit 148:110183. https://doi.org/10.1016/J.PATCOG.2023.110183

  22. Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H (2018) Feature selection: A data perspective. ACM Comput Surv 50(6):94:1–94:45. https://doi.org/10.1145/3136625

  23. Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2020) A review of unsupervised feature selection methods. Artif Intell Rev 53(2):907–948. https://doi.org/10.1007/S10462-019-09682-Y

  24. Hou C, Nie F, Li X, Yi D, Wu Y (2014) Joint embedding learning and sparse regression: A framework for unsupervised feature selection. IEEE Trans Cybern 44(6):793–804. https://doi.org/10.1109/TCYB.2013.2272642

  25. Zhu P, Zuo W, Zhang L, Hu Q, Shiu SCK (2015) Unsupervised feature selection by regularized self-representation. Pattern Recognit 48(2):438–446. https://doi.org/10.1016/J.PATCOG.2014.08.006

  26. Liu Y, Ye D, Li W, Wang H, Gao Y (2020) Robust neighborhood embedding for unsupervised feature selection. Knowl Based Syst 193:105462. https://doi.org/10.1016/J.KNOSYS.2019.105462

  27. Du L, Shen Y (2015) Unsupervised feature selection with adaptive structure learning. In: International conference on knowledge discovery and data mining, pp 209–218. https://doi.org/10.1145/2783258.2783345

  28. Nie F, Zhu W, Li X (2016) Unsupervised feature selection with structured graph optimization. In: International conference on artificial intelligence, pp 1302–1308. https://doi.org/10.1609/AAAI.V30I1.10168

  29. Huang Y, Shen Z, Cai F, Li T, Lv F (2021) Adaptive graph-based generalized regression model for unsupervised feature selection. Knowl Based Syst 227:107156. https://doi.org/10.1016/J.KNOSYS.2021.107156

  30. Zhao H, Li Q, Wang Z, Nie F (2022) Joint adaptive graph learning and discriminative analysis for unsupervised feature selection. Cogn Comput 14(3):1211–1221. https://doi.org/10.1007/S12559-021-09875-0

  31. Tang C, Zheng X, Zhang W, Liu X, Zhu X, Zhu E (2023) Unsupervised feature selection via multiple graph fusion and feature weight learning. Sci China Inf Sci 66(5). https://doi.org/10.1007/S11432-022-3579-1

  32. Zhou Q, Wang Q, Gao Q, Yang M, Gao X (2024) Unsupervised discriminative feature selection via contrastive graph learning. IEEE Trans Image Process 33:972–986. https://doi.org/10.1109/TIP.2024.3353572

  33. Wang S, Pedrycz W, Zhu Q, Zhu W (2015) Subspace learning for unsupervised feature selection via matrix factorization. Pattern Recognit 48(1):10–19. https://doi.org/10.1016/J.PATCOG.2014.08.004

  34. Shang R, Wang W, Stolkin R, Jiao L (2016) Subspace learning-based graph regularized feature selection. Knowl Based Syst 112:152–165. https://doi.org/10.1016/J.KNOSYS.2016.09.006

  35. Shang R, Xu K, Shang F, Jiao L (2020) Sparse and low-redundant subspace learning-based dual-graph regularized robust feature selection. Knowl Based Syst 187. https://doi.org/10.1016/J.KNOSYS.2019.07.001

  36. Han D, Kim J (2015) Unsupervised simultaneous orthogonal basis clustering feature selection. In: International conference on computer vision and pattern recognition, pp 5016–5023. https://doi.org/10.1109/CVPR.2015.7299136

  37. Qian M, Zhai C (2013) Robust unsupervised feature selection. In: International joint conference on artificial intelligence, pp 1621–1627

  38. Du S, Ma Y, Li S, Ma Y (2017) Robust unsupervised feature selection via matrix factorization. Neurocomputing 241:115–127. https://doi.org/10.1016/J.NEUCOM.2017.02.034

  39. Luo C, Zheng J, Li T, Chen H, Huang Y, Peng X (2022) Orthogonally constrained matrix factorization for robust unsupervised feature selection with local preserving. Inf Sci 586:662–675. https://doi.org/10.1016/J.INS.2021.11.068

  40. Nie F, Wang H, Huang H, Ding CHQ (2013) Adaptive loss minimization for semi-supervised elastic embedding. In: International joint conference on artificial intelligence, pp 1565–1571

  41. Nie F, Huang H, Cai X, Ding CHQ (2010) Efficient and robust feature selection via joint \( \ell _{2,1} \)-norms minimization. In: International conference on neural information processing systems, pp 1813–1821

  42. Sun Z, Xie H, Liu J, Gou J, Yu Y (2023) Dual-graph with non-convex sparse regularization for multi-label feature selection. Appl Intell 53(18):21227–21247. https://doi.org/10.1007/S10489-023-04515-6

  43. Wang Q, He X, Jiang X, Li X (2022) Robust bi-stochastic graph regularized matrix factorization for data clustering. IEEE Trans Pattern Anal Mach Intell 44(1):390–403. https://doi.org/10.1109/TPAMI.2020.3007673

  44. Boyd SP, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122. https://doi.org/10.1561/2200000016

  45. Li S, Tang C, Liu X, Liu Y, Chen J (2019) Dual graph regularized compact feature representation for unsupervised feature selection. Neurocomputing 331:77–96. https://doi.org/10.1016/J.NEUCOM.2018.11.060

  46. Shi Z, Liu J (2023) Noise-tolerant clustering via joint doubly stochastic matrix regularization and dual sparse coding. Expert Syst Appl 222:119814. https://doi.org/10.1016/J.ESWA.2023.119814

  47. Neumann JV (1950) Functional Operators (AM-22), Volume 2: The Geometry of Orthogonal Spaces. Princeton University Press, Princeton

  48. Zhang X (2020) A Matrix Algebra Approach to Artificial Intelligence. Springer, Singapore. https://doi.org/10.1007/978-981-15-2770-8

Author information

Contributions

Xiangfa Song: Conceptualization, Formal analysis, Methodology, Software, Validation, Writing-review & editing.

Corresponding author

Correspondence to Xiangfa Song.

Ethics declarations

Competing Interests

The author declares no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and Informed Consent for Data Used

The dataset used is public and available, and there are no ethical implications.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Lemma 3

The closed-form solution to the optimization problem (15) is \( \textbf{S}=\textbf{R}+\frac{n+\textbf{1}_{n}^{T} \textbf{R} \textbf{1}_{n}}{n^{2}} \textbf{1}_{n} \textbf{1}_{n}^{T}-\frac{1}{n} \textbf{R} \textbf{1}_{n} \textbf{1}_{n}^{T}-\frac{1}{n} \textbf{1}_{n} \textbf{1}_{n}^{T}\textbf{R} \), where \( \textbf{R}=\frac{\textbf{A}+\textbf{A}^T}{2} \).

Proof

The Lagrangian function for problem (15) can be formulated as follows:

$$\begin{aligned} \mathcal {L}(\textbf{S},\eta ,\Xi )=\frac{1}{2}\Vert \textbf{S}-\textbf{A}\Vert _{F}^{2}-\eta ^{T}\left( \textbf{S 1}_n -\textbf{1}_n\right) -\operatorname {Tr}\left( \Xi ^{T}\left( \textbf{S}^{T}-\textbf{S}\right) \right) , \end{aligned}$$
(A1)

where \(\eta \in \mathbb {R}^{n}\) and \(\Xi \in \mathbb {R}^{n \times n}\) are Lagrangian multipliers.

By differentiating (A1) w.r.t. \( \textbf{S} \) and setting the derivative to 0, we obtain

$$\begin{aligned} \textbf{S}-\textbf{A}-\left( \Xi ^{T}-\Xi \right) - \eta \textbf{1}_n^T=0. \end{aligned}$$
(A2)

Transposing both sides of (A2) and using the symmetry constraint \( \textbf{S}^{T}=\textbf{S} \), we get:

$$\begin{aligned} \textbf{S}-\textbf{A}^{T}+\left( \Xi ^{T}-\Xi \right) -\textbf{1}_n \eta ^T=0. \end{aligned}$$
(A3)

Subtracting (A3) from (A2) and dividing by 2, we obtain:

$$\begin{aligned} -\left( \Xi ^{T}-\Xi \right) =\frac{1}{2}\left( \textbf{A}-\textbf{A}^{T}+\eta \textbf{1}_n^{T}-\textbf{1}_n \eta ^{T}\right) . \end{aligned}$$
(A4)

Substituting (A4) into (A2), we further have:

$$\begin{aligned} \textbf{S}=\textbf{R}+\frac{1}{2}\eta \textbf{1}_n^T+\frac{1}{2}\textbf{1}_n\eta ^T, \end{aligned}$$
(A5)

where \( \textbf{R}=\frac{\textbf{A}+\textbf{A}^T}{2} \). Right-multiplying both sides of (A5) by \( \textbf{1}_n \) and applying the constraint \( \textbf{S 1}_n=\textbf{1}_n \), we obtain

$$\begin{aligned} \textbf{1}_n=\textbf{R 1}_n+\frac{n}{2}\eta +\frac{1}{2}\textbf{1}_n\textbf{1}_n^T\eta , \end{aligned}$$
(A6)

from which we get

$$\begin{aligned} \eta =2\left( n \textbf{I}+\textbf{1}_n\textbf{ 1}_n^{T}\right) ^{-1}(\textbf{I}-\textbf{R}) \textbf{1}_n. \end{aligned}$$
(A7)

Based on the Woodbury formula [48], we get

$$\begin{aligned} \left( n \textbf{I}+\textbf{1}_n\textbf{ 1}_n^{T}\right) ^{-1}=\frac{1}{n}\left( \textbf{I}-\frac{1}{2n}\textbf{1}_n\textbf{ 1}_n^{T}\right) . \end{aligned}$$
(A8)
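As a quick numerical sanity check of (A8), the closed-form inverse can be compared against a generic matrix inverse. This is only an illustrative sketch; the function name and the choice \( n=6 \) are hypothetical and not part of the article.

```python
import numpy as np


def inverse_rank_one_update(n):
    """Closed form of (n*I + 1 1^T)^{-1} as stated in (A8)."""
    J = np.ones((n, n))                      # 1_n 1_n^T
    return (np.eye(n) - J / (2.0 * n)) / n


n = 6                                        # hypothetical problem size
M = n * np.eye(n) + np.ones((n, n))          # matrix being inverted in (A7)
closed_form = inverse_rank_one_update(n)

# Multiplying the claimed inverse by M should give the identity matrix.
max_abs_error = np.max(np.abs(closed_form @ M - np.eye(n)))
print("max |M^{-1} M - I| =", max_abs_error)
```

The factor \( \frac{1}{2n} \) arises because \( \textbf{1}_n^{T}\textbf{1}_n=n \), so the rank-one correction in the Woodbury formula is divided by \( 1+n/n=2 \).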

Substituting (A8) into (A7) and the resulting \( \eta \) into (A5), we get the solution of problem (15) as

$$\begin{aligned} \textbf{S}=\textbf{R}+\frac{n+\textbf{1}_{n}^{T} \textbf{R} \textbf{1}_{n}}{n^{2}} \textbf{1}_{n} \textbf{1}_{n}^{T}-\frac{1}{n} \textbf{R} \textbf{1}_{n} \textbf{1}_{n}^{T}-\frac{1}{n} \textbf{1}_{n} \textbf{1}_{n}^{T}\textbf{R}. \end{aligned}$$
(A9)

\(\square \)
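The result of Lemma 3 can be checked numerically. The sketch below is an illustration rather than the author's implementation: it assumes problem (15) consists of the objective and the two constraints that appear in the Lagrangian (A1), namely \( \textbf{S 1}_n=\textbf{1}_n \) and \( \textbf{S}=\textbf{S}^{T} \), and the function names and random test matrix are hypothetical.

```python
import numpy as np


def closed_form_s(A):
    """Evaluate the closed-form expression (A9) for a square matrix A."""
    n = A.shape[0]
    J = np.ones((n, n))                      # 1_n 1_n^T
    R = (A + A.T) / 2.0                      # symmetric part of A
    # R.sum() equals the scalar 1_n^T R 1_n.
    return R + (n + R.sum()) / n**2 * J - (R @ J) / n - (J @ R) / n


def objective(S, A):
    """Objective of (A1) without the multiplier terms."""
    return 0.5 * np.sum((S - A) ** 2)


rng = np.random.default_rng(0)               # hypothetical test data
n = 5
A = rng.random((n, n))
S = closed_form_s(A)

# Feasibility: rows sum to one and S is symmetric.
row_sum_error = np.max(np.abs(S.sum(axis=1) - 1.0))
symmetry_error = np.max(np.abs(S - S.T))

# Cross-check: solve (A7) as a linear system and plug eta into (A5).
ones = np.ones(n)
R = (A + A.T) / 2.0
eta = 2.0 * np.linalg.solve(n * np.eye(n) + np.ones((n, n)), ones - R @ ones)
S_from_eta = R + 0.5 * (np.outer(eta, ones) + np.outer(ones, eta))

# Optimality spot check: D is symmetric with D 1_n = 0, so S + D is feasible.
P = np.eye(n) - np.ones((n, n)) / n
B = rng.standard_normal((n, n))
D = P @ (B + B.T) @ P
gap = objective(S + D, A) - objective(S, A)

print(row_sum_error, symmetry_error, gap)
```

Because only the two equality constraints are imposed in this check, the entries of the resulting matrix are not guaranteed to be nonnegative.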

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Song, X. Robust unsupervised feature selection based on matrix factorization with adaptive loss via bi-stochastic graph regularization. Appl Intell 55, 55 (2025). https://doi.org/10.1007/s10489-024-05876-2

  • DOI: https://doi.org/10.1007/s10489-024-05876-2
