Abstract
With the rapid development of network technology, data of diverse kinds grow by hundreds of millions of items per hour, putting increasing pressure on the acquisition of data labels. Semi-supervised feature selection has been at the forefront of dimensionality-reduction research because it requires few labels yet remains highly efficient. In particular, graph-based methods exploit data with missing labels completely and effectively, making them a research hotspot in semi-supervised feature selection. However, existing graph-based methods do not simultaneously account for the effects of outliers, noise, and redundancy among the selected features. To solve these problems, a novel semi-supervised feature selection method based on local adaptivity and minimal redundancy is proposed. The local structure is flexibly weighted according to the data, thereby reducing the impact of outliers and noise; moreover, a penalty on high similarity is introduced into the feature mapping matrix to promote a discriminative, low-redundancy feature subset. In addition, an iterative optimization method is designed, and its convergence is proved both theoretically and experimentally. Finally, the proposed algorithm is shown to be stable and effective through experiments covering five aspects on sixteen public datasets.
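To make the relevance-versus-redundancy trade-off described above concrete, here is a minimal, hypothetical Python sketch. It is NOT the paper's algorithm (which uses adaptive graph weights and a penalized feature mapping matrix); it only illustrates the general idea of penalizing candidate features that are highly similar to features already selected, while letting unlabelled samples contribute to the similarity estimate. The function name and scoring scheme are illustrative assumptions.

```python
import numpy as np

def greedy_low_redundancy_select(X, y, n_select, redundancy_weight=0.5):
    """Toy sketch (NOT the paper's algorithm): greedy feature selection
    trading label relevance against redundancy with already-selected
    features. `y` may contain NaN for unlabelled samples."""
    labelled = ~np.isnan(y)
    Xl, yl = X[labelled], y[labelled]
    # Relevance: |Pearson correlation| with labels, labelled rows only.
    rel = np.abs([np.corrcoef(Xl[:, j], yl)[0, 1] for j in range(X.shape[1])])
    # Feature-feature similarity uses ALL rows, labelled or not --
    # this is where the unlabelled data contributes.
    sim = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for _ in range(n_select):
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Penalize candidates highly similar to a chosen feature.
            penalty = max((sim[j, s] for s in selected), default=0.0)
            score = rel[j] - redundancy_weight * penalty
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

With a sufficiently large `redundancy_weight`, a feature that duplicates an already-selected one pays the full similarity penalty, so a moderately relevant but less correlated feature is chosen instead, which is the qualitative behaviour the abstract's similarity-penalty mechanism aims for.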
Notes
1. SSE: Sum of Squared Errors
2. NMF: Non-negative Matrix Factorization
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 61976182, 61602327, 61876157, 61976245, 62076171), the Key Program for International S&T Cooperation of Sichuan Province (2019YFH0097), and the Sichuan Key R&D Project (2020YFG0035).
Cite this article
Wu, X., Chen, H., Li, T. et al. Semi-supervised feature selection with minimal redundancy based on local adaptive. Appl Intell 51, 8542–8563 (2021). https://doi.org/10.1007/s10489-021-02288-4