Abstract
Feature selection is one of the most important machine learning procedures, and it has been successfully applied as a preprocessing step before classification and clustering. High-dimensional features are common in big data, and their characteristics hinder data processing. Spectral feature selection algorithms have therefore attracted increasing attention from researchers. However, most such methods treat the task as two separate steps: they first learn a similarity matrix from the original feature space, which may contain redundant features, and then conduct data clustering. Because of this limitation, they do not perform well on classification and clustering tasks in big data processing applications. To address this problem, we propose an unsupervised feature selection method with a graph learning framework, which reduces the influence of redundant features and simultaneously imposes a low-rank constraint on the weight matrix. More importantly, we design a new objective function to handle this problem. We evaluate our approach on six benchmark datasets. All empirical classification results show that our approach outperforms state-of-the-art feature selection approaches.
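To make the two-step baseline concrete, the following is a minimal sketch of a conventional spectral feature selection pipeline. It uses the Laplacian score as one representative criterion and is not the method proposed in this article. The function names `knn_similarity` and `laplacian_score`, and the parameters `k` (number of neighbors) and `t` (heat-kernel width), are assumptions chosen for illustration.

```python
import numpy as np


def knn_similarity(X, k=5, t=1.0):
    """Step 1: symmetric kNN graph with heat-kernel weights.

    The graph is built on the ORIGINAL features, so redundant or noisy
    features also influence which samples count as neighbors.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at zero for stability.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)          # a sample is never its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]  # k nearest samples for each row
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-d2[rows, cols] / t)
    return np.maximum(S, S.T)             # symmetrize the graph


def laplacian_score(X, S):
    """Step 2: score each feature against the fixed graph (lower is better)."""
    d = S.sum(axis=1)                      # node degrees
    L = np.diag(d) - S                     # unnormalized graph Laplacian
    Xc = X - (d @ X) / d.sum()             # remove the degree-weighted mean
    num = np.sum(Xc * (L @ Xc), axis=0)    # f^T L f for every feature
    den = np.sum(Xc * (d[:, None] * Xc), axis=0)  # f^T D f for every feature
    return num / np.maximum(den, 1e-12)    # guard against a zero denominator
```

Features are ranked by ascending score. Because the graph in step 1 is computed once from all original features and never updated, any redundancy in those features carries over into the ranking, which is the limitation described above.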
Notes
Available at http://featureselection.asu.edu/datasets.php
Acknowledgements
This work is supported by the Research Foundation of Science and Technology Plan Project in Guangdong Province (2013A011403001, 2014B030301007, 2015A030401057, 2016B030307002), the Program for Science Research and Technology Development of Guangxi Province (15248003-8), and the Science and Technology Development Project of Wuzhou (2014B01039). We also thank the anonymous reviewers for their helpful comments and suggestions.
Cite this article
Lu, G., Li, B., Yang, W. et al. Unsupervised feature selection with graph learning via low-rank constraint. Multimed Tools Appl 77, 29531–29549 (2018). https://doi.org/10.1007/s11042-017-5207-7
DOI: https://doi.org/10.1007/s11042-017-5207-7