Abstract
Feature selection and subspace learning are two popular approaches to dimensionality reduction for addressing the ‘curse of dimensionality’ in high-dimensional data. However, most previous feature selection and subspace learning methods ignore the fact that high-dimensional data contain noise and outliers, which increase the rank of the data matrix and thereby decrease the stability of learning models. In this paper, we integrate a feature-level self-representation loss function, a low-rank constraint, a graph Laplacian regularizer, and a sparsity regularizer into a unified framework for unsupervised feature selection that addresses these issues. Specifically, we first propose a new feature-level self-representation loss function with a sparsity regularizer (an ℓ2,1-norm regularizer) to select representative features. We then impose a low-rank constraint on the coefficient matrix, which treats the response variables as a whole group to reduce the impact of noise and outliers, and add a graph regularizer that preserves the local structure of the data, so that subspace learning is conducted within the feature selection framework. Experimental results on real databases indicated that the proposed method effectively selected the most representative features and removed the adverse effect of irrelevant features, compared to state-of-the-art methods.
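The abstract names four ingredients but does not state the objective itself, so the following is only a minimal sketch of how such terms are commonly combined, not the paper's exact formulation. It assumes a data matrix X (samples by features), a coefficient matrix W factored as A @ B so that its rank is at most r, a k-nearest-neighbour graph over samples, and trade-off weights alpha and beta. All function names, the factorization, the graph construction, and the parameters are illustrative assumptions.

```python
import numpy as np


def l21_norm(W):
    """l2,1-norm: sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def knn_graph_laplacian(X, k=3):
    """Unnormalized Laplacian L = D - S of a symmetric kNN graph over the rows of X.

    Assumption: binary edge weights; other weightings (e.g. heat kernel) are possible.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-neighbours
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest neighbours per row
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    S = np.maximum(S, S.T)                            # symmetrize
    return np.diag(S.sum(axis=1)) - S


def objective(X, A, B, L, alpha, beta):
    """Assumed objective: self-representation loss + graph term + l2,1 sparsity.

    W = A @ B has rank at most A.shape[1], which enforces the low-rank constraint.
    """
    W = A @ B
    loss = np.sum((X - X @ W) ** 2)                   # each feature represented by the others
    graph = np.trace(W.T @ X.T @ L @ X @ W)           # preserves local sample structure
    return float(loss + alpha * graph + beta * l21_norm(W))


def rank_features(W):
    """Order feature indices by the l2 norm of their row in W, largest first."""
    return np.argsort(-np.linalg.norm(W, axis=1))
```

Under these assumptions, rows of W with near-zero norm correspond to features that contribute little to representing the others, so `rank_features` returns candidates for selection in descending order of importance. An optimizer for A and B is deliberately omitted, since the abstract does not describe one.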
Notes
Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Available at http://archive.ics.uci.edu/ml/
Available at http://featureselection.asu.edu/datasets.php
Available at http://see.xidian.edu.cn/vipsl/database_Face.html
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61263035, 61573270, 61450001 and 61363009), the China 973 Program (Grant No. 2013CB329404), the Guangxi Natural Science Foundation (Grant No. 2015GXNSFCB139011), the Guangxi Higher Institutions’ Program of Introducing 100 High-Level Overseas Talents, the Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing, the Innovation Project of Guangxi Graduate Education under Grant YCSZ2016046, and the project “Application and Research of Big Data Fusion in Inter-City Traffic Integration of the Xijiang River - Pearl River Economic Belt (da shu jv rong he zai xijiang zhujiang jing ji dai cheng ji jiao tong yi ti hua zhong de ying yong yu yan jiu)”.
About this article
Cite this article
He, W., Zhu, X., Cheng, D. et al. Low-rank unsupervised graph feature selection via feature self-representation. Multimed Tools Appl 76, 12149–12164 (2017). https://doi.org/10.1007/s11042-016-3937-6
DOI: https://doi.org/10.1007/s11042-016-3937-6