Abstract
This paper proposes a new clustering method that combines the k-nearest neighbor (kNN) method with local Principal Component Analysis (PCA) so that both the global and the local information of the data points is used for clustering. Specifically, we first preserve the local information of the samples by using kNN to obtain a neighborhood subset and a covariance matrix for each data point. We then preserve the global information of the data by conducting local PCA on each covariance matrix, which yields a binary affinity matrix of the data. Finally, our method performs clustering on the resulting affinity matrix without requiring the number of clusters to be specified in advance. Experiments on 8 UCI benchmark datasets showed that the proposed method outperformed state-of-the-art clustering methods in terms of clustering performance.
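The pipeline summarized in the abstract (kNN neighborhoods, per-point covariance matrices, local PCA, a binary affinity matrix, and clustering without a preset number of clusters) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact affinity rule or the final clustering step, so everything specific here is an assumption made for illustration: the function names, the parameters `k`, `d`, and `eta`, the rule that links two neighbors when their local PCA projectors differ by less than `eta` in spectral norm, and the use of connected components to read off clusters. It should not be taken as the authors' implementation.

```python
import numpy as np


def local_pca_affinity(X, k=5, d=1, eta=0.5):
    """Binary affinity from kNN neighborhoods and local PCA (illustrative only).

    k, d, and eta are hypothetical parameters: neighborhood size, local
    subspace dimension, and subspace-agreement threshold. Assumes X has at
    least two feature columns.
    """
    n, dim = X.shape
    # Pairwise squared Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    # The k nearest neighbors of each point, plus the point itself.
    knn = np.argsort(dist2, axis=1)[:, : k + 1]

    # Local PCA: projector onto the top-d principal directions of the
    # covariance matrix of each neighborhood.
    projectors = np.empty((n, dim, dim))
    for i in range(n):
        cov = np.cov(X[knn[i]], rowvar=False)
        _, vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        top = vecs[:, -d:]
        projectors[i] = top @ top.T

    # Assumed rule: link neighbors whose local subspaces roughly agree.
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in knn[i]:
            if np.linalg.norm(projectors[i] - projectors[j], 2) < eta:
                A[i, j] = A[j, i] = True
    return A


def connected_components(A):
    """Label the connected components of a symmetric boolean affinity matrix."""
    n = A.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(A[u]):
                if labels[v] < 0:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels
```

Reading clusters off as connected components is one simple way to avoid fixing the number of clusters beforehand: the count is whatever the binary affinity matrix implies. Calling `connected_components(local_pca_affinity(X))` on an `(n, dim)` array `X` returns one integer label per row.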


Acknowledgments
This work is partially supported by the China Key Research Program (Grant No: 2016YFB1000905); the Natural Science Foundation of China (Grants No: 61573270 and 61672177); the Project of Guangxi Science and Technology (GuiKeAD17195062); the Guangxi Natural Science Foundation (Grant No: 2015GXNSFCB139011); the Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing; the Guangxi High Institutions Program of Introducing 100 High-Level Overseas Talents; and the Research Fund of Guangxi Key Lab of Multisource Information Mining & Security (18-A-01-01).
Cite this article
Wu, L., Zhu, X. & Tong, T. Global and local clustering with kNN and local PCA. Multimed Tools Appl 77, 29727–29738 (2018). https://doi.org/10.1007/s11042-018-6488-1
DOI: https://doi.org/10.1007/s11042-018-6488-1