
A General Framework for Dimensionality Reduction of K-Means Clustering


Abstract

Dimensionality reduction plays an important role in many machine learning and pattern recognition applications. Linear discriminant analysis (LDA) is the most popular supervised dimensionality reduction technique; it searches for a projection matrix that pushes data points of different classes far apart while keeping data points of the same class close together. In this paper, trace ratio LDA is combined with K-means clustering into a unified framework, in which K-means clustering generates class labels for unlabeled data and LDA learns a low-dimensional representation of the data. Thus, by combining subspace clustering with dimensionality reduction, the optimal subspace can be obtained. Unlike other existing dimensionality reduction methods, our framework is suitable for supervised, semi-supervised, and unsupervised dimensionality reduction scenarios alike. Experimental results on benchmark datasets validate the effectiveness and superiority of our algorithm compared with other relevant techniques.
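
To make the framework concrete, the sketch below illustrates one plausible instantiation of the abstract's description; it is our own illustration under stated assumptions, not the authors' reference implementation. The trace ratio LDA criterion is max_W tr(W^T S_b W) / tr(W^T S_w W) subject to W^T W = I, where S_b and S_w are the between-class and within-class scatter matrices computed from the current (pseudo-)labels. The sketch assumes an alternating scheme: K-means assigns pseudo-labels, the trace ratio problem is solved for the projection W by the standard iterative eigen-decomposition, and the data are re-clustered in the projected subspace. The helper names, the number of rounds, and the use of scikit-learn's KMeans are illustrative assumptions.

    # Minimal sketch (assumption: alternating K-means labelling and trace ratio LDA).
    import numpy as np
    from sklearn.cluster import KMeans

    def scatter_matrices(X, labels):
        """Between-class (Sb) and within-class (Sw) scatter of X given labels."""
        mean_all = X.mean(axis=0)
        d = X.shape[1]
        Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
        for c in np.unique(labels):
            Xc = X[labels == c]
            mc = Xc.mean(axis=0)
            Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
            Sw += (Xc - mc).T @ (Xc - mc)
        return Sb, Sw

    def trace_ratio_lda(Sb, Sw, dim, n_iter=20):
        """Maximize tr(W^T Sb W) / tr(W^T Sw W) over orthonormal W via the
        standard iteration on the eigen-decomposition of Sb - lambda * Sw."""
        lam = 0.0
        for _ in range(n_iter):
            _, vecs = np.linalg.eigh(Sb - lam * Sw)
            W = vecs[:, -dim:]                      # top-dim eigenvectors
            lam = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)
        return W

    def kmeans_trace_ratio_dr(X, n_clusters, dim, n_rounds=10, seed=0):
        """Alternate pseudo-labelling by K-means with trace ratio LDA updates."""
        labels = KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(X)
        for _ in range(n_rounds):
            Sb, Sw = scatter_matrices(X, labels)
            W = trace_ratio_lda(Sb, Sw, dim)
            labels = KMeans(n_clusters, n_init=10,
                            random_state=seed).fit_predict(X @ W)
        return W, labels

In the supervised case the true labels would replace the K-means pseudo-labels, and in the semi-supervised case the two would be mixed; the unsupervised loop above covers only one of the three scenarios mentioned in the abstract.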


Notes

  1. http://archive.ics.uci.edu/ml/.


Acknowledgements

This work was supported by CSC funding under grant 201806280140 and by the National Natural Science Foundation of China under grant 11631012.

Author information


Corresponding author

Correspondence to Yanni Xiao.



About this article


Cite this article

Wu, T., Xiao, Y., Guo, M. et al. A General Framework for Dimensionality Reduction of K-Means Clustering. J Classif 37, 616–631 (2020). https://doi.org/10.1007/s00357-019-09342-4
