
Distance metric learning guided adaptive subspace semi-supervised clustering

  • Research Article
  • Published in: Frontiers of Computer Science in China

Abstract

Most existing semi-supervised clustering algorithms are not designed to handle high-dimensional data. Conversely, semi-supervised dimensionality reduction methods do not necessarily improve clustering performance, because they ignore the inherent relationship between subspace selection and clustering. To mitigate both problems, we present a semi-supervised clustering algorithm with adaptive distance metric learning (SCADM), which performs semi-supervised clustering and distance metric learning simultaneously. SCADM uses the current clustering result to learn a distance metric and then projects the data onto a low-dimensional subspace in which the separability of the data is maximized. Experimental results on real-world data sets show that the proposed method deals effectively with high-dimensional data and yields appealing clustering performance.
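The abstract describes an alternating scheme: cluster, use the cluster structure to learn a metric (equivalently, a low-dimensional projection that maximizes separability), re-cluster in the projected space, and repeat. The sketch below illustrates that alternating idea only; it is not the authors' SCADM objective. Metric learning is approximated here by an LDA-style scatter-ratio projection computed from the current cluster labels, and clustering by plain k-means in the subspace, with no pairwise constraints.

```python
import numpy as np

def kmeans(Z, k, rng, n_iter=20):
    """Plain Lloyd's k-means in the projected space Z (n x dim)."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = Z[labels == j].mean(0)
    return labels

def lda_projection(X, labels, dim, reg=1e-6):
    """Projection maximizing between-class vs. within-class scatter
    for the current cluster labels (an LDA-style stand-in for the
    learned Mahalanobis metric)."""
    d = X.shape[1]
    mu = X.mean(0)
    Sw = reg * np.eye(d)            # regularized within-cluster scatter
    Sb = np.zeros((d, d))           # between-cluster scatter
    for j in np.unique(labels):
        Xj = X[labels == j]
        mj = Xj.mean(0)
        Sw += (Xj - mj).T @ (Xj - mj)
        diff = (mj - mu)[:, None]
        Sb += len(Xj) * diff @ diff.T
    # Top eigenvectors of Sw^{-1} Sb span the most separable subspace.
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-vals.real)
    return vecs.real[:, order[:dim]]

def alternating_subspace_clustering(X, k, dim, n_iter=10, seed=0):
    """Alternate between clustering in a subspace and re-learning
    the subspace from the clustering, as in the paper's high-level idea."""
    rng = np.random.default_rng(seed)
    W = np.eye(X.shape[1])[:, :dim]     # initial projection: first `dim` axes
    for _ in range(n_iter):
        Z = X @ W                       # project to the low-dim subspace
        labels = kmeans(Z, k, rng)      # cluster in the subspace
        W = lda_projection(X, labels, dim)  # re-learn projection from labels
    return labels, W
```

On two well-separated Gaussian blobs embedded in 20 dimensions, a few iterations of this loop recover both a 2-dimensional projection and a two-cluster partition; SCADM itself additionally exploits supervision, which this unsupervised sketch omits.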



Corresponding author

Correspondence to Xuesong Yin.

Cite this article

Yin, X., Hu, E. Distance metric learning guided adaptive subspace semi-supervised clustering. Front. Comput. Sci. China 5, 100–108 (2011). https://doi.org/10.1007/s11704-010-0376-9

