Abstract
Semi-supervised clustering aims to aid and bias unsupervised clustering by exploiting a small amount of supervised information. This supervision is generally given as pairwise constraints, which are used either to modify the objective function or to learn a distance measure. Much previous work has shown that clustering algorithms based on a distance metric significantly outperform those based on a probability distribution on some data sets, while the opposite holds on others; how to balance the two approaches therefore becomes a key problem. In this paper, we propose a semi-supervised hybrid clustering algorithm that provides a principled framework for integrating a distance metric into the Gaussian mixture model, considering not only the intrinsic geometric information but also the probability distribution of the data. Rather than using only pairwise constraints, we use the labeled data both to initialize the Gaussian distribution parameters and to construct the weight matrix of the regularizer, and we then adopt the Kullback-Leibler divergence as the "distance" measure in regularizing the objective function. Experiments on several UCI data sets and on real-world Chinese Word Sense Induction data demonstrate the effectiveness of our semi-supervised clustering algorithm.
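To make the idea in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of a KL-divergence-regularized Gaussian mixture objective: mixture posteriors are computed per point, and pairs connected in a weight matrix W (built here, by assumption, from labeled data or must-link constraints) are penalized when their posterior distributions diverge. All function names and the symmetrized-KL choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gmm_posteriors(X, means, covs, weights):
    """Component densities and responsibilities p(k | x_i) under a Gaussian mixture."""
    n, d = X.shape
    dens = np.zeros((n, len(weights)))
    for k, (m, c, w) in enumerate(zip(means, covs, weights)):
        diff = X - m
        inv = np.linalg.inv(c)
        norm = w / np.sqrt((2 * np.pi) ** d * np.linalg.det(c))
        # Mahalanobis quadratic form per row: diff_i @ inv @ diff_i
        dens[:, k] = norm * np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff))
    return dens, dens / dens.sum(axis=1, keepdims=True)

def sym_kl(p, q, eps=1e-12):
    """Symmetrized Kullback-Leibler divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

def regularized_objective(X, means, covs, weights, W, lam):
    """Data log-likelihood minus a KL-based pairwise smoothness penalty."""
    dens, post = gmm_posteriors(X, means, covs, weights)
    loglik = np.sum(np.log(dens.sum(axis=1)))
    n = X.shape[0]
    penalty = sum(W[i, j] * sym_kl(post[i], post[j])
                  for i in range(n) for j in range(n) if W[i, j] > 0)
    return loglik - lam * penalty
```

In this reading, maximizing the objective trades off fit to the mixture model against posterior agreement for constrained pairs; a must-link pair with similar posteriors contributes almost nothing to the penalty, while a pair assigned to different components is penalized in proportion to its weight in W.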
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which significantly improved the quality of this paper. The research reported in this paper has been partially supported by the National Science Foundation of China under Grant Nos. 61075053 and 71102065, the Ph.D. Programs Foundation of the Ministry of Education of China under Grant No. 20120191110028, and the Fundamental Research Funds for the Central Universities under Project No. CDJZR10090001.
Cite this article
Zhang, Y., Wen, J., Wang, X. et al. Semi-supervised hybrid clustering by integrating Gaussian mixture model and distance metric learning. J Intell Inf Syst 45, 113–130 (2015). https://doi.org/10.1007/s10844-013-0264-5