Abstract
Clustering is an unsupervised learning technique that helps us quickly categorize short texts. It works by capturing the semantic themes of the texts and assigning similar texts to the same cluster. Because contrastive learning is highly effective at learning representations, using it to extract semantic features has become a new trend in short text clustering. However, existing short text clustering methods focus mainly on global information and therefore tend to misassign samples whose cluster membership is ambiguous. We therefore propose graph-based short text clustering via contrastive learning with graph embedding (GCCL), a novel framework that leverages the affinity between samples and their neighbors to impose constraints on the low-dimensional representation space. To verify the effectiveness of our method, we evaluate GCCL on short text benchmark datasets. The experimental results show that GCCL outperforms the baseline method in terms of accuracy (ACC) and normalized mutual information (NMI). In addition, our approach performs strongly in terms of convergence speed, which demonstrates that graph embeddings provide useful guidance for short text clustering tasks.
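The abstract reports ACC and NMI without saying how they are computed. The sketch below follows the convention commonly used in the clustering literature and is not taken from the paper: ACC maps predicted clusters to ground-truth classes with the Hungarian algorithm, and NMI is normalized by the arithmetic mean of the two entropies. The function names (`contingency`, `clustering_accuracy`, `nmi`) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def contingency(y_true, y_pred):
    """Count matrix C[i, j] = number of samples in predicted cluster i with true class j."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, p = np.unique(np.asarray(y_pred), return_inverse=True)
    counts = np.zeros((p.max() + 1, t.max() + 1), dtype=np.int64)
    np.add.at(counts, (p, t), 1)
    return counts


def clustering_accuracy(y_true, y_pred):
    """ACC under the best one-to-one mapping of clusters to classes (assumed convention)."""
    counts = contingency(y_true, y_pred)
    # Hungarian algorithm: choose the cluster-to-class matching that maximizes agreement.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / counts.sum()


def nmi(y_true, y_pred):
    """NMI normalized by the arithmetic mean of the two label entropies (assumed convention)."""
    joint = contingency(y_true, y_pred).astype(float)
    joint /= joint.sum()
    p_pred = joint.sum(axis=1)
    p_true = joint.sum(axis=0)
    nz = joint > 0  # skip empty cells to avoid log(0)
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(p_pred, p_true)[nz]))
    h_pred = -np.sum(p_pred * np.log(p_pred))
    h_true = -np.sum(p_true * np.log(p_true))
    denom = 0.5 * (h_pred + h_true)
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [1, 1, 0, 0, 2, 2]  # same partition, different label names
    print(clustering_accuracy(y_true, y_pred), nmi(y_true, y_pred))
```

Both metrics ignore the cluster label names, so a prediction that recovers the true partition under different labels scores as a perfect match.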
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wei, Y. et al. (2023). Graph-Based Short Text Clustering via Contrastive Learning with Graph Embedding. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14086. Springer, Singapore. https://doi.org/10.1007/978-981-99-4755-3_63
DOI: https://doi.org/10.1007/978-981-99-4755-3_63
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4754-6
Online ISBN: 978-981-99-4755-3
eBook Packages: Computer Science, Computer Science (R0)