Instant Message Clustering Based on Extended Vector Space Model

Wang, Le; Jia, Yan; Han, Weihong

doi:10.1007/978-3-540-74581-5_48

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4683)

Included in the following conference series:

International Symposium on Intelligence Computation and Applications

Abstract

Instant communication tools such as Instant Messaging (IM) are now widely used. For such large-scale mass communication media, clustering the text content is a practical way to analyze the characteristics of instant messages and to find or track hot social topics. However, a single instant message usually contains few keywords, and those keywords may even be latent; moreover, a single message cannot describe the conversational context. This differs markedly from general documents and makes common clustering algorithms unsuitable. A novel method called WR-KMeans is proposed, which merges related instant messages into a conversation and enriches the conversation's vector with words that do not appear in the conversation but are closely related to words that do. WR-KMeans then clusters conversations in this extended vector space in the manner of k-means. Experiments on public datasets show that WR-KMeans outperforms the traditional k-means and bisecting k-means algorithms.
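The idea of extending a conversation's vector with related words can be illustrated with a minimal Python sketch. It is not the authors' WR-KMeans implementation: the abstract does not specify the relatedness measure, the weighting scheme, or the seeding, so the `RELATED` table, the `threshold` parameter, the weighting by relatedness times term frequency, and the first-k seeding are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical symmetric word-relatedness scores (an assumption; the
# abstract does not say how relatedness is measured).
RELATED = {
    ("football", "goal"): 0.8,
    ("football", "match"): 0.7,
    ("stock", "market"): 0.9,
    ("stock", "price"): 0.8,
}

def relatedness(a, b):
    """Look up the score for a word pair in either order; 0.0 if unknown."""
    return RELATED.get((a, b), RELATED.get((b, a), 0.0))

def extended_vector(tokens, vocab, threshold=0.5):
    """Unit-length term-frequency vector, extended with absent related words."""
    tf = Counter(tokens)
    vec = []
    for word in vocab:
        if word in tf:
            vec.append(float(tf[word]))
        else:
            # Assumed weighting: strongest sufficiently related present word.
            vec.append(max(
                (relatedness(word, w) * n for w, n in tf.items()
                 if relatedness(word, w) >= threshold),
                default=0.0,
            ))
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iterations=20):
    """Plain k-means, seeded with the first k vectors for reproducibility."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Each conversation is a list of tokens from already merged messages.
conversations = [
    ["football", "match"],
    ["stock", "price"],
    ["goal"],    # shares no word with conversation 0
    ["market"],  # shares no word with conversation 1
]
vocab = sorted({w for conv in conversations for w in conv})
vectors = [extended_vector(conv, vocab) for conv in conversations]
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy data the third and fourth conversations share no words with the first two, so they can only be grouped by topic because the extension step gives "football" and "stock" nonzero weight in their vectors.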

This project is sponsored by the National 863 High Technology Development Foundation (No. 2006AA01Z451, No. 2006AA10Z237).

Author information

Authors and Affiliations

Computer School, National University of Defense Technology, Changsha, China
Le Wang, Yan Jia & Weihong Han

Editor information

Lishan Kang, Yong Liu, Sanyou Zeng

About this paper

Cite this paper

Wang, L., Jia, Y., Han, W. (2007). Instant Message Clustering Based on Extended Vector Space Model. In: Kang, L., Liu, Y., Zeng, S. (eds) Advances in Computation and Intelligence. ISICA 2007. Lecture Notes in Computer Science, vol 4683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74581-5_48

DOI: https://doi.org/10.1007/978-3-540-74581-5_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74580-8
Online ISBN: 978-3-540-74581-5
eBook Packages: Computer Science, Computer Science (R0)