Abstract
Clustering text documents is an important data mining problem with a wide range of applications. However, many clustering approaches fail to achieve high clustering quality because of complex document semantics. Semantic smoothing, which has been widely studied in the field of Information Retrieval, has recently been proposed as an efficient solution. However, existing semantic smoothing methods are not effective for partitional clustering. In this paper, based on the principle of the TF*IDF scheme, we propose an improved semantic smoothing method that is suitable for both agglomerative and partitional clustering. Experimental results show that our method is more effective than previous methods in terms of cluster quality.
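For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal background sketch, not the method proposed in the paper. It assumes the standard textbook TF*IDF weight (term frequency times log inverse document frequency) and swaps in simple Jelinek-Mercer background smoothing of a unigram document model in place of semantic smoothing, which the abstract does not specify. The function names, the toy documents, and the mixing weight `lam` are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (textbook TF*IDF)."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


def background_model(docs):
    """Corpus-wide unigram probabilities p(term | corpus)."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def smoothed_unigram(doc, background, lam=0.5):
    """Jelinek-Mercer smoothing: interpolate document and corpus probabilities.

    This is plain background smoothing, used here only as a stand-in
    (assumption); it is not the paper's semantic smoothing method.
    """
    tf = Counter(doc)
    total = len(doc)
    return {
        t: (1 - lam) * tf[t] / total + lam * p_bg
        for t, p_bg in background.items()
    }


# Hypothetical toy corpus of already-tokenized documents.
docs = [
    ["star", "galaxy", "star"],
    ["galaxy", "planet"],
    ["stock", "market"],
]
weights = tf_idf(docs)
model = smoothed_unigram(docs[0], background_model(docs))
```

In this sketch a term that occurs in every document receives a TF*IDF weight of zero, while smoothing gives every corpus term a nonzero probability in each document model, which is what lets model-based clustering compare documents that share few words.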
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, Y., Cai, J., Yin, J., Huang, Z. (2007). Document Clustering Based on Semantic Smoothing Approach. In: Wegrzyn-Wolska, K.M., Szczepaniak, P.S. (eds) Advances in Intelligent Web Mastering. Advances in Soft Computing, vol 43. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72575-6_35
DOI: https://doi.org/10.1007/978-3-540-72575-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72574-9
Online ISBN: 978-3-540-72575-6
eBook Packages: Engineering, Engineering (R0)