Abstract
Most traditional text clustering methods are based on the bag-of-words representation, which ignores the semantic relationships between key terms. To overcome this problem, researchers have recently proposed several methods that improve short text clustering accuracy by enriching the short text representation. However, these methods expand the words that appear in the short texts, which is usually computationally expensive. In this paper, we improve on previous work by enriching the short text representation with keyword expansion. Empirical results show that the proposed method saves considerable time without sacrificing clustering accuracy.
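The idea of expanding only keywords, rather than every word in a short text, can be illustrated with a minimal sketch. The abstract does not say how keywords are selected or which knowledge source supplies the expansion terms, so everything below is an assumption made for illustration: keywords are taken to be the top-scoring tf-idf terms of each text, and `RELATED_TERMS` is a hypothetical hand-made table standing in for an external lexical resource. The function names and the `top_k` parameter are likewise illustrative and are not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical related-term table; a stand-in for an external lexical
# resource. The abstract does not name the source actually used.
RELATED_TERMS = {
    "car": ["automobile", "vehicle"],
    "stock": ["share", "equity"],
}


def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest tf-idf terms of each tokenized document.

    Assumption: tf-idf ranking is used here only as one plausible way
    to pick keywords.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


def expand_with_keywords(docs, top_k=2, related=RELATED_TERMS):
    """Append related terms for each document's keywords only."""
    expanded = []
    for doc, keys in zip(docs, tfidf_keywords(docs, top_k)):
        extra = [r for k in keys for r in related.get(k, [])]
        expanded.append(list(doc) + extra)
    return expanded


docs = [["car", "price", "car"], ["stock", "price", "rise"]]
enriched = expand_with_keywords(docs, top_k=1)
```

Because only `top_k` terms per text are looked up, the number of expansion lookups grows with the number of keywords rather than with the full text length, which is the kind of saving the abstract refers to. The enriched token lists could then be passed to any standard clustering routine.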
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wang, J., Zhou, Y., Li, L., Hu, B., Hu, X. (2009). Improving Short Text Clustering Performance with Keyword Expansion. In: Wang, H., Shen, Y., Huang, T., Zeng, Z. (eds) The Sixth International Symposium on Neural Networks (ISNN 2009). Advances in Intelligent and Soft Computing, vol 56. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01216-7_31
DOI: https://doi.org/10.1007/978-3-642-01216-7_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01215-0
Online ISBN: 978-3-642-01216-7
eBook Packages: Engineering, Engineering (R0)