Abstract
This research adopts word expansion to group related features into shared semantic concepts, assigns the corresponding documents to concept clusters, and finally merges concepts that have documents in common into document clusters. We expect that using semantic concepts as the feature index will reduce the problems of polysemy and synonymy. Frequent sequences of two or three consecutive nouns within the same sentence form key patterns, which replace keywords as the features of the text. The distributive strength of key patterns is measured by Pattern Frequency, Pattern Frequency-Inverse Document Frequency (PFIDF), Conditional Probability, Mutual Information, and Association Norm. Based on this strength, agglomerative hierarchical clustering is applied to group the key patterns into semantic concepts. Then, based on the documents that concepts have in common, several semantic concepts are merged into a group, and the texts corresponding to that group are considered topic-related. The experimental results show that the proposed text clustering outperforms traditional vector space model (VSM) clustering under all five strength measures of key patterns. PFIDF achieves the best average F-measure, 97.5%.
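The key-pattern and PFIDF steps described above can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so this assumes the conventional weighting pf × log(N / df), assumes the text has already been segmented and reduced to the nouns of each sentence, and omits the frequency threshold that selects "frequent" patterns. The names `key_patterns` and `pfidf` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def key_patterns(sentence, n=2):
    """Return every run of n consecutive nouns in one sentence."""
    return [tuple(sentence[i:i + n]) for i in range(len(sentence) - n + 1)]

def pfidf(docs, n=2):
    """Score each pattern in each document as pf * log(N / df).

    Assumption: this is the standard TF-IDF form applied to patterns,
    not necessarily the exact weighting used in the paper.
    """
    # Pattern frequency: how often each pattern occurs in each document.
    pf = []
    for doc in docs:
        counts = Counter()
        for sentence in doc:
            counts.update(key_patterns(sentence, n))
        pf.append(counts)
    # Document frequency: in how many documents each pattern occurs.
    df = Counter()
    for counts in pf:
        df.update(counts.keys())
    total = len(docs)
    return [
        {p: c * math.log(total / df[p]) for p, c in counts.items()}
        for counts in pf
    ]

# Toy input: each document is a list of sentences, each sentence a list of nouns.
docs = [
    [["stock", "market", "index"], ["stock", "market"]],
    [["stock", "market", "crash"]],
    [["football", "league", "match"]],
]
scores = pfidf(docs)
```

Here `scores[i]` maps each noun bigram found in document `i` to its weight. A pattern that occurs in every document receives a weight of zero, which is the usual effect of the inverse document frequency term. Passing `n=3` extracts three-noun patterns instead.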
Copyright information
© 2007 Springer-Verlag London Limited
About this paper
Cite this paper
Yang, YJ., Yu, SH. (2007). Chinese Text Clustering for Topic Detection Based on Word Pattern Relation. In: Bramer, M., Coenen, F., Tuson, A. (eds) Research and Development in Intelligent Systems XXIII. SGAI 2006. Springer, London. https://doi.org/10.1007/978-1-84628-663-6_33
DOI: https://doi.org/10.1007/978-1-84628-663-6_33
Publisher Name: Springer, London
Print ISBN: 978-1-84628-662-9
Online ISBN: 978-1-84628-663-6
eBook Packages: Computer Science, Computer Science (R0)