Chinese Text Clustering for Topic Detection Based on Word Pattern Relation

Yang, Yen-Ju; Yu, Su-Hsin

doi:10.1007/978-1-84628-663-6_33

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence

Abstract

This research adopts word expansion to group related features into a shared semantic concept, assigns the corresponding documents to concept clusters, and finally merges concepts that share documents into document clusters. We expect that using semantic concepts as the feature index can reduce the problems of polysemy and synonymy. Frequent sequences of two or three consecutive nouns within the same sentence form key patterns, which replace keywords as the features of a text. The distributive strength of key patterns is measured by Pattern Frequency (PF), Pattern Frequency-Inverse Document Frequency (PFIDF), Conditional Probability, Mutual Information, and Association Norm. Based on this strength, agglomerative hierarchical clustering groups the key patterns into semantic concepts. Then, based on the documents the concepts have in common, several semantic concepts are merged into a group whose corresponding texts are considered topic-related. The experimental results show that the proposed text clustering outperforms traditional vector space model (VSM) clustering under all five strength measures of key patterns. PFIDF achieves the best average F-measure, 97.5%.
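To make the key-pattern idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each sentence has already been segmented and part-of-speech tagged into (word, tag) pairs with nouns tagged "N". The abstract does not give the formulas, so the PF-IDF weighting here is assumed to follow the usual tf-idf form, and mutual information is taken as document-level pointwise mutual information. The function names and the toy English tokens are hypothetical.

```python
import math
from collections import Counter


def key_patterns(sentence, noun_tag="N"):
    """Return every run of 2 or 3 consecutive nouns in one tagged sentence."""
    patterns = []
    for size in (2, 3):
        for i in range(len(sentence) - size + 1):
            window = sentence[i:i + size]
            if all(tag == noun_tag for _, tag in window):
                patterns.append(tuple(word for word, _ in window))
    return patterns


def pattern_counts(docs):
    """Count key patterns per document (a document is a list of tagged sentences)."""
    return [Counter(p for sent in doc for p in key_patterns(sent)) for doc in docs]


def pfidf(counts):
    """Assumed weighting: pattern frequency times log(N / document frequency)."""
    n_docs = len(counts)
    df = Counter(p for c in counts for p in c)
    return {(d, p): pf * math.log(n_docs / df[p])
            for d, c in enumerate(counts) for p, pf in c.items()}


def pmi(counts, a, b):
    """Document-level pointwise mutual information of two key patterns."""
    n_docs = len(counts)
    n_a = sum(1 for c in counts if a in c)
    n_b = sum(1 for c in counts if b in c)
    n_ab = sum(1 for c in counts if a in c and b in c)
    if n_ab == 0:
        return float("-inf")  # the patterns never share a document
    return math.log2(n_ab * n_docs / (n_a * n_b))


# Toy corpus: three one-sentence documents with hypothetical tags.
docs = [
    [[("stock", "N"), ("market", "N"), ("rises", "V")]],
    [[("stock", "N"), ("market", "N"), ("index", "N")]],
    [[("typhoon", "N"), ("warning", "N"), ("issued", "V")]],
]
counts = pattern_counts(docs)
scores = pfidf(counts)
```

In this toy corpus the pattern ("stock", "market") appears in two of the three documents, so it receives a lower PF-IDF weight than ("typhoon", "warning"), which appears in only one. Pairwise strengths such as `pmi` could then serve as the similarity input to an agglomerative clustering step.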

Author information

Authors and Affiliations

Department of Information Management, Tatung University, Taiwan
Yen-Ju Yang & Su-Hsin Yu

Editor information

Editors and Affiliations

Faculty of Technology, University of Portsmouth, Portsmouth, UK
Max Bramer BSc, PhD, CEng, FBCS, FIEE, FRSA
Department of Computer Science, University of Liverpool, Liverpool, UK
Frans Coenen PhD
Department of Computing, City University, London
Andrew Tuson MA, MSc, PhD, MBCS

About this paper

Cite this paper

Yang, YJ., Yu, SH. (2007). Chinese Text Clustering for Topic Detection Based on Word Pattern Relation. In: Bramer, M., Coenen, F., Tuson, A. (eds) Research and Development in Intelligent Systems XXIII. SGAI 2006. Springer, London. https://doi.org/10.1007/978-1-84628-663-6_33

DOI: https://doi.org/10.1007/978-1-84628-663-6_33
Publisher Name: Springer, London
Print ISBN: 978-1-84628-662-9
Online ISBN: 978-1-84628-663-6
eBook Packages: Computer Science, Computer Science (R0)
