Chinese Text Clustering for Topic Detection Based on Word Pattern Relation

  • Conference paper
Research and Development in Intelligent Systems XXIII (SGAI 2006)

Abstract

This research uses word expansion to group related features into the same semantic concept, assigns the corresponding documents to concept clusters, and finally merges concepts that share documents into document clusters. We expect that using semantic concepts as the feature index can reduce the problems of polysemy and synonymy. Frequent sequences of two or three consecutive nouns in the same sentence form key patterns, which replace keywords as the features of the text. The distributive strength of key patterns is measured by Pattern Frequency (PF), Pattern Frequency-Inverse Document Frequency (PFIDF), Conditional Probability, Mutual Information, and Association Norm. According to this strength, agglomerative hierarchical clustering is applied to cluster the key patterns into semantic concepts. Then, based on the documents shared between concepts, several semantic concepts are merged into a group, and the corresponding texts are considered topic-related. The experimental results show that the proposed text clustering outperforms traditional vector space model (VSM) clustering under all five strength measures of key patterns. PFIDF performs best, with an average F-measure of 97.5%.
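The key-pattern step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the PFIDF weighting (pattern frequency times the log of inverse document frequency) and the document-level pointwise mutual information used here are assumptions modelled on standard TF-IDF and on Church and Hanks. The function names (`key_patterns`, `pattern_stats`, `pfidf`, `pmi`) are hypothetical, and the input is assumed to be already segmented and POS-tagged, so that each sentence is simply its nouns in order. English placeholder nouns stand in for Chinese ones.

```python
import math
from collections import Counter


def key_patterns(sentence_nouns, sizes=(2, 3)):
    """Yield every run of 2 or 3 consecutive nouns in one sentence."""
    for n in sizes:
        for i in range(len(sentence_nouns) - n + 1):
            yield tuple(sentence_nouns[i:i + n])


def pattern_stats(docs):
    """Count key patterns over a corpus.

    docs: list of documents; each document is a list of sentences;
    each sentence is a list of nouns in their original order.
    Returns (pf, df, postings): total occurrences per pattern, number of
    documents containing it, and the set of document ids containing it.
    """
    pf, df, postings = Counter(), Counter(), {}
    for doc_id, doc in enumerate(docs):
        seen = set()
        for sentence in doc:
            for pattern in key_patterns(sentence):
                pf[pattern] += 1
                seen.add(pattern)
        df.update(seen)  # each document counts once per pattern
        for pattern in seen:
            postings.setdefault(pattern, set()).add(doc_id)
    return pf, df, postings


def pfidf(pf, df, n_docs):
    """Assumed PFIDF weighting: pf * ln(N / df), by analogy with TF-IDF."""
    return {p: pf[p] * math.log(n_docs / df[p]) for p in pf}


def pmi(docs_a, docs_b, n_docs):
    """Assumed document-level pointwise mutual information of two patterns."""
    joint = len(docs_a & docs_b)
    if joint == 0:
        return float("-inf")  # the two patterns never share a document
    return math.log2(joint * n_docs / (len(docs_a) * len(docs_b)))


# Toy corpus: 3 documents, nouns only.
docs = [
    [["stock", "market", "index"], ["stock", "market"]],
    [["stock", "market", "crash"]],
    [["typhoon", "warning"]],
]
pf, df, postings = pattern_stats(docs)
weights = pfidf(pf, df, len(docs))
```

In a fuller pipeline, a pairwise strength such as `pmi` would feed an agglomerative clustering of patterns into concepts, and concepts whose document sets overlap would then be merged; both steps are omitted here because the abstract does not specify the linkage criterion or the merge threshold.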

Copyright information

© 2007 Springer-Verlag London Limited

About this paper

Cite this paper

Yang, YJ., Yu, SH. (2007). Chinese Text Clustering for Topic Detection Based on Word Pattern Relation. In: Bramer, M., Coenen, F., Tuson, A. (eds) Research and Development in Intelligent Systems XXIII. SGAI 2006. Springer, London. https://doi.org/10.1007/978-1-84628-663-6_33

  • DOI: https://doi.org/10.1007/978-1-84628-663-6_33

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-662-9

  • Online ISBN: 978-1-84628-663-6

  • eBook Packages: Computer Science, Computer Science (R0)
