On the Chinese Document Clustering Based on Dynamical Term Clustering

Tseng, Chih-Ming; Tsai, Kun-Hsiu; Hsu, Chiun-Chieh; Chang, His-Cheng

doi:10.1007/11562382_46


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

With the rapid development of global networking, more and more information is accessible online, which makes document clustering techniques increasingly indispensable. Clustering lets us browse large collections of information efficiently. In this paper, we focus on a Chinese document clustering process that uses a data mining technique and a neural network model. The process has two main phases: a preprocessing phase and a clustering phase. In the preprocessing phase, we propose an alternative Chinese sentence segmentation method, which is based on a hash-based data mining technique. In the clustering phase, we adopt a dynamical SOM (self-organizing map) model in order to cluster data dynamically. Furthermore, we cluster term vectors instead of document vectors. Our experiments demonstrate that term clustering yields a better precision rate and becomes more efficient as the number of documents gradually grows.
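The abstract does not spell out the hash-based segmentation step, so the following is only a minimal sketch of the general idea, loosely in the spirit of the hash-based filtering of Park, Chen and Yu (1997), and not the authors' actual method. It assumes that candidate terms are character bigrams, that a bigram counts as a term when it occurs at least `min_support` times, and that a toy deterministic hash is good enough for illustration; the names `bucket_of`, `frequent_bigrams`, `min_support` and `num_buckets` are all hypothetical.

```python
from collections import Counter


def bucket_of(gram: str, num_buckets: int) -> int:
    """Toy deterministic hash (an assumption): sum of code points mod table size."""
    return sum(ord(ch) for ch in gram) % num_buckets


def frequent_bigrams(sentences, min_support=2, num_buckets=64):
    """Return character bigrams that occur at least min_support times.

    Pass 1 counts hash buckets only, which needs a fixed amount of memory.
    Pass 2 counts exactly, but only bigrams whose bucket could be frequent.
    """
    buckets = [0] * num_buckets
    for s in sentences:
        for i in range(len(s) - 1):
            buckets[bucket_of(s[i:i + 2], num_buckets)] += 1

    exact = Counter()
    for s in sentences:
        for i in range(len(s) - 1):
            gram = s[i:i + 2]
            # A bigram cannot be frequent unless its whole bucket is frequent.
            if buckets[bucket_of(gram, num_buckets)] >= min_support:
                exact[gram] += 1

    # Hash collisions can let infrequent bigrams through pass 1, so filter again.
    return {g: c for g, c in exact.items() if c >= min_support}


# Hypothetical toy input, not data from the paper.
sentences = ["資料探勘技術", "資料分群技術", "文件分群"]
candidates = frequent_bigrams(sentences)
print(sorted(candidates.items()))
```

On these three toy sentences the sketch keeps 資料, 技術 and 分群, each seen twice, and discards every bigram seen only once. In a fuller pipeline the surviving terms would presumably become the term vectors fed to the clustering phase, but that link is an assumption here as well.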

Author information

Authors and Affiliations

Department of Information Management, National Taiwan University of Science and Technology,
Chih-Ming Tseng, Kun-Hsiu Tsai, Chiun-Chieh Hsu & His-Cheng Chang
Department of Information Management, Jin-Wen Institute of Technology, Taipei, Taiwan
Chih-Ming Tseng, Kun-Hsiu Tsai, Chiun-Chieh Hsu & His-Cheng Chang

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, 790-784, Pohang, Korea
Gary Geunbae Lee
Computer and Communication Media Research, NEC Corp., Miyazaki 4-1-1, Miyamae-ku, 216-8555, Kawasaki, Japan
Akio Yamada
Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
Helen Meng
School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, 305-732, Daejeon, Korea
Sung Hyon Myaeng

About this paper

Cite this paper

Tseng, CM., Tsai, KH., Hsu, CC., Chang, HC. (2005). On the Chinese Document Clustering Based on Dynamical Term Clustering. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_46

DOI: https://doi.org/10.1007/11562382_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science (R0)
