Clustering News Articles in NewsPage.com Using NTSO

Jo, Taeho

doi:10.1007/978-3-642-10583-8_4

Part of the book series: Communications in Computer and Information Science (CCIS, volume 64)

Included in the following conference series:

International Conference on Database Theory and Application

Abstract

In this research, the NTSO (Neural Text Self Organizer) is proposed as an approach to text clustering. Traditional approaches to text clustering require documents to be encoded into numerical vectors. This encoding causes two main problems: huge dimensionality and sparse distribution. The idea of this research is to encode documents into string vectors instead and to cluster them with the NTSO. For empirical validation, we compare the NTSO with other text clustering approaches with respect to speed and performance.
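The contrast between the two encodings can be illustrated with a small sketch. The abstract does not define how string vectors are built, so the code below assumes a string vector is a fixed-size list of a document's most frequent words; the helper names `tokenize`, `numerical_vector`, and `string_vector` and the two sample sentences are hypothetical, and this is not the NTSO itself.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def numerical_vector(tokens, vocabulary):
    """Bag-of-words encoding: one count per vocabulary word (mostly zeros)."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]


def string_vector(tokens, size):
    """ASSUMPTION: a string vector is the `size` most frequent words,
    ordered by descending frequency, with ties broken alphabetically."""
    counts = Counter(tokens)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:size]


# Two made-up sample documents, for illustration only.
docs = [
    "Stocks rally as markets rebound; stocks close higher.",
    "The team wins the final match; the team celebrates.",
]
tokenized = [tokenize(d) for d in docs]
vocabulary = sorted({w for tokens in tokenized for w in tokens})

for tokens in tokenized:
    dense = numerical_vector(tokens, vocabulary)
    # The numerical vector grows with the vocabulary and is mostly zeros;
    # the string vector keeps a fixed, small number of word entries.
    print(len(dense), dense.count(0), string_vector(tokens, 3))
```

In this toy corpus the numerical vector already has one dimension per distinct word in the collection, about half of them zero for each document, while the string vector stays at three entries regardless of vocabulary size. On a real news collection the vocabulary would be far larger, which is the dimensionality and sparsity problem the abstract refers to.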

Author information

Authors and Affiliations

School of Computer and Information Engineering, Inha University, 230 Yonghyundong Namgu, Incheon, 402-751, South Korea
Taeho Jo

Authors

Taeho Jo

Editor information

Editors and Affiliations

University of Warsaw and Infobright Inc., Poland
Dominik Ślęzak
Hannam University, 306-791, Daejeon, South Korea
Tai-hoon Kim
Utrecht University, The Netherlands
Yanchun Zhang
Hosei University, Tokyo, Japan
Jianhua Ma
ETRI, South Korea
Kyo-il Chung

About this paper

Cite this paper

Jo, T. (2009). Clustering News Articles in NewsPage.com Using NTSO. In: Ślęzak, D., Kim, Th., Zhang, Y., Ma, J., Chung, Ki. (eds) Database Theory and Application. DTA 2009. Communications in Computer and Information Science, vol 64. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10583-8_4

DOI: https://doi.org/10.1007/978-3-642-10583-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10582-1
Online ISBN: 978-3-642-10583-8
eBook Packages: Computer Science, Computer Science (R0)
