Clustering News Articles in NewsPage.com Using NTSO

  • Conference paper
Database Theory and Application (DTA 2009)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 64)

Abstract

This research proposes the NTSO (Neural Text Self Organizer) as an approach to text clustering. Traditional approaches to text clustering require documents to be encoded into numerical vectors, and this encoding causes two main problems: huge dimensionality and sparse distribution. The idea of this research is instead to encode documents into string vectors and to use the NTSO to cluster them. As empirical validation, we compare the NTSO with other text clustering approaches with respect to speed and performance.
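The contrast between the two encodings can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define the string vector precisely, so the sketch assumes a string vector is the few most frequent words of a document, ordered by descending frequency. The function names, the toy documents, and the `size` parameter are all hypothetical.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def numerical_vectors(docs):
    """Bag-of-words encoding: one dimension per distinct word in the corpus."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

def string_vector(doc, size=3):
    """Assumed string-vector encoding (hypothetical definition): the `size`
    most frequent words, by descending frequency, ties broken alphabetically."""
    counts = Counter(tokenize(doc))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:size]

def sparsity(vectors):
    """Fraction of zero entries across all numerical vectors."""
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for x in v if x == 0)
    return zeros / total if total else 0.0

# Toy documents, invented for illustration only.
docs = [
    "stocks rise as markets rally and stocks close higher",
    "team wins final as fans celebrate the team",
    "new phone launch draws crowds to phone stores",
]

vocab, vectors = numerical_vectors(docs)
string_vectors = [string_vector(d) for d in docs]

print("numerical dimension:", len(vocab))
print("share of zero entries:", round(sparsity(vectors), 2))
print("string vectors:", string_vectors)
```

Even on three short documents, the numerical vectors need one dimension per distinct word and are mostly zeros, while each string vector keeps a fixed, small number of words. How the NTSO measures similarity between string vectors is not described in the abstract and is not sketched here.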

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jo, T. (2009). Clustering News Articles in NewsPage.com Using NTSO. In: Ślęzak, D., Kim, Th., Zhang, Y., Ma, J., Chung, Ki. (eds) Database Theory and Application. DTA 2009. Communications in Computer and Information Science, vol 64. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10583-8_4

  • DOI: https://doi.org/10.1007/978-3-642-10583-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10582-1

  • Online ISBN: 978-3-642-10583-8

  • eBook Packages: Computer Science, Computer Science (R0)
