Thematic Clustering Methods Applied to News Texts Analysis

  • Conference paper
Knowledge-Based Software Engineering (JCKBSE 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 466)

Abstract

This paper addresses the problem of partitioning documents from a news flow into groups such that the documents within each group are similar to one another. We use thematic clustering to solve this problem. Existing clustering algorithms, such as k-means and minimum spanning tree methods, are considered and analyzed, and we show which of them give the best results on news texts. Clustering is a powerful tool for text processing, but it cannot give a complete picture of the semantics of a news article. The paper therefore also presents a method for comprehensive analysis of news texts that combines statistical keyword-extraction algorithms with algorithms that establish the semantic coherence of text blocks. Particular attention is paid to the structural features of news texts.
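As a rough illustration of the minimum spanning tree approach named in the abstract, the sketch below clusters a handful of toy headlines. It is not the authors' implementation: the whitespace tokenisation, the smoothed TF-IDF weighting, the cosine distance, and the helper names `tfidf_vectors`, `cosine_distance` and `mst_clusters` are all assumptions made for this example. The idea shown is the standard one: build a minimum spanning tree over pairwise document distances, then drop the k - 1 longest edges so that the remaining connected components form k clusters.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF dictionaries (assumed weighting: smoothed IDF)."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def mst_clusters(vectors, k):
    """Cluster by building an MST (Prim) and cutting its k - 1 longest edges."""
    n = len(vectors)
    # best[j] = (distance, i): cheapest known edge linking j to the tree.
    best = {j: (cosine_distance(vectors[0], vectors[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda x: best[x][0])
        d, i = best.pop(j)
        edges.append((d, i, j))
        for m in best:
            dm = cosine_distance(vectors[j], vectors[m])
            if dm < best[m][0]:
                best[m] = (dm, j)

    # Keep the n - k shortest MST edges; the rest are the cuts between clusters.
    edges.sort()
    kept = edges[: max(n - k, 0)]

    # Union-find over the kept edges gives the connected components.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy headlines (invented for this example, not data from the paper).
docs = [
    "central bank raises interest rates",
    "bank raises rates again amid inflation",
    "football team wins championship final",
    "team wins final after penalty shootout",
]
labels = mst_clusters(tfidf_vectors(docs), k=2)
print(labels)
```

On this toy input the two finance headlines and the two football headlines end up in separate groups, because documents with no shared terms sit at the maximum distance and that edge is the one removed. A real news pipeline would need proper tokenisation, stop-word handling and morphological normalisation, none of which this sketch attempts.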

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Soloshenko, A.N., Orlova, Y.A., Rozaliev, V.L., Zaboleeva-Zotova, A.V. (2014). Thematic Clustering Methods Applied to News Texts Analysis. In: Kravets, A., Shcherbakov, M., Kultsova, M., Iijima, T. (eds) Knowledge-Based Software Engineering. JCKBSE 2014. Communications in Computer and Information Science, vol 466. Springer, Cham. https://doi.org/10.1007/978-3-319-11854-3_25

  • DOI: https://doi.org/10.1007/978-3-319-11854-3_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11853-6

  • Online ISBN: 978-3-319-11854-3

  • eBook Packages: Computer Science, Computer Science (R0)
