Abstract
This paper addresses the problem of partitioning documents from a news flow into groups, where each group contains documents that are similar to each other. We use thematic clustering to solve this problem. Existing clustering algorithms, such as k-means and minimum spanning tree clustering, are considered and analyzed, and we show which of them give the best results on news texts. Clustering is a powerful tool for text processing, but it cannot give a complete picture of the semantics of a news article. This paper therefore also presents a method for comprehensive analysis of news texts based on a combination of statistical keyword extraction algorithms and algorithms that determine the semantic coherence of text blocks. Particular attention is paid to the structural features of news texts.
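The abstract names minimum spanning tree (MST) clustering and statistical keyword extraction without giving details. As an illustration only, the following is a minimal Python sketch under stated assumptions, not the authors' method: documents become TF-IDF vectors, pairwise cosine distances define a complete graph, and Kruskal's algorithm joins documents while skipping edges longer than a hypothetical `threshold`, which is equivalent to building the MST and cutting its long edges. Cluster keywords are then ranked by TF-IDF computed over the merged text of each cluster. The function names, the naive English tokenizer (a real system for Russian news would need morphological analysis), the toy headlines and the threshold value are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase Latin words only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine_distance(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 1.0  # treat empty vectors as maximally distant
    return 1.0 - dot / (na * nb)


def mst_clusters(vectors, threshold):
    """Kruskal's algorithm on the complete distance graph, stopping at
    edges longer than `threshold` (same components as a cut MST)."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (cosine_distance(vectors[i], vectors[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    for dist, i, j in edges:
        if dist > threshold:
            break  # all remaining edges are longer still
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merge the two components
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_keywords(docs, clusters, k=3):
    # Treat each cluster as one pseudo-document and rank its terms
    # by TF-IDF relative to the other clusters.
    merged = [" ".join(docs[i] for i in members) for members in clusters]
    return [[t for t, _ in Counter(vec).most_common(k)] for vec in tfidf_vectors(merged)]


docs = [
    "Central bank raises interest rate to curb inflation",
    "Inflation slows after central bank rate decision",
    "Football club wins league title after final match",
    "League champions celebrate title win in final match",
]
vectors = tfidf_vectors(docs)
clusters = mst_clusters(vectors, threshold=0.9)
keywords = cluster_keywords(docs, clusters)
print(clusters)  # → [[0, 1], [2, 3]]
print(keywords)
```

Cutting MST edges above a threshold produces single-linkage groups, so the number of clusters does not have to be fixed in advance as it does for k-means; the trade-off is sensitivity to the threshold and a tendency to chain loosely related documents together.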
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Soloshenko, A.N., Orlova, Y.A., Rozaliev, V.L., Zaboleeva-Zotova, A.V. (2014). Thematic Clustering Methods Applied to News Texts Analysis. In: Kravets, A., Shcherbakov, M., Kultsova, M., Iijima, T. (eds) Knowledge-Based Software Engineering. JCKBSE 2014. Communications in Computer and Information Science, vol 466. Springer, Cham. https://doi.org/10.1007/978-3-319-11854-3_25
DOI: https://doi.org/10.1007/978-3-319-11854-3_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11853-6
Online ISBN: 978-3-319-11854-3
eBook Packages: Computer Science, Computer Science (R0)