Abstract
Document clustering is an important technique in Natural Language Processing (NLP). This paper presents the performance of partitional and agglomerative algorithms applied to clustering a large number of Polish newspaper articles. We investigate different representations of the documents. The focus of the paper is on the applicability of Latent Semantic Analysis to such clustering for Polish.
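To make the idea concrete, the following is a minimal sketch of clustering documents in a Latent Semantic Analysis space. It is not the authors' pipeline: the toy documents, the TF-IDF weighting variant, the rank of 2, the farthest-first initialisation, and all function names are assumptions chosen for illustration, and NumPy is assumed to be available. It builds a term-document representation, reduces it with a truncated SVD, and runs a simple partitional (k-means style) clustering with cosine similarity.

```python
import numpy as np

def tfidf_matrix(docs):
    """Build a documents x terms TF-IDF matrix from whitespace-tokenised text."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for row, d in enumerate(docs):
        for w in d.lower().split():
            counts[row, index[w]] += 1
    df = (counts > 0).sum(axis=0)        # document frequency of each term
    idf = np.log(len(docs) / df) + 1.0   # +1 keeps terms found in every document
    return counts * idf

def lsa_embed(matrix, rank):
    """Project documents onto the top `rank` singular directions (LSA)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    reduced = u[:, :rank] * s[:rank]
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.where(norms == 0, 1.0, norms)  # unit-length rows

def spherical_kmeans(points, k, iters=20):
    """Partitional clustering of unit vectors by cosine similarity."""
    # Deterministic farthest-first initialisation (an illustrative choice).
    chosen = [0]
    while len(chosen) < k:
        sims = (points @ points[chosen].T).max(axis=1)
        chosen.append(int(np.argmin(sims)))
    centroids = points[chosen].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        labels = np.argmax(points @ centroids.T, axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):
                mean = members.mean(axis=0)
                norm = np.linalg.norm(mean)
                if norm > 0:
                    centroids[c] = mean / norm
    return labels

# Hypothetical toy corpus: two finance snippets and two football snippets.
docs = [
    "bank kredyt pieniądze bank",
    "kredyt bank procent",
    "mecz piłka gol",
    "piłka gol mecz drużyna",
]
labels = spherical_kmeans(lsa_embed(tfidf_matrix(docs), rank=2), k=2)
print(labels)
```

On real newspaper text the documents would first need tokenisation and, for an inflected language such as Polish, some form of lemmatisation; the rank would also be far larger than 2.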
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuta, M., Kitowski, J. (2010). Clustering Polish Texts with Latent Semantic Analysis. In: Rutkowski, L., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artifical Intelligence and Soft Computing. ICAISC 2010. Lecture Notes in Computer Science, vol 6114. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13232-2_65
DOI: https://doi.org/10.1007/978-3-642-13232-2_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13231-5
Online ISBN: 978-3-642-13232-2
eBook Packages: Computer Science, Computer Science (R0)