Clustering Polish Texts with Latent Semantic Analysis

Kuta, Marcin; Kitowski, Jacek

doi:10.1007/978-3-642-13232-2_65

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6114)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

Document clustering is an important technique in Natural Language Processing (NLP). This paper presents the performance of partitional and agglomerative algorithms applied to clustering a large number of Polish newspaper articles. We investigate different representations of the documents. The focus of the paper is on the applicability of Latent Semantic Analysis to such clustering for Polish.
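
As a rough illustration of the idea summarised above, the following minimal sketch applies Latent Semantic Analysis (a truncated SVD of a term-document matrix) and then a simple partitional clustering step in the reduced space. It is not the authors' pipeline: the toy matrix, the Polish terms, the choice of k = 2, the fixed initial centroids, and the helper names `lsa_embed` and `spherical_kmeans` are all assumptions made for this example.

```python
import numpy as np

# Hypothetical toy term-document count matrix (rows: terms, columns: documents).
# Documents 0-1 use sports vocabulary, documents 2-3 use economic vocabulary.
TERMS = ["mecz", "bramka", "liga", "bank", "kredyt", "inflacja"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)


def lsa_embed(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    docs = (s[:k, None] * vt[:k]).T  # shape: (n_docs, k)
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    # Unit-normalise rows so that dot products are cosine similarities.
    return docs / np.where(norms == 0.0, 1.0, norms)


def spherical_kmeans(X, init_idx, n_iter=10):
    """Partitional clustering by cosine similarity with fixed initial centroids."""
    centroids = X[list(init_idx)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each document to its most similar centroid.
        labels = np.argmax(X @ centroids.T, axis=1)
        # Recompute each centroid as the normalised mean of its members.
        for c in range(len(centroids)):
            members = X[labels == c]
            if len(members) > 0:
                mean = members.mean(axis=0)
                norm = np.linalg.norm(mean)
                if norm > 0:
                    centroids[c] = mean / norm
    return labels


X = lsa_embed(A, k=2)
labels = spherical_kmeans(X, init_idx=(0, 2))
print(labels)
```

In this toy setting the two sports documents should end up in one cluster and the two economy documents in the other. A realistic experiment would start from a weighted representation of a large corpus and would typically rely on dedicated SVD and clustering tools rather than this hand-rolled loop.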

Author information

Authors and Affiliations

Institute of Computer Science, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059, Kraków, Poland
Marcin Kuta & Jacek Kitowski

Editor information

Editors and Affiliations

Department of Artificial Intelligence, Academy of Humanities and Economics, Poland
Leszek Rutkowski
Academy of Humanities and Economics in Łódź, ul. Rewolucji 1905 nr 64, Łódź, Poland
Rafał Scherer
Institute of Automatics, AGH University of Science and Technology, Al. Mickiewicza 30, PL-30-059, Kraków, Poland
Ryszard Tadeusiewicz
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley Initiative in Soft Computing (BISC), 94720-1776, CA
Lotfi A. Zadeh
Computational Intelligence Laboratory Department of Electrical and Computer Engineering, University of Louisville, 40292, Louisville, KY
Jacek M. Zurada

About this paper

Cite this paper

Kuta, M., Kitowski, J. (2010). Clustering Polish Texts with Latent Semantic Analysis. In: Rutkowski, L., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artifical Intelligence and Soft Computing. ICAISC 2010. Lecture Notes in Computer Science, vol 6114. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13232-2_65

DOI: https://doi.org/10.1007/978-3-642-13232-2_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13231-5
Online ISBN: 978-3-642-13232-2
eBook Packages: Computer Science, Computer Science (R0)
