Clustering Polish Texts with Latent Semantic Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6114)

Abstract

Document clustering is an important technique in Natural Language Processing (NLP). This paper presents the performance of partitional and agglomerative algorithms applied to clustering a large number of Polish newspaper articles. We investigate different representations of the documents. The focus of the paper is on the applicability of Latent Semantic Analysis to such clustering for Polish.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kuta, M., Kitowski, J. (2010). Clustering Polish Texts with Latent Semantic Analysis. In: Rutkowski, L., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2010. Lecture Notes in Computer Science, vol 6114. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13232-2_65

  • DOI: https://doi.org/10.1007/978-3-642-13232-2_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13231-5

  • Online ISBN: 978-3-642-13232-2

  • eBook Packages: Computer Science, Computer Science (R0)
