Construction of Text Corpus of Polish Using the Internet

  • Conference paper
Human Language Technology. Challenges of the Information Society (LTC 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5603)

Abstract

This paper describes a system used to construct a corpus of Polish. The system works in the same way as an Internet search engine. The corpus contains Polish texts downloaded from the Internet, supplemented with the results of linguistic analysis (morphological, syntactic, and semantic). Results of statistical research based on the corpus data are also presented. The corpus is used to test and improve the linguistic analyzer developed at the Silesian University of Technology.
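
The abstract does not specify how the crawler is built, so the following is only a minimal sketch of the general search-engine-style approach: a breadth-first crawl that follows links and keeps pages that look Polish. All names (`PageParser`, `looks_polish`, `crawl`) are hypothetical, the diacritic-based language check is a crude placeholder assumption, and the linguistic analysis step mentioned above is not modeled. The page source is passed in as a `fetch` function so the sketch can run without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

# Assumption: a page counts as Polish if it contains any of these letters.
POLISH_LETTERS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


class PageParser(HTMLParser):
    """Collects text nodes and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def looks_polish(text):
    # Placeholder heuristic, not the method used in the paper.
    return any(ch in POLISH_LETTERS for ch in text)


def crawl(seed, fetch, limit=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML or None."""
    queue = deque([seed])
    seen = {seed}
    corpus = {}
    while queue and len(corpus) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        text = " ".join(parser.chunks)
        if looks_polish(text):
            corpus[url] = text
        for href in parser.links:
            # Resolve relative links and drop fragments before queueing.
            target = urldefrag(urljoin(url, href)).url
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return corpus
```

A real crawler would also fetch pages over HTTP, honor each site's robots.txt rules, and rate-limit its requests; those parts are left out here to keep the sketch short.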

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kulików, S. (2009). Construction of Text Corpus of Polish Using the Internet. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_26

  • DOI: https://doi.org/10.1007/978-3-642-04235-5_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04234-8

  • Online ISBN: 978-3-642-04235-5

  • eBook Packages: Computer Science, Computer Science (R0)
