Construction of Text Corpus of Polish Using the Internet

Kulików, Sławomir

doi:10.1007/978-3-642-04235-5_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5603)

Included in the following conference series:

Language and Technology Conference

Abstract

This paper describes a system for constructing a corpus of Polish. The system works in the same way as Internet search engines. The corpus contains Polish texts downloaded from the Internet, supplemented with the results of linguistic analysis (morphological, syntactic and semantic). Results of statistical research based on the corpus data are also presented. The corpus is used to test and improve the linguistic analyzer developed at the Silesian University of Technology.
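
The abstract gives no implementation details, so the following is only a rough sketch of what "works in the same way as Internet search engines" could look like: a breadth-first crawl that follows links, extracts page text, and stores it with an empty slot for later linguistic analysis. Every name here (PageParser, crawl, the fetch callback, the example.pl pages) is hypothetical and not taken from the paper. Fetching is replaced by an in-memory dictionary so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class PageParser(HTMLParser):
    """Collects text fragments and outgoing links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop any #fragment.
                absolute, _ = urldefrag(urljoin(self.base_url, href))
                self.links.append(absolute)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl; returns {url: {"text": ..., "analysis": None}}.

    `fetch` is an assumed callback: it takes a URL and returns HTML or None.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    corpus = {}
    while queue and len(corpus) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page unavailable, skip it
            continue
        parser = PageParser(url)
        parser.feed(html)
        # "analysis" is a placeholder for results added by a later stage.
        corpus[url] = {"text": " ".join(parser.text_parts), "analysis": None}
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return corpus


# Hypothetical in-memory pages standing in for the real Internet.
PAGES = {
    "http://example.pl/": '<p>Ala ma kota.</p><a href="/b.html">dalej</a>',
    "http://example.pl/b.html": '<p>Kot ma Alę.</p><a href="/">wstecz</a>'
                                '<a href="/missing.html">x</a>',
}

corpus = crawl(["http://example.pl/"], PAGES.get)
```

A crawler run against live sites would also need polite behaviour such as honouring robots.txt (Python's standard library offers urllib.robotparser for this), language filtering, and error handling, none of which is shown here.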

Author information

Authors and Affiliations

Institute of Computer Science, Silesian University of Technology, ul. Akademicka 16, 44-100, Gliwice, Poland
Sławomir Kulików

Editor information

Editors and Affiliations

Faculty of Mathematics and Computer Science, Adam Mickiewicz University in Poznań, ul. Umultowska 87, P.O. Box, 61614, Poznań, Poland
Zygmunt Vetulani
Language Technology Lab, German Research Center for Artificial Intelligence (DFKI), Campus D 3 1, Stuhlsatzenhausweg 3, D-66123, Saarbrücken, Germany
Hans Uszkoreit

About this paper

Cite this paper

Kulików, S. (2009). Construction of Text Corpus of Polish Using the Internet. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_26

DOI: https://doi.org/10.1007/978-3-642-04235-5_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)
