Abstract
This paper describes a system for constructing a corpus of Polish. The system works in the same way as Internet search engines. The corpus contains Polish texts downloaded from the Internet, supplemented with the results of linguistic analysis (morphological, syntactic and semantic). Results of statistical research based on the corpus data are also presented. The corpus is used to test and improve the linguistic analyzer developed at the Silesian University of Technology.
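The abstract says only that the system works the way Internet search engines do. As a rough illustration of that idea, and not the author's implementation, the sketch below runs a breadth-first crawl that respects robots.txt rules and stores the plain text of each visited page. Everything named here is an assumption made for the example: the `crawl` function, the `LinkExtractor` class, the `corpus-bot` agent name, and the in-memory `PAGES` dictionary that stands in for real HTTP downloads.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects href values of <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(seed, fetch, robots, agent="corpus-bot", limit=100):
    """Breadth-first crawl from seed; returns {url: page text}."""
    queue = deque([seed])
    seen = {seed}
    corpus = {}
    while queue and len(corpus) < limit:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # excluded by robots.txt
        html = fetch(url)
        if html is None:
            continue  # page not available
        parser = LinkExtractor()
        parser.feed(html)
        corpus[url] = " ".join(parser.text_parts)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queuing.
            target = urldefrag(urljoin(url, href)).url
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return corpus


# Hypothetical in-memory "web" used instead of real network access.
PAGES = {
    "http://example.pl/": '<p>Ala ma kota.</p><a href="/a.html">a</a>'
                          '<a href="/private/x.html">x</a>',
    "http://example.pl/a.html": '<p>Kot ma Alę.</p><a href="/">home</a>',
    "http://example.pl/private/x.html": "<p>tajne</p>",
}

robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

corpus = crawl("http://example.pl/", PAGES.get, robots)
```

In a real deployment the `fetch` argument would download pages over HTTP, and each stored text would then be passed to the linguistic analyzer; neither step is shown here.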
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kulików, S. (2009). Construction of Text Corpus of Polish Using the Internet. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_26
DOI: https://doi.org/10.1007/978-3-642-04235-5_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)