Abstract
This paper discusses the problem of syntactic relation recognition in Polish data. We consider subject, object and copula relations between VP and NP or AdjP chunks. The problem has been studied for English but has received very little attention in the context of Slavic languages. Slavic languages, including Polish, are characterised by relatively free word order, which makes the task more challenging than for English.
The task may be formulated as a classification problem and addressed with supervised learning techniques. We propose a feature set tailored to the characteristics of the Polish language and perform experiments with a number of classifiers.
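To make the classification framing concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each candidate instance is a pair of one VP chunk and one NP or AdjP chunk from the same sentence, and that a classifier assigns each pair one of the labels subject, object, copula, or an assumed "none" label for unrelated pairs. The `Chunk` class, the feature names, and the example sentence are all hypothetical; the paper's actual feature set is not reproduced here.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Chunk:
    kind: str       # "VP", "NP" or "AdjP"
    start: int      # index of the chunk's first token in the sentence
    head: str       # word form of the chunk head
    case: str = ""  # grammatical case of the head, if it has one


# Assumed label set: the three relations named in the abstract plus "none".
RELATIONS = ("subject", "object", "copula", "none")


def candidate_pairs(chunks):
    """Pair every VP chunk with every NP or AdjP chunk in the sentence."""
    vps = [c for c in chunks if c.kind == "VP"]
    args = [c for c in chunks if c.kind in ("NP", "AdjP")]
    return list(product(vps, args))


def features(vp, arg):
    """Build a toy feature dictionary for one (VP, argument) pair."""
    return {
        "arg_kind": arg.kind,
        "arg_case": arg.case,
        "arg_precedes_vp": arg.start < vp.start,
        "distance": abs(arg.start - vp.start),
    }


# "Książkę czyta Anna" ("Anna reads a book"): object, verb, subject order.
sentence = [
    Chunk("NP", 0, "Książkę", "acc"),
    Chunk("VP", 1, "czyta"),
    Chunk("NP", 2, "Anna", "nom"),
]

instances = [features(vp, arg) for vp, arg in candidate_pairs(sentence)]
```

In this example the accusative NP precedes the verb and the nominative NP follows it, so position alone would mislabel both arguments. This illustrates why a feature such as grammatical case is a plausible input for a free word order language; each feature dictionary would then be passed to a supervised classifier trained on annotated pairs.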
This work was financed by the National Centre for Research and Development (NCBiR) project SP/I/1/77065/10 (“SyNaT”).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Radziszewski, A., Orłowicz, P., Broda, B. (2013). Classification of Predicate-Argument Relations in Polish Data. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_4
DOI: https://doi.org/10.1007/978-3-642-38634-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38633-6
Online ISBN: 978-3-642-38634-3
eBook Packages: Computer Science, Computer Science (R0)