Evaluating Natural Language Processing tools for Polish during PolEval 2019

Kobyliński, Łukasz; Ogrodniczuk, Maciej; Kocoń, Jan; Marcińczuk, Michał; Smywiński-Pohl, Aleksander; Wołk, Krzysztof; Koržinek, Danijel; Ptaszynski, Michal; Pieciukiewicz, Agata; Dybała, Paweł

doi:10.1007/978-3-031-05328-3_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13212)

Included in the following conference series:

Language and Technology Conference

Abstract

PolEval is a SemEval-inspired evaluation campaign for natural language processing tools for Polish. Submitted tools compete against one another in tasks selected by the organizers, using available data, and are evaluated according to pre-established procedures. The campaign has been organized since 2017, and each year the winning systems become the state of the art in Polish language processing in their respective tasks. In 2019 we organized six different tasks, creating an even greater opportunity for NLP researchers to evaluate their systems objectively.

Acknowledgements

The work on temporal expression recognition and phrase lemmatization was financed as part of the investment in the CLARIN-PL research infrastructure funded by the Polish Ministry of Science and Higher Education.

The work on Entity Linking was supported by the Polish National Centre for Research and Development – LIDER Program under Grant LIDER/27/0164/L-8/16/NCBR/2017 titled “Lemkin - intelligent legal information system” and also supported in part by PLGrid Infrastructure.

Author information

Authors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
Łukasz Kobyliński & Maciej Ogrodniczuk
Wrocław University of Science and Technology, Wrocław, Poland
Jan Kocoń & Michał Marcińczuk
AGH University of Science and Technology, Kraków, Poland
Aleksander Smywiński-Pohl
Kitami Institute of Technology, Kitami, Japan
Michal Ptaszynski
Polish-Japanese Academy of Information Technology, Warsaw, Poland
Krzysztof Wołk, Danijel Koržinek & Agata Pieciukiewicz
Jagiellonian University in Kraków, Kraków, Poland
Paweł Dybała

Corresponding author

Correspondence to Łukasz Kobyliński.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
LIMSI-CNRS, Orsay, France
Patrick Paroubek
Adam Mickiewicz University, Poznań, Poland
Marek Kubis

About this paper

Cite this paper

Kobyliński, Ł. et al. (2022). Evaluating Natural Language Processing tools for Polish during PolEval 2019. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2019. Lecture Notes in Computer Science, vol 13212. Springer, Cham. https://doi.org/10.1007/978-3-031-05328-3_20

DOI: https://doi.org/10.1007/978-3-031-05328-3_20
Published: 05 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05327-6
Online ISBN: 978-3-031-05328-3
eBook Packages: Computer Science, Computer Science (R0)
