The Lemmatisation Task at the EVALITA 2011 Evaluation Campaign

Tamburini, Fabio

doi:10.1007/978-3-642-35828-9_25

Fabio Tamburini²³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7689))

Included in the following conference series:

International Workshop on Evaluation of Natural Language and Speech Tool for Italian

656 Accesses

Abstract

This paper reports on the EVALITA 2011 Lemmatisation task, an initiative for the evaluation of automatic lemmatisation tools specifically developed for the Italian language. Despite lemmatisation is often considered a subproduct of a PoS-tagging procedure that does not cause any particular problem, there are a lot of specific cases, certainly in Italian and in some other highly inflected languages, in which, given the same lexical class, we face a lemma ambiguity. A relevant number of scholars and teams participated experimenting their systems on the data provided by the task organisers. The results are very interesting and the overall performances of the participating systems were very high, exceeding, on interesting cases, 99% of lemmatisation accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Agic, Z., Tadic, M., Dovedan, Z.: Evaluating Full Lemmatization of Croatian Texts. Recent Advances in Intelligent Information Systems, pp. 175–184. Academic Publishing House (2009)
Google Scholar
Airio, E.: Word normalization and decompounding in mono- and bilingual. IR Information Retrieval 9, 249–271 (2006)
Article Google Scholar
Creutz, M., Lagus, K.: Unsupervised models for morpheme segmentation and morphology learning. ACM Transactions on Speech and Language Processing 4(1), 3:1–3:34 (2007)
Article Google Scholar
De Mauro, T.: Il dizionario della lingua italiana, Paravia (2000)
Google Scholar
Hammarström, H., Borin, L.: Unsupervised Learning of Morphology. Computational Linguistics 37(2), 309–350 (2011)
Article Google Scholar
Hardie, A., Lohani Yogendra, R.R., Yadava, P.: Extending corpus annotation of Nepali: advances in tokenisation and lemmatisation. Himalayan Linguistics 10(1), 151–165 (2011)
Google Scholar
Ingason, A.K., Helgadóttir, S., Loftsson, H., Rögnvaldsson, E.: A Mixed Method Lemmatization Algorithm Using a Hierarchy of Linguistic Identities (HOLI). In: Nordström, B., Ranta, A. (eds.) GoTAL 2008. LNCS (LNAI), vol. 5221, pp. 205–216. Springer, Heidelberg (2008)
Chapter Google Scholar
Mendes, A., Amaro, R., Bacelar do Nascimento, M.F.: Reusing Available Resources for Tagging a Spoken Portuguese Corpus. In: Branco, A., Mendes, A., Ribeiro, R. (eds.) Language Technology for Portuguese: Shallow Processing Tools and Resources, pp. 25–28. Lisbon, Edicoes Colibri (2003)
Google Scholar
Monachini, M.: ELM-IT: EAGLES Specification for Italian morphosintax Lexicon Specification and Classification Guidelines. EAGLES Document EAG CLWG ELM IT/F (1996)
Google Scholar
Pirkola, A.: Morphological typology of languages for IR. Journal of Documentation 57(3), 330–348 (2001)
Article Google Scholar
Plisson, J., Lavrač, N., Mladenić, D., Erjavec, T.: Ripple Down Rule Learning for Automated Word Lemmatisation. AI Communications 21, 15–26 (2008)
MathSciNet MATH Google Scholar
Tamburini, F.: EVALITA 2007: the Part-of-Speech Tagging Task. Intelligenza Artificiale IV(2), 4–7 (2007)
Google Scholar
The Turin University Treebank, http://www.di.unito.it/~tutreeb
Van Eynde, F., Zavrel, J., Daelemans, W.: Lemmatisation and morphosyntactic annotation for the spoken Dutch corpus. In: Proceedings of CLIN 1999, pp. 53–62. Utrecht Institute of Linguistics OTS, Utrecht (1999)
Google Scholar
Zanchetta, E., Baroni, M.: Morph-it! A free corpus-based morphological resource for the Italian language. In: Proceedings of Corpus Linguistics 2005. University of Birmingham (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

Dept. of Linguistics and Oriental Studies, University of Bologna, Italy
Fabio Tamburini

Authors

Fabio Tamburini
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Fondazione Bruno Kessler, Via Sommarive 18, 38123, Povo, TN, Italy
Bernardo Magnini
University of Naples, Via Cinthia, 80126, Napoli, NA, Italy
Francesco Cutugno
Fondazione Ugo Bordoni, Viale del Policlinico, 161, Roma, Italy
Mauro Falcone
CELCT, Via alla Cascata, 38123, Povo, TN, Italy
Emanuele Pianta

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tamburini, F. (2013). The Lemmatisation Task at the EVALITA 2011 Evaluation Campaign. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds) Evaluation of Natural Language and Speech Tools for Italian. EVALITA 2012. Lecture Notes in Computer Science(), vol 7689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35828-9_25

Download citation

DOI: https://doi.org/10.1007/978-3-642-35828-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35827-2
Online ISBN: 978-3-642-35828-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics