A study on methods for revising dependency treebanks: in search of gold


Abstract

Corpora with reliable annotation are a valuable resource for Natural Language Processing, which justifies the search for methods capable of assisting linguistic revision. In this context, we present a study on methods for revising dependency treebanks, investigating the contribution of three different strategies to corpus review: (i) linguistic rules; (ii) an adaptation of the n-grams method proposed by Boyd et al. (2008) applied to Portuguese; and (iii) Inter-Annotator Disagreement, a linguistically motivated approach that draws inspiration from the human annotation process. The results are promising: taken together, the three methods can lead to the revision of up to 58% of the errors in a specific corpus at the cost of revising only 20% of the corpus. We also present a tool that integrates treebank editing, evaluation and search capabilities with the review methods, as well as a gold-standard Portuguese corpus from the oil and gas domain.
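
To give a concrete sense of the second strategy, the sketch below illustrates the general idea behind Boyd et al.'s (2008) method: recurring head-dependent word pairs that receive different dependency labels across the treebank are flagged as candidate inconsistencies. This is a deliberately simplified illustration over a CoNLL-U file, not the adaptation used in the paper (it omits, for instance, NIL relations and the context heuristics of the original method), and the file name in the usage comment is only a placeholder.

    # Minimal sketch of the variation-nuclei idea: collect the dependency labels
    # assigned to each (head form, dependent form) pair across a CoNLL-U treebank
    # and report pairs that are labelled in more than one way.
    from collections import defaultdict

    def variation_nuclei(conllu_path):
        labels = defaultdict(set)
        with open(conllu_path, encoding="utf-8") as f:
            sentences = f.read().strip().split("\n\n")
        for block in sentences:
            rows = [line.split("\t") for line in block.splitlines()
                    if line and not line.startswith("#")]
            rows = [r for r in rows if r[0].isdigit()]   # skip multiword/empty tokens
            forms = {r[0]: r[1].lower() for r in rows}   # ID -> FORM
            for r in rows:
                head, deprel = r[6], r[7]
                if head != "0" and head in forms:
                    labels[(forms[head], r[1].lower())].add(deprel)
        return {pair: rels for pair, rels in labels.items() if len(rels) > 1}

    # Pairs returned here are candidates for manual review, e.g.:
    # for pair, rels in variation_nuclei("petro1.conllu").items():
    #     print(pair, sorted(rels))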


Notes

  1. Moreover, as noted by Baker (1997), even fully manual annotation is susceptible to error and inconsistency.

  2. As noted by a reviewer, consistency and correctness are distinct but related phenomena. In the context of linguistic annotation and IAA, high consistency (measured by high inter-annotator agreement rates) is taken as an indicator, but not a guarantee, of high-quality annotation, since consistently but poorly annotated text would also lead to high rates of IAA. On the other hand, two (or more) identical segments can be annotated in an “inconsistent” way, that is, differently, without this difference being an error, if they perform different functions in the context in which they are inserted (Fig. 2).

  3. Numbers from Petro1, following the trend found in de Marneffe et al. (2017) for English and French.

  4. These rules can be found at https://github.com/alvelvis/ACDC-UD/blob/master/validar_UD.txt and are written in Python syntax; a hypothetical sketch of this kind of check is given after these notes.

  5. See https://universaldependencies.org/release_checklist.html#validation.

  6. The dependency relation “det” should be used to tag relations between determiners and their heads.

  7. As mentioned by a reviewer, although the correlation between consistency and correctness underlies the use of inter-annotator agreement as a measure of annotation quality, this assumption should be viewed with caution, as we report in Sect. 6.

  8. According to Zeman et al. (2018), the LAS performance of UDPipe v1.2 for Portuguese is 82.07%, while Stanza achieved an 87.81% score on the same dataset.

  9. The need for two parsers is justified by the automatic correction method proposed by the authors, in which three different parsers are used.

  10. https://universaldependencies.org/format.html.

  11. Julgamento is open source and can be downloaded from https://github.com/alvelvis/Julgamento.

  12. Available at https://github.com/alvelvis/conllu-merge-resolver; a minimal sketch of detecting disagreements between two CoNLL-U annotations is given after these notes.

  13. https://universaldependencies.org/u/dep/advcl.html.

  14. https://universaldependencies.org/u/dep/acl.html.

  15. This error can be seen in two ways: deprel is wrong because dephead is wrong, or dephead is wrong because deprel is wrong. As the revision begins with the simplified CM, we started with deprel.

  16. Since we were concerned with creating the best possible parsing model for Petro2, these changes were designed to decrease the number of unseen words and structures, and thus possibly the amount of revision required, even though the addition to the training data was small: Petro1 represents less than 7% of the training data.

  17. http://petroles.puc-rio.ai/.

  18. Even for annotating Petro2, to whose training data we added Petro1, the vast majority of the data (93%) still came from Bosque-UD.
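
The sketch below is a hypothetical example of the kind of rule-based check mentioned in notes 4 and 6; the actual rules in validar_UD.txt may be expressed differently, so this only illustrates the general shape of such a constraint (here: tokens attached via the deprel “det” are expected to carry the UPOS tag DET).

    # Hypothetical rule-based check: flag tokens whose deprel is "det" but whose
    # UPOS tag is not DET. Illustration only; the real validar_UD.txt rules may
    # encode this constraint (and many others) differently.
    def check_det_rule(conllu_path):
        violations = []
        sent_id = None
        with open(conllu_path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line.startswith("# sent_id"):
                    sent_id = line.split("=", 1)[-1].strip()
                elif line and not line.startswith("#"):
                    cols = line.split("\t")
                    if cols[0].isdigit() and cols[7] == "det" and cols[3] != "DET":
                        violations.append((sent_id, cols[0], cols[1], cols[3]))
        return violations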
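
Similarly, the following is a minimal sketch of the mechanics behind the Inter-Annotator Disagreement strategy described in the abstract: two CoNLL-U annotations of the same text are compared token by token, and tokens whose head or dependency label differ are flagged for revision. It is an illustration only, not the interface of the conllu-merge-resolver tool referenced in note 12.

    # Minimal disagreement detector: assumes both files contain the same text
    # with identical tokenization, and flags tokens on which the two
    # annotations disagree about HEAD or DEPREL.
    def read_tokens(path):
        with open(path, encoding="utf-8") as f:
            rows = [line.rstrip("\n").split("\t") for line in f
                    if line.strip() and not line.startswith("#")]
        return [r for r in rows if r[0].isdigit()]   # skip multiword/empty tokens

    def disagreements(path_a, path_b):
        diffs = []
        for a, b in zip(read_tokens(path_a), read_tokens(path_b)):
            if a[6] != b[6] or a[7] != b[7]:          # HEAD or DEPREL differ
                diffs.append((a[1], (a[6], a[7]), (b[6], b[7])))
        return diffs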

References

  • Afonso, S., Bick, E., Haber, R., & Santos, D. (2002). Floresta Sintá(c)tica: A treebank for Portuguese. Proceedings of the third international conference on language resources and evaluation (LREC 2002). ELRA.

  • Artstein, R. (2017). Inter-annotator agreement. In N. Ide & J. Pustejovsky (Eds.), Handbook of linguistic annotation (pp. 297–313). Springer.

  • Baker, J. P. (1997). Consistency and accuracy in correcting automatically tagged data. In R. Garside, G. Leech, & A. McEnery (Eds.), Corpus annotation: Linguistic information from computer text corpora (pp. 243–250). Longman.

  • Blaheta, D. (2002). Handling noisy training and testing data. Proceedings of the 2002 conference on empirical methods in natural language processing (EMNLP 2002) (pp. 111–116). EMNLP.

  • Boyd, A., Dickinson, M., & Meurers, W. D. (2008). On detecting errors in dependency treebanks. Research on Language and Computation, 6(2), 113–137.

  • Consoli, B., Santos, J., Gomes, D., Cordeiro, F., Vieira, R., & Moreira, V. (2020). Embeddings for named entity recognition in geoscience Portuguese literature. Proceedings of the 12th language resources and evaluation conference (pp. 4625–4630). European Language Resources Association.

  • Eckart de Castilho, R., Mújdricza-Maydt, E., Yimam, S. M., Hartmann, S., Gurevych, I., Frank, A., & Biemann, C. (2016). A web-based tool for the integrated annotation of semantic and syntactic structures. Proceedings of the workshop on language technology resources and tools for digital humanities (LT4DH) (pp. 74–86). The COLING 2016 Organizing Committee.

  • de Marneffe, M.C., Grioni, M., Kanerva, J., & Ginter, F. (2017). Assessing the annotation consistency of the Universal Dependencies corpora. Proceedings of the fourth international conference on dependency linguistics (Depling 2017) (pp. 108–115). Linköping University Electronic Press. https://www.aclweb.org/anthology/W17-6514

  • de Souza, E., & Freitas, C. (2022). Polishing the gold—how much revision do we need in treebanks? In T. Pardo, A. Felippo, N. Roman (Eds.), Universal dependencies Brazilian festival—proceedings of the conference. SBC.

  • Dickinson, M. (2015). Detection of annotation errors in corpora. Language and Linguistics Compass, 9(3), 119–138.

  • Dickinson, M., & Meurers, D. (2003a). Detecting errors in part-of-speech annotation. 10th conference of the European chapter of the association for computational linguistics. The Ohio State University.

  • Dickinson, M., & Meurers, W. D. (2003b). Detecting inconsistencies in treebanks. Proceedings of TLT (pp. 45–56). The Ohio State University.

  • Freitas, C., Rocha, P., & Bick, E. (2008). Floresta Sintá(c)tica: Bigger, thicker and easier. In A. Teixeira, V.L.S. de Lima, L.C. de Oliveira, P. Quaresma (Eds.), Computational processing of the Portuguese language, 8th international conference, proceedings (PROPOR 2008) (pp. 216–219). Springer.

  • Gerdes, K. (2013). Collaborative dependency annotation. Proceedings of the second international conference on dependency linguistics (DepLing 2013) (pp. 88–97).

  • Manning, C.D. (2011). Part-of-speech tagging from 97% to 100%: is it time for some linguistics? International conference on intelligent text processing and computational linguistics (pp. 171–189). Springer.

  • Nivre, J., De Marneffe, M.C., Ginter, F., Goldberg, Y., Hajic, J., Manning, C.D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., & Zeman, D. (2016). Universal dependencies v1: A multilingual treebank collection. Proceedings of the tenth international conference on language resources and evaluation (LREC’16) (pp. 1659–1666).

  • Nivre, J., & Fang, C.T. (2017). Universal dependency evaluation. Proceedings of the NoDaLiDa 2017 workshop on universal dependencies (UDW 2017) (pp. 86–95).

  • Oliva, K. (2001). The possibilities of automatic detection/correction of errors in tagged corpora: A pilot study on a German corpus. International conference on text, speech and dialogue (pp. 39–46). Springer.

  • Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C.D. (2020). Stanza: A Python natural language processing toolkit for many human languages. Proceedings of the 58th annual meeting of the association for computational linguistics: system demonstrations (pp. 101–108). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-demos.14.

  • Rademaker, A., Chalub, F., Real, L., Freitas, C., Bick, E., & de Paiva, V. (2017). Universal dependencies for Portuguese. Proceedings of the fourth international conference on dependency linguistics (Depling 2017) (pp. 197–206).

  • Schneider, G., & Volk, M. (1998). Comparing a statistical and a rule-based tagger for German. KONVENS-98.

  • Straka, M., Hajic, J., & Straková, J. (2016). UDPipe: Trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing. Proceedings of the tenth international conference on language resources and evaluation (LREC’16) (pp. 4290–4297). ELRA.

  • van Halteren, H. (2000). The detection of inconsistency in manually tagged text. Proceedings of the COLING-2000 workshop on linguistically interpreted corpora (pp. 48–55). International Committee on Computational Linguistics, Centre Universitaire. https://aclanthology.org/W00-1907

  • Volokh, A., & Neumann, G. (2011). Automatic detection and correction of errors in dependency treebanks. Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (pp. 346–350). Association for Computational Linguistics. https://aclanthology.org/P11-2060

  • Wallis, S. (2003). Completing parsed corpora. Treebanks (pp. 61–71). Springer.

  • Zeman, D., Hajic, J., Popel, M., Potthast, M., Straka, M., Ginter, F., Nivre, J., & Petrov, S. (2018). CoNLL 2018 shared task: Multilingual parsing from raw text to Universal Dependencies. Proceedings of the CoNLL 2018 shared task: multilingual parsing from raw text to universal dependencies (pp. 1–21).

Acknowledgements

This study was partially funded by the National Agency for Petroleum, Natural Gas and Biofuels (ANP), Brazil, with resources from the R, D & I clauses, through a Cooperation Agreement between Petrobras and PUC-Rio. We would like to thank the team at the Applied Computational Intelligence Laboratory (ICA) at PUC-Rio for generating the morphosyntactic annotation models trained with Stanza. Elvis de Souza thanks the National Council for Scientific and Technological Development (CNPq) for the Master's scholarship (process no. 130495/2021-2).

Author information

Corresponding author

Correspondence to Cláudia Freitas.

Cite this article

Freitas, C., de Souza, E. A study on methods for revising dependency treebanks: in search of gold. Lang Resources & Evaluation 58, 111–131 (2024). https://doi.org/10.1007/s10579-023-09653-4
