Abstract
Research on dependency parsing has always had a strong multilingual orientation, but the lack of standardized annotations long made it difficult both to compare results meaningfully across languages and to develop truly multilingual systems. Over the last five years, the Universal Dependencies project has tried to overcome this obstacle by developing cross-linguistically consistent morphosyntactic annotation for many languages. During the same period, dependency parsing (like the rest of NLP) has been transformed by the adoption of continuous vector representations and neural network techniques. In this paper, I introduce the framework and resources of Universal Dependencies and discuss the advances in dependency parsing that these resources have enabled in combination with deep learning techniques, ranging from traditional word and character embeddings to deep contextualized word representations like ELMo and BERT.
Notes
- 1.
- 2. There was an overlap of 4 languages between the two shared tasks.
- 3. This research trend was not limited to dependency parsing, but also included groundbreaking work on constituency parsing.
- 4.
- 5. The features displayed in Fig. 1 are only a small subset of the features that would appear in a complete annotation of the two sentences.
- 6. UD releases are numbered by letting the first digit (2) refer to the version of the guidelines and the second digit (5) to the number of releases under that version.
- 7. The proportion of Indo-European languages has gone from 60% in v2.1 to 53% in v2.5.
- 8. Except for Chinese, for which we make use of a separate, pretrained model.
- 9.
- 10. The published paper contains a third extension, which we omit here because of space constraints, where we investigate whether the models exhibit a preference for different syntactic frameworks.
Acknowledgments
I want to thank (present and former) members of the Uppsala parsing group – Ali Basirat, Miryam de Lhoneux, Artur Kulmizev, Paola Merlo, Aaron Smith and Sara Stymne – colleagues in the core UD group – Marie de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Chris Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Sebastian Schuster, Reut Tsarfaty, Francis Tyers and Dan Zeman – and all contributors in the UD community. I acknowledge the computational resources provided by CSC in Helsinki and Sigma2 in Oslo through NeIC-NLPL (www.nlpl.eu).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Nivre, J. (2020). Multilingual Dependency Parsing from Universal Dependencies to Sesame Street. In: Sojka, P., Kopeček, I., Pala, K., Horák, A. (eds) Text, Speech, and Dialogue. TSD 2020. Lecture Notes in Computer Science, vol 12284. Springer, Cham. https://doi.org/10.1007/978-3-030-58323-1_2
DOI: https://doi.org/10.1007/978-3-030-58323-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58322-4
Online ISBN: 978-3-030-58323-1
eBook Packages: Computer Science, Computer Science (R0)