Abstract
This paper examines SyntaxNet, Google’s universal parser, which we used to parse the Universal Dependencies corpora. We analyzed the errors in the parser’s output to determine to what extent error types are connected with, or preconditioned by, language types. Specifically, we ran several experiments in which we clustered languages by the frequency of the different errors SyntaxNet makes, and compared the resulting clusterings with the traditional typology of languages. Three types of errors were considered separately: part-of-speech tagging, dependency labeling, and attachment errors. We show that error frequencies do correlate with language types, which may indicate that further improving a universal parser requires taking language-specific morphological and syntactic structures into account.
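The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the algorithm or the feature encoding, so everything here is an assumption: error counts are normalized to relative frequencies per language, and languages are merged bottom-up using Ward's minimum-variance criterion. The function names, the placeholder language names, and the counts are invented for illustration and are not results from the paper.

```python
from itertools import combinations


def error_frequencies(counts):
    """Turn raw error counts per language into relative frequencies."""
    freqs = {}
    for lang, by_type in counts.items():
        total = sum(by_type.values())
        # Sort error types so every language uses the same vector layout.
        freqs[lang] = [by_type[t] / total for t in sorted(by_type)]
    return freqs


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a), len(b)
    ca = [sum(col) / na for col in zip(*a)]  # centroid of cluster a
    cb = [sum(col) / nb for col in zip(*b)]  # centroid of cluster b
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * dist2


def cluster(freqs, k):
    """Merge the cheapest pair of clusters until only k clusters remain."""
    clusters = [[lang] for lang in sorted(freqs)]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(
                [freqs[name] for name in clusters[p[0]]],
                [freqs[name] for name in clusters[p[1]]],
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations guarantees i < j
    return [sorted(c) for c in clusters]


# Invented counts for placeholder languages; NOT data from the paper.
toy_counts = {
    "lang_a": {"attachment": 50, "label": 30, "pos": 20},
    "lang_b": {"attachment": 48, "label": 32, "pos": 20},
    "lang_c": {"attachment": 20, "label": 30, "pos": 50},
    "lang_d": {"attachment": 22, "label": 28, "pos": 50},
}
groups = cluster(error_frequencies(toy_counts), k=2)
print(groups)
```

With real data, the resulting groups would then be compared against a typological classification of the same languages; a library routine for hierarchical clustering would be the more practical choice at the scale of a full treebank collection.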
Notes
1. Made available by Google: https://github.com/tensorflow/models/blob/master/syntaxnet/universal.md.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Durandin, O., Malafeev, A., Zolotykh, N. (2018). SyntaxNet Errors from the Linguistic Point of View. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2017. Lecture Notes in Computer Science, vol 10716. Springer, Cham. https://doi.org/10.1007/978-3-319-73013-4_4
DOI: https://doi.org/10.1007/978-3-319-73013-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73012-7
Online ISBN: 978-3-319-73013-4
eBook Packages: Computer Science, Computer Science (R0)