Abstract
In this paper, we compare delexicalized transfer and minimally supervised parsing techniques on 32 languages from the Universal Dependencies treebank collection. The minimal supervision consists of handcrafted universal grammatical rules for POS tags. The rules are incorporated into the unsupervised dependency parser in the form of external prior probabilities. We also experiment with learning these probabilities from other treebanks. The average attachment score of our parser is slightly lower than that of the delexicalized transfer parser; however, it performs better for languages from less-resourced (non-Indo-European) language families and is therefore suitable for languages for which treebanks often do not exist.
Notes
- 1.
In the fully unsupervised setting, we cannot, for example, simply force verbs to be roots and nouns to be their dependents; that would already be a kind of supervision.
- 2.
- 3.
We exclude the ‘Ancient Greek-PROIEL’, ‘Finnish-FTB’, ‘Japanese-KTC’, ‘Latin-ITT’, and ‘Latin-PROIEL’ treebanks.
- 4.
We use the Malt parser in its current version, 1.8.1 (http://maltparser.org).
- 5.
- 6.
We had to change the original parser code to do this.
- 7.
Note that, for example, \(p^{ext}_{attach}(PUNC|VERB,dir) = 1\) does not mean that all dependents of VERB must be PUNC. Since \(\lambda _{attach}\) is less than one, the value 1 only pushes punctuation toward being attached below verbs.
- 8.
The results for different parameter settings of both parsers varied only slightly (by at most 2% across all the languages).
- 9.
We used the Malt parser with its default feature set. Tuning it for this specific delexicalized task would probably yield slightly better results.
- 10.
Danish is the only exception.
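The point of note 7 can be illustrated with a small numerical sketch. It assumes that the external prior is combined with the parser's own attachment estimate by linear interpolation with weight \(\lambda _{attach}\); the exact combination is not stated in the notes above, and the function name `mix_attach` and all probability values are hypothetical.

```python
def mix_attach(p_model, p_ext, lam):
    """Linearly interpolate two attachment distributions over dependent tags.

    p_model: the parser's own estimate for one (head tag, direction) pair.
    p_ext:   the external prior for the same pair (missing tags count as 0).
    lam:     interpolation weight, assumed here to play the role of lambda_attach.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    tags = set(p_model) | set(p_ext)
    return {
        tag: (1.0 - lam) * p_model.get(tag, 0.0) + lam * p_ext.get(tag, 0.0)
        for tag in tags
    }


# Hypothetical model estimate for dependents of VERB in one direction.
p_model = {"NOUN": 0.5, "ADV": 0.3, "PUNC": 0.2}
# External prior that puts all of its mass on PUNC, as in note 7.
p_ext = {"PUNC": 1.0}

mixed = mix_attach(p_model, p_ext, lam=0.25)
for tag in sorted(mixed):
    print(tag, round(mixed[tag], 3))
```

Under this assumption, a prior value of 1 for PUNC raises its share but leaves NOUN and ADV with non-zero probability whenever the weight is below one, which is the behaviour the note describes.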
Acknowledgments
This work has been supported by the grant 14-06548P of the Czech Science Foundation.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Mareček, D. (2016). Delexicalized and Minimally Supervised Parsing on Universal Dependencies. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_3
DOI: https://doi.org/10.1007/978-3-319-45925-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45924-0
Online ISBN: 978-3-319-45925-7