Abstract
In the past few years, the performance of dependency parsing has improved by a large margin on closed-domain benchmark datasets. However, parsing performance degrades dramatically on real-life texts. Besides domain adaptation techniques, which have made slow progress due to the intrinsic difficulty of the problem, one straightforward remedy is to annotate a certain amount of syntactic data for each new source of texts. However, data annotation is notoriously time-consuming and labor-intensive, especially for complex syntactic annotation. Inspired by progress in crowdsourcing, this paper proposes to annotate noisy multi-annotation syntactic data with non-expert annotators. Each sentence is independently annotated by multiple annotators, and the inconsistencies among them are retained. In this way, data can be annotated very rapidly, since many ordinary annotators can be recruited. We then construct and release three multi-annotation datasets from different sources. Finally, we propose and compare several benchmark approaches to training dependency parsers on such multi-annotation data. We will release our code and data at http://hlt.suda.edu.cn/~zhli/.
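The abstract's premise is that each sentence is independently annotated by several non-experts and the disagreements are retained rather than resolved. A natural point of contrast is the simplest possible aggregation baseline: a per-token majority vote over head indices. The sketch below is purely illustrative and is not the paper's method; the function name, input encoding (one head-index sequence per annotator, 0 marking the root), and tie-breaking rule are all assumptions made here for the example.

```python
from collections import Counter

def aggregate_heads(annotations):
    """Merge multiple annotators' dependency trees for one sentence
    by a per-token majority vote over head indices.

    annotations: list of head sequences, one per annotator; each sequence
    gives, for every token, the index of its head (0 = artificial root).
    Ties are broken in favor of the smallest head index.
    """
    n_tokens = len(annotations[0])
    assert all(len(a) == n_tokens for a in annotations), "length mismatch"
    merged = []
    for i in range(n_tokens):
        votes = Counter(a[i] for a in annotations)
        # Most votes wins; on a tie, prefer the smaller head index.
        best = max(votes.items(), key=lambda kv: (kv[1], -kv[0]))[0]
        merged.append(best)
    return merged

# Three annotators label a 4-token sentence; annotator 3 disagrees on token 1.
ann = [
    [2, 0, 2, 3],
    [2, 0, 2, 3],
    [3, 0, 2, 3],
]
print(aggregate_heads(ann))  # [2, 0, 2, 3]
```

Note that voting on each token independently can produce a head sequence that is not a well-formed tree (e.g. cycles), which is one reason to keep the raw multi-annotation data and train parsers on it directly, as the paper proposes, rather than committing to a single aggregated tree.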
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61876116 and 61525205) and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions. We thank the anonymous reviewers for their helpful comments.
© 2020 Springer Nature Switzerland AG
Cite this paper
Zhao, Y., Zhou, M., Li, Z., Zhang, M. (2020). Dependency Parsing with Noisy Multi-annotation Data. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science(), vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_10
Print ISBN: 978-3-030-60456-1
Online ISBN: 978-3-030-60457-8