Generalizing the semantic roles in the Chinese Proposition Bank

  • Original Paper
Language Resources and Evaluation

Abstract

The Chinese Proposition Bank (CPB) is a corpus annotated with semantic roles for the arguments of verbal and nominalized predicates. The semantic roles for the core arguments are defined in a predicate-specific manner: a set of numerically identified semantic roles is defined for each sense of a predicate lemma and recorded in a valency lexicon called frame files. Defining the roles this way reduces the cognitive burden on the annotators, since they only need to internalize a few roles at a time, and this has contributed to the consistency of the annotation. It was also a sensible approach given the contentious issue of how many semantic roles are needed if one were to adopt a set of global semantic roles that apply to all predicates. A downside of this approach, however, is that the predicate-specific roles may not be consistent across predicates, and this inconsistency has a negative impact on training automatic systems. Given the progress that has been made in defining semantic roles in the last decade or so, the time is ripe for adopting a set of general semantic roles. In this article, we describe our effort to “re-annotate” the CPB with a set of “global” semantic roles that are predicate-independent, and we investigate their impact on automatic semantic role labeling systems. When defining these global semantic roles, we strive to make them compatible with a recently published ISO standard on the annotation of semantic roles (ISO 24617-4:2014 SemAF-SR) while taking the linguistic characteristics of the Chinese language into account. We show that, in spite of the much larger number of global semantic roles, the accuracy of an off-the-shelf semantic role labeling system retrained on the data re-annotated with global semantic roles is comparable to that of the same system trained on the data set with the original predicate-specific semantic roles. We also argue that the re-annotated data set, together with the original data, provides the user with more flexibility when using the corpus.
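
To make the contrast between predicate-specific and global roles concrete, the following is a minimal Python sketch of how numbered roles could be relabeled through a per-sense lookup. It is an illustration only, not the authors' procedure: the table `FRAME_TO_GLOBAL`, the function `relabel`, the example predicates, and the role names are all hypothetical and are not taken from the CPB frame files or from the role inventory used in the article.

```python
from typing import Dict, List, Tuple

# Hypothetical frame-file excerpt (invented for illustration):
# (predicate lemma, sense id) -> numbered role -> global role.
# The same number (ARG0) can stand for different global roles
# depending on the predicate, which is the inconsistency at issue.
FRAME_TO_GLOBAL: Dict[Tuple[str, str], Dict[str, str]] = {
    ("break", "01"): {"ARG0": "Agent", "ARG1": "Patient"},
    ("like", "01"): {"ARG0": "Experiencer", "ARG1": "Theme"},
}


def relabel(
    lemma: str, sense: str, arguments: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Replace predicate-specific numbered roles with global roles.

    Labels that have no entry for this predicate sense (for example
    adjunct labels) are passed through unchanged.
    """
    mapping = FRAME_TO_GLOBAL[(lemma, sense)]
    return [(mapping.get(role, role), span) for role, span in arguments]
```

In this sketch, `relabel("break", "01", [("ARG0", "the boy")])` and `relabel("like", "01", [("ARG0", "she")])` assign different global roles to the same numbered label, which is the kind of cross-predicate variation that a predicate-independent role set makes explicit.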

Notes

  1. Here “obligatory” does not mean that they have to be realized syntactically, as in some cases they clearly can be dropped.

  2. An exception is made for arguments involving a coordination construction, where one might argue that there are multiple arguments playing the same role. For example, in the sentence glossed “xiaozhang / yesterday / and / xiaowang / together / go to school” (“Xiaozhang and Xiaowang went to school together yesterday”), “Xiaozhang” and “Xiaowang” are both Agent. Alternatively, one might say that coordinating conjunctions allow the conjuncts to collectively play the same role, and therefore the semantic roles are still unique.

  3. http://www.iso.org/iso/catalogue_detail.htm?csnumber=56866.

  4. http://mallet.cs.umass.edu/index.php.

  5. https://code.google.com/p/berkeleyparser/.

Author information

Corresponding author

Correspondence to Xiaopeng Bai.

Additional information

This work is funded by DARPA via contract HR0011-11-C-0145 entitled “Linguistic Resources for Multilingual Processing”. All opinions expressed here are those of the authors and do not necessarily reflect the views of DARPA.

About this article

Cite this article

Bai, X., Xue, N. Generalizing the semantic roles in the Chinese Proposition Bank. Lang Resources & Evaluation 50, 643–666 (2016). https://doi.org/10.1007/s10579-016-9342-y

  • DOI: https://doi.org/10.1007/s10579-016-9342-y
