Abstract
Discourse parsing is a challenging task that plays a critical role in discourse analysis. Since the release of the Rhetorical Structure Theory Discourse Treebank (RST-DT) and the Penn Discourse Treebank (PDTB), research on English discourse parsing has attracted increasing attention and achieved considerable success. Meanwhile, preliminary research has been conducted on certain subtasks of discourse parsing for other languages, such as Chinese. In this paper, we first introduce the Connective-driven Dependency Treebank (CDTB) corpus and then present an end-to-end Chinese discourse parser that parses free text into Connective-driven Dependency Tree (CDT) style. The parser consists of multiple components, including an elementary discourse unit detector, a discourse relation recognizer, a discourse parse tree generator and an attribution labeler. In particular, the attribution labeler determines two attributes (sense and centering) for every non-terminal node in the discourse parse tree. An effective feature set is proposed for each component. Comprehensive experiments on the CDTB corpus yield an overall F1 score of 20.0%.
Notes
- 1. We use the POS combination of the parent, left sibling and right sibling of the given comma to represent the context.
- 2. We use the POS combination of the parent, left sibling and right sibling of the dominating node to represent the context. When there is no parent or sibling, the missing item is marked NULL.
- 3.
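The context feature described in notes 1 and 2 can be illustrated with a minimal sketch. It assumes a simple constituency tree in which every node carries a POS tag or phrase label. The `Node` class, the `context_feature` function and the `+` separator are hypothetical names chosen for illustration and are not taken from the parser itself. Applying the NULL convention of note 2 to the comma context of note 1 is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A constituency-tree node; `label` is a POS tag or phrase label."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach `child` as the rightmost child and return it."""
        child.parent = self
        self.children.append(child)
        return child


def context_feature(node: Node) -> str:
    """Combine the labels of the parent, left sibling and right sibling.

    A missing parent or sibling is marked NULL (the convention of note 2).
    The '+' separator is an assumption made for this sketch.
    """
    parent = node.parent
    if parent is None:
        return "NULL+NULL+NULL"
    siblings = parent.children
    # Locate the node by identity, since identical labels may repeat.
    idx = next(i for i, s in enumerate(siblings) if s is node)
    left = siblings[idx - 1].label if idx > 0 else "NULL"
    right = siblings[idx + 1].label if idx + 1 < len(siblings) else "NULL"
    return f"{parent.label}+{left}+{right}"


# Toy tree: an IP node dominating a clause, a comma (PU) and a verb phrase.
root = Node("IP")
clause = root.add(Node("IP"))
comma = root.add(Node("PU"))
vp = root.add(Node("VP"))

print(context_feature(comma))   # context of the comma
print(context_feature(clause))  # leftmost child, so no left sibling
print(context_feature(root))    # root, so no parent or siblings
```

Encoding the three labels as one combined string keeps the feature a single categorical value, which is one plausible reading of "POS combination"; a real feature extractor might instead emit the three labels as separate features.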
Acknowledgements
This research is supported by Key Project 61333018 and Projects 61472264 and 61402314 of the National Natural Science Foundation of China.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Kong, F., Wang, H., Zhou, G. (2016). A CDT-Styled End-to-End Chinese Discourse Parser. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_32
DOI: https://doi.org/10.1007/978-3-319-50496-4_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)