Abstract
This paper presents UDRST, an unlabeled discourse parsing system in the RST framework. UDRST consists of a segmentation model and a parsing model. The segmentation model exploits subtree features to rerank the N-best outputs of a base segmenter, which uses syntactic and lexical features in a CRF framework. For the parsing model, we present two algorithms for building a discourse tree from a segmented text: an incremental algorithm and a dual decomposition algorithm. Our system achieves an unlabeled score of 77.3% on the standard test set of the RST Discourse Treebank corpus, a 5.0% improvement over HILDA [6], a state-of-the-art discourse parsing system.
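The reranking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a linear reranker that adds weighted candidate-level features to the base segmenter's log-probability. The feature names, weights, and the functions `extract_features` and `rerank` are hypothetical stand-ins for illustration only; they are not the subtree features or the model used in UDRST.

```python
from typing import Dict, List, Tuple

# A candidate is a segmentation (a list of boundary token indices)
# paired with the log-probability assigned by a base segmenter.
Candidate = Tuple[List[int], float]


def extract_features(boundaries: List[int], n_tokens: int) -> Dict[str, float]:
    """Hypothetical global features, standing in for real subtree features."""
    n_segments = len(boundaries) + 1
    return {
        "num_segments": float(n_segments),
        "avg_segment_len": n_tokens / n_segments,
    }


def rerank(candidates: List[Candidate], n_tokens: int,
           weights: Dict[str, float], base_weight: float = 1.0) -> List[int]:
    """Return the boundaries of the candidate with the highest combined score."""
    def score(cand: Candidate) -> float:
        boundaries, log_prob = cand
        feats = extract_features(boundaries, n_tokens)
        # Combined score: weighted base log-probability plus weighted features.
        return base_weight * log_prob + sum(
            weights.get(name, 0.0) * value for name, value in feats.items())
    return max(candidates, key=score)[0]


# Toy 3-best list for a 12-token sentence (made-up values).
n_best = [([4], -0.9), ([4, 9], -1.1), ([2, 4, 9], -2.5)]
weights = {"num_segments": 0.2, "avg_segment_len": -0.05}
best = rerank(n_best, n_tokens=12, weights=weights)
```

With these made-up weights the reranker prefers the second candidate over the base segmenter's top choice, which is the situation reranking is meant to exploit: features computed over a whole candidate can overturn a locally scored 1-best output.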
References
Bateman, J., Kleinz, J., Kamps, T., Reichenberger, K.: Towards Constructive Text, Diagram, and Layout Generation for Information Presentation. Computational Linguistics 27(3), 409–449 (2001)
Carlson, L., Marcu, D., Okurowski, M.E.: RST Discourse Treebank. Linguistic Data Consortium, LDC (2002)
Collins, M., Koo, T.: Discriminative Reranking for Natural Language Parsing. Computational Linguistics 31(1), 25–70 (2005)
Hanamoto, A., Matsuzaki, T., Tsujii, J.: Coordination Structure Analysis using Dual Decomposition. In: Proceedings of EACL, pp. 430–438 (2012)
Hernault, H., Piwek, P., Prendinger, H., Ishizuka, M.: Generating Dialogues for Virtual Agents Using Nested Textual Coherence Relations. In: Prendinger, H., Lester, J.C., Ishizuka, M. (eds.) IVA 2008. LNCS (LNAI), vol. 5208, pp. 139–145. Springer, Heidelberg (2008)
Hernault, H., Prendinger, H., duVerle, D.A., Ishizuka, M.: HILDA: A Discourse Parser Using Support Vector Machine Classification. Dialogue and Discourse 1(3), 1–33 (2010)
Hernault, H., Bollegala, D., Ishizuka, M.: A Sequential Model for Discourse Segmentation. In: Gelbukh, A. (ed.) CICLing 2010. LNCS, vol. 6008, pp. 315–326. Springer, Heidelberg (2010)
Klein, D., Manning, C.: Accurate Unlexicalized Parsing. In: Proceedings of ACL, pp. 423–430 (2003)
Koo, T., Rush, A.M., Collins, M., Jaakkola, T., Sontag, D.: Dual Decomposition for Parsing with Non-Projective Head Automata. In: Proceedings of EMNLP, pp. 1288–1298 (2010)
Kudo, T.: CRF++: Yet Another CRF toolkit, http://crfpp.sourceforge.net/
Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proceedings of ICML, pp. 282–289 (2001)
Louis, A., Joshi, A., Nenkova, A.: Discourse indicators for content selection in summarization. In: Proceedings of SIGDIAL, pp. 147–156 (2010)
Mann, W.C., Thompson, S.A.: Rhetorical Structure Theory. Toward a Functional Theory of Text Organization. Text 8, 243–281 (1988)
Marcu, D.: The Theory and Practice of Discourse Parsing and Summarization. MIT Press, Cambridge (2000)
Rush, A.M., Sontag, D., Collins, M., Jaakkola, T.: On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing. In: Proceedings of EMNLP, pp. 1–11 (2010)
Rush, A.M., Collins, M.: A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing. Tutorial at ACL (2011)
Sagae, K.: Analysis of Discourse Structure with Syntactic Dependencies and Data-Driven Shift-Reduce Parsing. In: Proceedings of IWPT, pp. 81–84 (2009)
Soricut, R., Marcu, D.: Sentence Level Discourse Parsing using Syntactic and Lexical Information. In: Proceedings of NAACL, pp. 149–156 (2003)
Subba, R., Di Eugenio, B.: Automatic Discourse Segmentation using Neural Networks. In: Proceedings of SemDial, pp. 189–190 (2007)
Thanh, H.L., Abeysinghe, G., Huyck, C.: Generating Discourse Structures for Written Texts. In: Proceedings of COLING, pp. 329–335 (2004)
Zirn, C., Niepert, M., Stuckenschmidt, H., Strube, M.: Fine-Grained Sentiment Analysis with Structural Features. In: Proceedings of IJCNLP, pp. 336–344 (2011)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xuan Bach, N., Le Minh, N., Shimazu, A. (2012). UDRST: A Novel System for Unlabeled Discourse Parsing in the RST Framework. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_25
DOI: https://doi.org/10.1007/978-3-642-33983-7_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33982-0
Online ISBN: 978-3-642-33983-7
eBook Packages: Computer Science, Computer Science (R0)