UDRST: A Novel System for Unlabeled Discourse Parsing in the RST Framework

  • Conference paper
Advances in Natural Language Processing (JapTAL 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7614)

Abstract

This paper presents UDRST, an unlabeled discourse parsing system in the Rhetorical Structure Theory (RST) framework. UDRST consists of a segmentation model and a parsing model. The segmentation model uses subtree features to rerank the N-best outputs of a base segmenter, which combines syntactic and lexical features in a CRF framework. For the parsing model, we present two algorithms that build a discourse tree from a segmented text: an incremental algorithm and a dual decomposition algorithm. Our system achieves an unlabeled score of 77.3% on the standard test set of the RST Discourse Treebank corpus, an improvement of 5.0% over HILDA [6], a state-of-the-art discourse parsing system.
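
The abstract does not spell out either parsing algorithm or the exact definition of the unlabeled score, so the following is only a minimal sketch under stated assumptions. It assumes that "incremental" can be read as greedy bottom-up merging of adjacent subtrees, and that the unlabeled score is an F1 over the spans of internal tree nodes (root included). The names build_tree, unlabeled_f1, and toy_score are hypothetical and do not come from the paper; in particular, toy_score is a placeholder standing in for a learned model.

```python
from typing import Callable, List, Set, Tuple, Union

# A tree is either a leaf (an EDU index) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]
Span = Tuple[int, int]  # inclusive range of EDU indices


def span_of(tree: Tree) -> Span:
    """Return the (first, last) EDU indices covered by a subtree."""
    if isinstance(tree, int):
        return (tree, tree)
    return (span_of(tree[0])[0], span_of(tree[1])[1])


def build_tree(n_edus: int, score: Callable[[Span, Span], float]) -> Tree:
    """Greedy bottom-up construction (an assumed reading of 'incremental'):
    repeatedly merge the adjacent pair of subtrees with the highest score."""
    if n_edus < 1:
        raise ValueError("need at least one EDU")
    forest: List[Tree] = list(range(n_edus))
    while len(forest) > 1:
        best = max(
            range(len(forest) - 1),
            key=lambda i: score(span_of(forest[i]), span_of(forest[i + 1])),
        )
        forest[best:best + 2] = [(forest[best], forest[best + 1])]
    return forest[0]


def internal_spans(tree: Tree) -> Set[Span]:
    """Collect the spans of all internal (non-leaf) nodes."""
    if isinstance(tree, int):
        return set()
    return {span_of(tree)} | internal_spans(tree[0]) | internal_spans(tree[1])


def unlabeled_f1(pred: Tree, gold: Tree) -> float:
    """F1 over internal-node spans, ignoring relation and nuclearity labels."""
    p, g = internal_spans(pred), internal_spans(gold)
    correct = len(p & g)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


def toy_score(left: Span, right: Span) -> float:
    """Placeholder scorer (not from the paper): prefer merging short spans."""
    return -float(right[1] - left[0] + 1)


predicted = build_tree(4, toy_score)   # ((0, 1), (2, 3))
gold = (((0, 1), 2), 3)                # a left-branching gold tree
print(predicted, unlabeled_f1(predicted, gold))
```

In this toy run the predicted and gold trees share two of their three internal spans, (0, 1) and (0, 3), so precision and recall are both 2/3. A real system would replace toy_score with a trained classifier over features of the two candidate subtrees.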

References

  1. Bateman, J., Kleinz, J., Kamps, T., Reichenberger, K.: Towards Constructive Text, Diagram, and Layout Generation for Information Presentation. Computational Linguistics 27(3), 409–449 (2001)

  2. Carlson, L., Marcu, D., Okurowski, M.E.: RST Discourse Treebank. Linguistic Data Consortium, LDC (2002)

  3. Collins, M., Koo, T.: Discriminative Reranking for Natural Language Parsing. Computational Linguistics 31(1), 25–70 (2005)

  4. Hanamoto, A., Matsuzaki, T., Tsujii, J.: Coordination Structure Analysis using Dual Decomposition. In: Proceedings of EACL, pp. 430–438 (2012)

  5. Hernault, H., Piwek, P., Prendinger, H., Ishizuka, M.: Generating Dialogues for Virtual Agents Using Nested Textual Coherence Relations. In: Prendinger, H., Lester, J.C., Ishizuka, M. (eds.) IVA 2008. LNCS (LNAI), vol. 5208, pp. 139–145. Springer, Heidelberg (2008)

  6. Hernault, H., Prendinger, H., duVerle, D.A., Ishizuka, M.: HILDA: A Discourse Parser Using Support Vector Machine Classification. Dialogue and Discourse 1(3), 1–33 (2010)

  7. Hernault, H., Bollegala, D., Ishizuka, M.: A Sequential Model for Discourse Segmentation. In: Gelbukh, A. (ed.) CICLing 2010. LNCS, vol. 6008, pp. 315–326. Springer, Heidelberg (2010)

  8. Klein, D., Manning, C.: Accurate Unlexicalized Parsing. In: Proceedings of ACL, pp. 423–430 (2003)

  9. Koo, T., Rush, A.M., Collins, M., Jaakkola, T., Sontag, D.: Dual Decomposition for Parsing with Non-Projective Head Automata. In: Proceedings of EMNLP, pp. 1288–1298 (2010)

  10. Kudo, T.: CRF++: Yet Another CRF toolkit, http://crfpp.sourceforge.net/

  11. Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proceedings of ICML, pp. 282–289 (2001)

  12. Louis, A., Joshi, A., Nenkova, A.: Discourse Indicators for Content Selection in Summarization. In: Proceedings of SIGDIAL, pp. 147–156 (2010)

  13. Mann, W.C., Thompson, S.A.: Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text 8, 243–281 (1988)

  14. Marcu, D.: The Theory and Practice of Discourse Parsing and Summarization. MIT Press, Cambridge (2000)

  15. Rush, A.M., Sontag, D., Collins, M., Jaakkola, T.: On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing. In: Proceedings of EMNLP, pp. 1–11 (2010)

  16. Rush, A.M., Collins, M.: A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing. Tutorial at ACL (2011)

  17. Sagae, K.: Analysis of Discourse Structure with Syntactic Dependencies and Data-Driven Shift-Reduce Parsing. In: Proceedings of IWPT, pp. 81–84 (2009)

  18. Soricut, R., Marcu, D.: Sentence Level Discourse Parsing using Syntactic and Lexical Information. In: Proceedings of NAACL, pp. 149–156 (2003)

  19. Subba, R., Di Eugenio, B.: Automatic Discourse Segmentation using Neural Networks. In: Proceedings of SemDial, pp. 189–190 (2007)

  20. Thanh, H.L., Abeysinghe, G., Huyck, C.: Generating Discourse Structures for Written Texts. In: Proceedings of COLING, pp. 329–335 (2004)

  21. Zirn, C., Niepert, M., Stuckenschmidt, H., Strube, M.: Fine-Grained Sentiment Analysis with Structural Features. In: Proceedings of IJCNLP, pp. 336–344 (2011)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xuan Bach, N., Le Minh, N., Shimazu, A. (2012). UDRST: A Novel System for Unlabeled Discourse Parsing in the RST Framework. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_25

  • DOI: https://doi.org/10.1007/978-3-642-33983-7_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33982-0

  • Online ISBN: 978-3-642-33983-7

  • eBook Packages: Computer Science, Computer Science (R0)
