Abstract
Discourse parsing aims to identify the relationships between discourse units, and most previous work focuses on recovering the constituency structure among those units with carefully designed features. In this paper, we propose to exploit Long Short-Term Memory (LSTM) networks to represent discourse units properly while requiring as little feature engineering as possible. Our transition-based parsing model features a multilayer stack LSTM framework to discover the dependency structures among different units. Experiments on the RST Discourse Treebank show that our model outperforms traditional feature-based systems in terms of dependency structures, without complicated feature design. When evaluated on discourse constituency, our parser also achieves promising performance compared with state-of-the-art constituency discourse parsers.
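To make the high-level description concrete, the sketch below shows the kind of greedy, arc-standard transition loop such a parser runs, with a stack LSTM summarizing the partially built tree in the spirit of Dyer et al. [7]. It assumes EDUs arrive pre-encoded as fixed-size vectors; all names here (StackLSTM, parse, score_actions) are illustrative assumptions, not the authors' implementation, and the EDU encoder, multilayer stacking, and training objective are omitted.

```python
import torch
import torch.nn as nn

SHIFT, LEFT_ARC, RIGHT_ARC = 0, 1, 2

class StackLSTM(nn.Module):
    """An LSTM whose state history is kept on a stack, so pop() is O(1)."""
    def __init__(self, dim):
        super().__init__()
        self.cell = nn.LSTMCell(dim, dim)
        # (h, c) after each push; index 0 represents the empty stack.
        self.states = [(torch.zeros(1, dim), torch.zeros(1, dim))]
        self.items = []  # embeddings currently on the stack

    def push(self, x):
        self.states.append(self.cell(x, self.states[-1]))
        self.items.append(x)

    def pop(self):
        self.states.pop()
        return self.items.pop()

    def summary(self):
        return self.states[-1][0]  # hidden state summarizing the whole stack


def parse(edus, score_actions, dim):
    """Greedy arc-standard parsing; returns (head, dependent) arcs over EDU ids."""
    stack, stack_ids = StackLSTM(dim), []
    buffer = list(range(len(edus)))
    arcs = []
    while buffer or len(stack_ids) > 1:
        # Parser state: stack LSTM summary plus the front of the buffer.
        front = edus[buffer[0]] if buffer else torch.zeros(1, dim)
        scores = score_actions(torch.cat([stack.summary(), front], dim=1))
        # Mask illegal actions, then act greedily.
        legal = [a for a in (SHIFT, LEFT_ARC, RIGHT_ARC)
                 if (a == SHIFT and buffer) or (a != SHIFT and len(stack_ids) >= 2)]
        action = max(legal, key=lambda a: scores[0, a].item())
        if action == SHIFT:
            stack.push(edus[buffer[0]])
            stack_ids.append(buffer.pop(0))
        else:
            stack.pop(); top = stack_ids.pop()
            stack.pop(); second = stack_ids.pop()
            head, dep = (top, second) if action == LEFT_ARC else (second, top)
            arcs.append((head, dep))
            stack.push(edus[head])  # the head keeps representing the subtree
            stack_ids.append(head)
    return arcs
```

With an untrained scorer such as `nn.Linear(2 * dim, 3)` and random EDU vectors, `parse` already yields a (random) projective dependency tree over the EDUs; training would attach a loss to the action scores at each step.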
Notes
- 1.
In Table 2, rows ID 1 and ID 2 report the results of [23] obtained with their full feature set 1 and feature set 2, respectively. The two feature sets are as follows (a sketch of the extraction appears after this list):
(1) WORD: the first word, the last word, and the first bigram of each EDU, plus the pair of first words and the pair of last words across the two EDUs.
(2) POS: the first one and two POS tags of each EDU, and the pair of first POS tags across the two EDUs.
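For concreteness, the sketch below is one hypothetical reading of these two feature sets as an extraction function; the helper name and feature keys are ours, and [23] remains the authority on the exact definitions.

```python
def word_pos_features(edu1, edu2):
    """edu1, edu2: the two EDUs as lists of (word, POS) pairs."""
    feats = {}
    for name, edu in (("e1", edu1), ("e2", edu2)):
        words = [w for w, _ in edu]
        tags = [t for _, t in edu]
        # (1) WORD features within each EDU
        feats[name + ":first_word"] = words[0]
        feats[name + ":last_word"] = words[-1]
        feats[name + ":first_bigram"] = " ".join(words[:2])
        # (2) POS features within each EDU
        feats[name + ":first_pos"] = tags[0]
        feats[name + ":first_two_pos"] = " ".join(tags[:2])
    # Cross-EDU word and POS pairs
    feats["pair:first_words"] = edu1[0][0] + "|" + edu2[0][0]
    feats["pair:last_words"] = edu1[-1][0] + "|" + edu2[-1][0]
    feats["pair:first_pos"] = edu1[0][1] + "|" + edu2[0][1]
    return feats
```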
References
Abney, S.P., Johnson, M.: Memory requirements and local ambiguities of parsing strategies. J. Psycholinguist. Res. 20, 233–250 (1991)
Ballesteros, M., Dyer, C., Smith, N.A.: Improved transition-based parsing by modeling characters instead of words with LSTMs. In: EMNLP 2015, Lisbon, Portugal, pp. 349–359 (2015)
Carlson, L., Marcu, D., Okurowski, M.E.: Building a discourse-tagged corpus in the framework of rhetorical structure theory. In: SIGdial Workshop, pp. 1–10 (2001)
Chorowski, J., Bahdanau, D., Serdyuk, D., Cho, K., Bengio, Y.: Attention-based models for speech recognition. CoRR arXiv:1506.07503 (2015)
Chung, J., Gülçehre, Ç., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR arXiv:1412.3555 (2014)
Collins, M., Roark, B.: Incremental parsing with the perceptron algorithm. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, 21–26 July 2004, pp. 111–118 (2004)
Dyer, C., Ballesteros, M., Ling, W., Matthews, A., Smith, N.A.: Transition-based dependency parsing with stack long short-term memory. In: ACL 2015, Beijing, vol. 1, pp. 334–343 (2015)
Eisner, J.: Three new probabilistic models for dependency parsing: an exploration. In: COLING 1996, 5–9 August 1996, pp. 340–345 (1996)
Eyben, F., Böck, S., Schuller, B.W., Graves, A.: Universal onset detection with bidirectional long short-term memory neural networks. In: ISMIR 2010, Utrecht, Netherlands, 9–13 August 2010, pp. 589–594 (2010)
Feng, V.W., Hirst, G.: A linear-time bottom-up discourse parser with constraints and post-editing. In: ACL 2014, Baltimore, MD, USA, vol. 1, pp. 511–521 (2014)
Ferrucci, D.A., Brown, E.W., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A., Lally, A., Murdock, J.W., Nyberg, E., Prager, J.M., Schlaefer, N., Welty, C.A.: Building Watson: an overview of the DeepQA project. AI Mag. 31(3), 59–79 (2010)
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: AISTATS 2011, Fort Lauderdale, USA, 11–13 April 2011, pp. 315–323 (2011)
Graves, A., Jaitly, N., Mohamed, A.: Hybrid speech recognition with deep bidirectional LSTM. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, 8–12 December 2013, pp. 273–278 (2013)
Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
Hahn, U.: The theory and practice of discourse parsing and summarization by Daniel Marcu. Comput. Linguist. 28(1), 81–83 (2002)
Hernault, H., Prendinger, H., duVerle, D.A., Ishizuka, M.: HILDA: a discourse parser using support vector machine classification. Dialogue & Discourse 1(3), 1–33 (2010)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Ji, Y., Eisenstein, J.: One vector is not enough: entity-augmented distributed semantics for discourse relations. TACL 3, 329–344 (2015)
Joty, S.R., Carenini, G., Ng, R.T., Mehdad, Y.: Combining intra- and multi-sentential rhetorical parsing for document-level discourse analysis. In: ACL 2013, Sofia, Bulgaria, vol. 1, pp. 486–496 (2013)
Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: ICML 2014, Beijing, China, 21–26 June 2014, pp. 1188–1196 (2014)
LeThanh, H.: Generating discourse structures for written texts. In: Proceedings of the 20th International Conference on Computational Linguistics, pp. 329–335 (2004)
Li, J., Li, R., Hovy, E.H.: Recursive deep models for discourse parsing. In: EMNLP 2014, Doha, Qatar, 25–29 October 2014, pp. 2061–2069 (2014)
Li, S., Wang, L., Cao, Z., Li, W.: Text-level discourse dependency parsing. In: ACL 2014, Baltimore (vol. 1: Long Papers), pp. 25–35 (2014)
Louis, A., Joshi, A.K., Nenkova, A.: Discourse indicators for content selection in summarization. In: SIGDIAL 2010 Conference, Tokyo, Japan, pp. 147–156 (2010)
Mann, W., Thompson, S.: Rhetorical structure theory: toward a functional theory of text organization. Text-Interdiscip. J. Study Discourse 8, 243–281 (1988)
McDonald, R.T., Crammer, K., Pereira, F.C.N.: Online large-margin training of dependency parsers. In: ACL 2005, University of Michigan, USA (2005)
Nivre, J., Scholz, M.: Deterministic dependency parsing of English text. In: COLING 2004, Geneva, Switzerland, 23–27 August 2004 (2004)
Socher, R., Karpathy, A., Le, Q.V., Manning, C.D., Ng, A.Y.: Grounded compositional semantics for finding and describing images with sentences. TACL 2, 207–218 (2014)
Soricut, R., Marcu, D.: Sentence level discourse parsing using syntactic and lexical information. In: HLT-NAACL (2003)
Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. In: ACL 2015, Beijing, pp. 1556–1566 (2015)
Voll, K., Taboada, M.: Not all words are created equal: extracting semantic orientation as a function of adjective relevance. In: Orgun, M.A., Thornton, J. (eds.) AI 2007. LNCS (LNAI), vol. 4830, pp. 337–346. Springer, Heidelberg (2007). doi:10.1007/978-3-540-76928-6_35
Acknowledgments
We would like to thank Sujian Li, Liang Wang, and the anonymous reviewers for their helpful feedback. This work is supported by National High Technology R&D Program of China (Grant Nos. 2015AA015403, 2014AA015102), Natural Science Foundation of China (Grant Nos. 61202233, 61272344, 61370055) and the joint project with IBM Research. For any correspondence, please contact Yansong Feng.