Abstract
Discourse parsing is an important research area in natural language processing (NLP) that aims to recover the discourse structure connecting coherent sentences. In this survey, we introduce several kinds of discourse parsing tasks, chiefly RST-style discourse parsing, PDTB-style discourse parsing, and discourse parsing for multiparty dialogue. For each task, we review classical and recent methods, with an emphasis on neural network approaches. We then describe applications of discourse parsing to other NLP tasks, such as machine reading comprehension and sentiment analysis. Finally, we discuss future trends in discourse parsing.
Acknowledgements
The research in this article is supported by the Science and Technology Innovation 2030 - “New Generation Artificial Intelligence” Major Project (2018AA0101901), the National Key Research and Development Project (2018YFB1005103), the National Natural Science Foundation of China (Grant Nos. 61772156 and 61976073), Shenzhen Foundational Research Funding (JCYJ20200109113441941), and the Foundation of Heilongjiang Province (F2018013).
Author information
Jiaqi Li received the BS degree from the School of Computer Science and Technology, Heilongjiang University, China in 2015. He is currently working toward the PhD degree at the Harbin Institute of Technology, China. His research interests include discourse parsing for multiparty dialogues and its applications.
Ming Liu received the PhD degree from the School of Computer Science and Technology, Harbin Institute of Technology, China in 2010. He is a full professor and PhD supervisor in the Department of Computer Science, and a faculty member of the Research Center for Social Computing and Information Retrieval (HIT-SCIR), Harbin Institute of Technology, China. His research interests include knowledge graphs and machine reading comprehension.
Bing Qin received the PhD degree from the School of Computer Science and Technology, Harbin Institute of Technology, China in 2005. She is a full professor of the Department of Computer Science, and the director of the Research Center for Social Computing and Information Retrieval (HIT-SCIR), Harbin Institute of Technology, China. Her research interests include natural language processing, information extraction, document-level discourse analysis, and sentiment analysis.
Ting Liu received the PhD degree from the Department of Computer Science, Harbin Institute of Technology, China in 1998. He is a full professor of the School of Computer Science and Technology, and the director of the Faculty of Computing, Harbin Institute of Technology, China. His research interests include information retrieval, natural language processing, and social media analysis.
Cite this article
Li, J., Liu, M., Qin, B. et al. A survey of discourse parsing. Front. Comput. Sci. 16, 165329 (2022). https://doi.org/10.1007/s11704-021-0500-z
DOI: https://doi.org/10.1007/s11704-021-0500-z