Abstract
Descriptive approaches to discourse (text) structure and coherence typically proceed in either a bottom-up or a top-down fashion. Bottom-up approaches analyze how the smallest discourse units (clauses, sentences) are connected to their closest neighbours, locally and linearly. Top-down approaches postulate a hierarchical organization of smaller and larger units and sometimes represent the whole text as a tree-like graph. In the present study, we mine a Czech corpus of 50k sentences annotated for local coherence (in the Penn Discourse Treebank style) for indicators of higher discourse structure. We analyze patterns of overlapping discourse relations and examine the hierarchies they form. The types and distributions of the detected patterns correspond to the results reported for English local annotation, and patterns that do not comply with a tree-like interpretation are very rare. We also detect hierarchical organization of local discourse relations of up to 5 levels in the Czech data.
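To make the notions of overlapping relations and hierarchy levels more concrete, the following is a minimal Python sketch, not the procedure used in the study. It assumes that each discourse relation is reduced to two argument spans given as inclusive token offsets. The names `Relation`, `classify_pair` and `depth`, and the four pattern labels, are hypothetical simplifications introduced here for illustration; they are not the pattern inventory of Lee et al. or of this paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token offsets; an assumption


@dataclass(frozen=True)
class Relation:
    """A local discourse relation reduced to its two argument spans."""
    arg1: Span
    arg2: Span

    @property
    def extent(self) -> Span:
        # Smallest span covering both arguments.
        return (min(self.arg1[0], self.arg2[0]), max(self.arg1[1], self.arg2[1]))


def within(inner: Span, outer: Span) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def embedded_in(inner: Relation, outer: Relation) -> bool:
    """True if the whole of `inner` lies inside a single argument of `outer`."""
    if inner.extent == outer.extent:  # require strict nesting, so no cycles
        return False
    return any(within(inner.extent, arg) for arg in (outer.arg1, outer.arg2))


def classify_pair(a: Relation, b: Relation) -> str:
    """Assign one of four simplified (hypothetical) labels to a pair of relations."""
    if {a.arg1, a.arg2} & {b.arg1, b.arg2}:
        return "shared_argument"
    (s1, e1), (s2, e2) = a.extent, b.extent
    if e1 < s2 or e2 < s1:
        return "independent"
    if embedded_in(a, b) or embedded_in(b, a):
        return "embedded"
    return "other_overlap"  # includes crossing, i.e. not tree-like


def depth(rel: Relation, relations: List[Relation]) -> int:
    """Number of hierarchy levels rooted at `rel` (1 if nothing is embedded in it)."""
    children = [r for r in relations if embedded_in(r, rel)]
    return 1 + max((depth(c, relations) for c in children), default=0)


# Toy data: r2 sits inside arg2 of r1, and r3 sits inside arg2 of r2.
r1 = Relation((0, 4), (5, 20))
r2 = Relation((5, 9), (10, 14))
r3 = Relation((10, 11), (12, 14))
crossing_a = Relation((0, 4), (10, 14))
crossing_b = Relation((5, 9), (15, 19))
```

With this toy data, `classify_pair(r1, r2)` yields `"embedded"`, `depth(r1, [r1, r2, r3])` yields 3, and the pair `crossing_a`, `crossing_b` falls under `"other_overlap"`, the kind of configuration that a strictly tree-like interpretation cannot accommodate.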
Notes
- 1.
In this study, we use the term discourse relations, following the Penn Discourse Treebank terminology.
- 2.
- 3.
Basically a constituency tree, which is projective by nature and does not allow crossing edges, in contrast to the basic mathematical definition of a tree graph.
- 4.
Due to the limited scope of this paper, we only compare our results to theirs for discourse relations. The implications for syntax (level of complexity) are not explicitly discussed.
- 5.
Technically, the annotation is not carried out on raw texts, but on top of the syntactic trees.
- 6.
We have obtained more data than can be covered here, so we select only certain aspects for this study. We therefore concentrate on the patterns studied by Lee et al. and on the hierarchical structuring of discourse relations.
- 7.
Due to space limits, we only present the English translations of the PDT Czech originals here. Relation 1 is highlighted in italics, relation 2 in bold. The connectives are underlined.
- 8.
In the representation in Example 2, the clause This means that is not in italics, i.e. it is not part of any argument of the left relation.
- 9.
The “also-not” connective renders the Czech ani, meaning neither. Lit. translation: “Neither here is_concerned a small portion...”.
- 10.
This also explains the zeros in Table 2.
- 11.
And all the more so, as we do not include implicit and entity-based relations in our study.
References
Hajič, J., et al.: Prague Dependency Treebank 3.5. Data/software. Institute of Formal and Applied Linguistics, Charles University, LINDAT/CLARIN PID (2018). http://hdl.handle.net/11234/1-2621
Carlson, L., Okurowski, M.E., Marcu, D.: RST Discourse Treebank. Linguistic Data Consortium, University of Pennsylvania (2002)
Egg, M., Redeker, G.: How complex is discourse structure? In: Proceedings of LREC 2010, Malta, pp. 1619–1623 (2010)
Feng, V.W., Lin, Z., Hirst, G.: The impact of deep hierarchical discourse structures in the evaluation of text coherence. In: Proceedings of COLING, pp. 940–949 (2014)
Lee, A., Prasad, R., Joshi, A., Dinesh, N.: Complexity of dependencies in discourse: are dependencies in discourse more complex than in syntax? In: Proceedings of the TLT 2006, Prague, Czech Republic, pp. 79–90 (2006)
Lee, A., Prasad, R., Joshi, A., Webber, B.: Departures from tree structures in discourse: shared arguments in the Penn Discourse Treebank. In: Proceedings of the Constraints in Discourse III Workshop, pp. 61–68 (2008)
Lin, Z., Ng, H.T., Kan, M.Y.: Automatically evaluating text coherence using discourse relations. In: Proceedings of the 49th Annual Meeting of the ACL: Human Language Technologies-Volume 1, pp. 997–1006 (2011)
Mann, W.C., Thompson, S.A.: Rhetorical structure theory: toward a functional theory of text organization. Text-Interdiscip. J. Study Discourse 8(3), 243–281 (1988)
Marcu, D.: The Theory and Practice of Discourse Parsing and Summarization. MIT Press, Cambridge (2000)
Poláková, L., Mírovský, J., Synková, P.: Signalling implicit relations: a PDTB-RST comparison. Dialogue Discourse 8(2), 225–248 (2017)
Poláková, L., Mírovský, J.: Anaphoric connectives and long-distance discourse relations in Czech. Computación y Sistemas 23(3), 711–717 (2019)
Prasad, R., Dinesh, N., Lee, A., et al.: The Penn discourse treebank 2.0. In: Proceedings of LREC 2008, Morocco, pp. 2961–2968 (2008)
Prasad, R., Joshi, A., Webber, B.: Exploiting scope for shallow discourse parsing. In: Proceedings of LREC 2010, Malta, pp. 2076–2083 (2010)
Stede, M., Neumann, A.: Potsdam commentary corpus 2.0: annotation for discourse research. In: Proceedings of LREC 2014, pp. 925–929 (2014)
Scheffler, T., Stede, M.: Mapping PDTB-style connective annotation to RST-style discourse annotation. In: Proceedings of KONVENS 2016, pp. 242–247 (2016)
Taboada, M., Mann, W.C.: Rhetorical structure theory: looking back and moving ahead. Discourse Stud. 8(3), 423–459 (2006)
Wolf, F., Gibson, E.: Representing discourse coherence: a corpus-based study. Comput. Linguist. 31(2), 249–287 (2005)
Wolf, F., Gibson, E., Fisher, A., Knight, M.: Discourse Graphbank, LDC2005T08 [Corpus]. Linguistic Data Consortium, Philadelphia (2005)
Acknowledgments
The authors gratefully acknowledge support from the Grant Agency of the Czech Republic, project no. 20-09853S. The work described herein has used resources provided by the LINDAT/CLARIAH-CZ Research Infrastructure, supported by the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2018101).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Poláková, L., Mírovský, J. (2020). Mining Local Discourse Annotation for Features of Global Discourse Structure. In: Sojka, P., Kopeček, I., Pala, K., Horák, A. (eds) Text, Speech, and Dialogue. TSD 2020. Lecture Notes in Computer Science, vol 12284. Springer, Cham. https://doi.org/10.1007/978-3-030-58323-1_5
DOI: https://doi.org/10.1007/978-3-030-58323-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58322-4
Online ISBN: 978-3-030-58323-1
eBook Packages: Computer Science, Computer Science (R0)