Robust discourse parsing via discourse markers, topicality and position

FRANK SCHILDER

doi:10.1017/S1351324902002905

Abstract

This paper describes a simple discourse parsing and analysis algorithm that combines a formal underspecification utilising discourse grammar with Information Retrieval (IR) techniques. First, linguistic knowledge based on discourse markers is used to constrain a totally underspecified discourse representation. Then, the remaining underspecification is further specified by the computation of a topicality score for every discourse unit. This computation is done via the vector space model. Finally, the sentences in a prominent position (e.g. the first sentence of a paragraph) are given an adjusted topicality score. The proposed algorithm was evaluated by applying it to a text summarisation task. Results from a psycholinguistic experiment, indicating the most salient sentences for a given text as the ‘gold standard’, show that the algorithm performs better than commonly used machine learning and statistical approaches to summarisation.
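The topicality and position steps can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: raw term-frequency vectors, cosine similarity against a whole-document vector as the "topic", and a multiplicative `position_boost` for paragraph-initial sentences. The names `tokenize`, `cosine`, `topicality_scores` and `position_boost` are hypothetical, and the discourse-marker constraints are not modelled here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topicality_scores(paragraphs, position_boost=1.2):
    """Score every sentence against a document-level topic vector.

    `paragraphs` is a list of paragraphs, each a list of sentence strings.
    Assumption: the topic vector is the term-frequency vector of the whole
    document, and the first sentence of each paragraph has its score
    multiplied by `position_boost`. Neither choice is taken from the paper.
    """
    topic = Counter()
    for paragraph in paragraphs:
        for sentence in paragraph:
            topic.update(tokenize(sentence))

    scores = []
    for paragraph in paragraphs:
        for index, sentence in enumerate(paragraph):
            score = cosine(Counter(tokenize(sentence)), topic)
            if index == 0:  # prominent position: paragraph-initial sentence
                score *= position_boost
            scores.append((sentence, score))
    return scores


if __name__ == "__main__":
    doc = [
        ["Discourse parsing uses discourse markers.", "Cats sleep."],
        ["Markers constrain discourse structure.", "Parsing is robust."],
    ]
    ranked = sorted(topicality_scores(doc), key=lambda pair: pair[1], reverse=True)
    for sentence, score in ranked:
        print(f"{score:.3f}  {sentence}")
```

For a summarisation task like the one described, the highest-scoring sentences would be selected as the summary. In the algorithm described above the scores serve to resolve underspecification left by the discourse markers, a step this sketch leaves out.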
