Abstract
Discourse parsing of complex text types such as scientific research articles requires analysing the input document on linguistic and structural levels that go beyond the lexical discourse markers traditionally employed. This chapter describes a text-technological approach to discourse parsing. Discourse parsing, whose aim is to provide a discourse structure, is treated as the addition of a new annotation layer to input documents that are already marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific journal articles is given. The parser's core consists of cascaded applications of the GAP, a Generic Annotation Parser. Details of the chart parsing algorithm are provided, together with a short evaluation that compares the parser's output with reference annotations from our corpus and with recently developed systems for a similar task.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lobin, H., Lüngen, H., Hilbert, M., Bärenfänger, M. (2011). Processing Text-Technological Resources in Discourse Parsing. In: Mehler, A., Kühnberger, KU., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_3
DOI: https://doi.org/10.1007/978-3-642-22613-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22612-0
Online ISBN: 978-3-642-22613-7
eBook Packages: Engineering, Engineering (R0)