A comprehensive comparative evaluation of RST-based summarization methods

Published: 18 May 2010

Abstract

Motivated by governmental, commercial, and academic interests, and driven by the growing amount of information available, mainly online, the area of automatic text summarization has seen an increasing number of studies and products, which has led to a very large number of summarization methods. In this paper, we present a comprehensive comparative evaluation of the main automatic text summarization methods based on Rhetorical Structure Theory (RST), which are claimed to be among the best. We compare our results to superficial summarizers, which belong to a paradigm with severe limitations, and to hybrid methods that combine RST and superficial methods. We also test voting systems and machine learning techniques trained on RST features. We run experiments for English and Brazilian Portuguese and compare the results obtained with manually and automatically parsed texts. Our results systematically show that all RST methods have comparable overall performance and that they outperform most of the superficial methods. Machine learning techniques achieved high accuracy in classifying text segments worth including in the summary, but were not able to produce more informative summaries than the regular RST methods.

    Published In

    ACM Transactions on Speech and Language Processing, Volume 6, Issue 4
    May 2010
    20 pages
    ISSN: 1550-4875
    EISSN: 1550-4883
    DOI: 10.1145/1767756
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 May 2010
    Accepted: 01 March 2010
    Revised: 01 October 2009
    Received: 01 May 2009
    Published in TSLP Volume 6, Issue 4

    Author Tags

    1. Text summarization
    2. Rhetorical structure theory

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2020) Sözbilimsel Yapı Kuramının Metinlerdeki Önemli Birimlerin Belirlenmesine Yönelik Kullanımı. Dil Eğitimi ve Araştırmaları Dergisi 6:2 (635-656). DOI: 10.31464/jlere.770261. Online publication date: 25-Oct-2020
    • (2018) An abstractive Arabic text summarizer with user controlled granularity. Information Processing & Management 54:6 (903-921). DOI: 10.1016/j.ipm.2018.06.002. Online publication date: Nov-2018
    • (2017) Subtopic annotation and automatic segmentation for news texts in Brazilian Portuguese. Corpora 12:1 (23-54). DOI: 10.3366/cor.2017.0108. Online publication date: Apr-2017
    • (2017) Event-based summarization using a centrality-as-relevance model. Knowledge and Information Systems 50:3 (945-968). DOI: 10.1007/s10115-016-0966-4. Online publication date: 1-Mar-2017
    • (2015) Summarizing a document by trimming the discourse tree. IEEE/ACM Transactions on Audio, Speech and Language Processing 23:11 (2081-2092). DOI: 10.1109/TASLP.2015.2465150. Online publication date: 1-Nov-2015
    • (2014) A (New) Look at User Participation in an ERP. International Journal of Knowledge-Based Organizations 4:3 (1-7). DOI: 10.4018/ijkbo.2014070101. Online publication date: 1-Jul-2014
    • (2014) The Olds Institute. International Journal of Knowledge-Based Organizations 4:2 (37-52). DOI: 10.4018/ijkbo.2014040103. Online publication date: 1-Apr-2014
    • (2014) A Control-Data-Mapping Entity-Relationship Model for Internal Controls Construction in Database Design. International Journal of Knowledge-Based Organizations 4:2 (20-36). DOI: 10.4018/ijkbo.2014040102. Online publication date: 1-Apr-2014
    • (2014) On the Support of Mobility in ORDBMS. International Journal of Knowledge-Based Organizations 4:1 (38-64). DOI: 10.4018/ijkbo.2014010103. Online publication date: 1-Jan-2014
    • (2014) Building a Language Model for Local Coherence in Multi-document Summaries Using a Discourse-Enriched Entity-Based Model. Proceedings of the 2014 Brazilian Conference on Intelligent Systems (44-49). DOI: 10.1109/BRACIS.2014.19. Online publication date: 18-Oct-2014
