Experimental analysis of multiple criteria for extractive multi-document text summarization

https://doi.org/10.1016/j.eswa.2019.112904

Highlights

  • We focus on the different criteria used for generating automatic text summaries.

  • We contribute a complete study evaluating and comparing these different criteria.

  • This paper analyzes the best criteria and their combinations.

  • Our tests use Document Understanding Conferences datasets and the ROUGE metrics.

  • Coverage, redundancy reduction, and relevance obtain the most balanced results.

Abstract

Automatic text summarization methods are increasingly needed in different fields of knowledge. In the scientific literature, generic extractive multi-document text summarization has been formulated as an optimization problem that involves several criteria. However, only two criteria, content coverage and redundancy reduction, have been considered simultaneously, whereas the other two, relevance and coherence, have been considered separately. Consequently, there is a lack of studies comparing the performance of the different criteria. For this reason, a comparative study of the different criteria suitable for generic extractive multi-document text summarization is performed here. All possible combinations of two, three, and four criteria have been considered within a multi-objective optimization context. Experiments have been carried out on datasets from the Document Understanding Conferences (DUC), and the combinations of objective functions have been compared and evaluated with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. Redundancy reduction has proven to be an indispensable criterion, whereas coherence is the least significant and least efficient one. The combination that includes content coverage, redundancy reduction, and relevance obtains the most balanced results in terms of average ROUGE and execution time.

Introduction

A large amount of information is now available on the World Wide Web, and this volume grows continuously. This explosion of digital information makes it difficult for Internet users to quickly extract the most relevant information on a specific topic. Text mining tools make this process possible (Fan & Bifet, 2013). From all the textual information on a specific topic, text mining tools should be able to automatically generate a summary (Hashimi, Hafez, & Mathkour, 2015). In addition, the generated summary must meet some basic requirements, including the needs of users.

In the scientific literature, there are several types of summarization methods. Text summarization methods can be query-oriented or generic. On the one hand, according to Huang, He, Wei, and Li (2010), query-oriented summarization requires the creation of a summary according to a given query from the user. This query describes the user’s information need, and it usually consists of one or several narrative or interrogative sentences. On the other hand, generic summarization methods do not need any query specified by the user (see e.g. Alguliev, Aliguliyev, & Mehdiyev, 2011c). Zajic, Dorr, and Lin (2008) classify summarization methods as single-document or multi-document depending on where the information is obtained from: a single-document summary condenses the information from only one document, whereas a multi-document summary is obtained from a document collection, selecting information from each document.

Text summarization methods can also be classified as abstractive or extractive (Wan, 2008). On the one hand, abstractive summarization methods may generate a summary which contains words or parts of sentences that do not exist in the original text. On the other hand, extractive summarization methods only select entire sentences from the original text.

The text summarization problem can be addressed by single-objective or multi-objective optimization approaches. In the former, only one objective function is optimized (see e.g. Alguliev et al., 2011c). This objective function can include one criterion, or several criteria combined by weighting them. Most approaches for text summarization found in the scientific literature have used single-objective optimization. Recently, however, multi-objective optimization has been gaining strength for text summarization, as in Sanchez-Gomez, Vega-Rodríguez, and Pérez (2018). This optimization approach allows more than one objective function to be optimized simultaneously. In addition, the results obtained with these approaches improve on those obtained with single-objective optimization approaches.
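
The difference between the two approaches can be illustrated with a minimal Python sketch. It is not the algorithm used in this work: the function names `weighted_sum`, `dominates`, and `pareto_front` are hypothetical, all objectives are assumed to be maximized, and each candidate summary is assumed to be already scored on every criterion.

```python
from typing import List, Sequence, Tuple


def weighted_sum(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Single-objective view: collapse several criteria into one value."""
    return sum(w * s for w, s in zip(weights, scores))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Multi-objective view: a dominates b if a is no worse on every
    objective and strictly better on at least one (maximization assumed)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]


# Hypothetical (coverage, redundancy reduction) scores of four candidate summaries.
candidates = [(0.8, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]
front = pareto_front(candidates)
```

A weighted sum returns a single best candidate for one fixed choice of weights, whereas the Pareto front keeps every non-dominated trade-off; in this toy data only `(0.4, 0.4)` is discarded, because `(0.5, 0.5)` is better on both objectives.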

The generation of a summary is carried out by solving an optimization problem, which usually involves a set of criteria to be optimized. In most works, the criteria used have been content coverage and redundancy reduction: the content coverage criterion concerns the presence of the main topic of the document collection in the generated summary, whereas the redundancy reduction criterion tries to avoid similar sentences in the summary (see e.g. Alguliev, Aliguliyev, & Mehdiyev, 2011c; Saleh, Kadhim, & Attea, 2015; Sanchez-Gomez, Vega-Rodríguez, & Pérez, 2018). Other criteria have also been used to a lesser extent, such as relevance and coherence: the relevance criterion tries to include the most important information from the document collection by selecting the appropriate sentences (Alguliev, Aliguliyev, & Isazade, 2012c), and the coherence criterion assesses the readability of the summary by means of a smooth connectivity among its sentences (Umam, Putro, Pratamasunu, Arifin, & Purwitasari, 2015). These four criteria have been used in the scientific literature, as shown in the related work section. In addition, these criteria have also been used in other approaches that are not focused on optimization techniques, such as graph-based, clustering, or latent semantic analysis approaches. These approaches applied content coverage (see e.g. Angheluta, De Busser, & Moens, 2002; Baralis, Cagliero, Mahoto, & Fiori, 2013) and redundancy reduction (see e.g. Garg, Favre, Reidhammer, & Hakkani-Tür, 2009). As for relevance and coherence, see e.g. Gong and Liu (2001) and Azmi and Al-Thanyyan (2012), respectively. To the best of the authors’ knowledge, no study has been conducted to analyse the performance of all the possible combinations of these four criteria/objectives and assess their importance.

In this work, the generic extractive multi-document text summarization problem is addressed with a multi-objective optimization approach. Experiments have been carried out on datasets from the Document Understanding Conferences (DUC) (NIST, 2014). The combinations of objective functions have been evaluated and compared by using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics (Lin, 2004). The main contributions of this work are the following:

  • All criteria suitable for the generic extractive multi-document text summarization problem have been analyzed, implemented, and compared.

  • The experiments have taken into account all possible combinations of the different criteria (a total of eleven).

  • The multi-objective optimization approach used in the experiments has been adapted to consider different numbers of criteria/objectives (two, three, and four) and all possible combinations of them.

  • A complete statistical analysis has been performed for the results obtained in the combinations of two, three, and four objective functions.

  • The most relevant criteria, and the best combinations among them, have been identified.

  • The balance between quality improvement and computational cost has been studied for the best combinations.

  • A comparison with other works has been included, showing the improvements of the best combination found.
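
The total of eleven combinations mentioned in the contributions follows from choosing two, three, or four of the four criteria: 6 + 4 + 1 = 11. A short Python sketch of that enumeration is given below; the identifier names are illustrative labels, not names taken from the paper.

```python
from itertools import combinations

# The four criteria discussed in the text (labels are illustrative).
CRITERIA = ("coverage", "redundancy_reduction", "relevance", "coherence")


def objective_combinations(criteria=CRITERIA, min_size=2):
    """List every subset of the criteria with at least `min_size` members."""
    return [combo
            for size in range(min_size, len(criteria) + 1)
            for combo in combinations(criteria, size)]


combos = objective_combinations()
counts_by_size = {size: sum(1 for c in combos if len(c) == size)
                  for size in (2, 3, 4)}
print(len(combos), counts_by_size)  # 11 {2: 6, 3: 4, 4: 1}
```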

The remainder of this paper is organized as follows. Section 2 presents the related work. In Section 3, the generic extractive multi-document text summarization is formulated as a multi-objective optimization problem, and the different criteria are described. Section 4 presents the datasets used for the experiments, the evaluation metrics used, the statistical methods carried out, and the algorithm used in the experiments. In Section 5, the results obtained for all combinations of objective functions and their statistical analysis are detailed. Finally, Section 6 includes the conclusions.

Related work

This section presents the state of the art regarding the criteria applied to generic extractive multi-document text summarization from an optimization viewpoint.

First, the case for two criteria is reviewed. All references found refer to the content coverage and the redundancy reduction. Alguliev et al. (2011c) proposed an adaptive Differential Evolution (DE) algorithm for the optimization model. Alguliev, Aliguliyev, Hajirahimova, and Mehdiyev (2011) solved the text summarization model with

Problem definition

Generic extractive multi-document text summarization is formalized as an optimization problem. In the field of text summarization, vector-based word methods are the most used, where each sentence is represented as a vector of words. To measure similarity between sentences by means of pairwise comparisons, the most used measure in the scientific literature is the cosine similarity.
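
As a rough illustration of the vector-based representation and the cosine similarity, the following Python sketch may help. It makes simplifying assumptions that are not taken from the paper: sentences are split on whitespace, and raw term counts are used as vector components instead of any particular term-weighting scheme. The function names are hypothetical.

```python
import math
from collections import Counter


def sentence_vector(sentence: str) -> Counter:
    """Represent a sentence as a sparse vector of raw term counts
    (assumption: lowercase, whitespace tokenization, no term weighting)."""
    return Counter(sentence.lower().split())


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty sentence is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


s1 = sentence_vector("text summarization")
s2 = sentence_vector("text mining")
similarity = cosine_similarity(s1, s2)  # one shared term out of two in each
```

With this representation, two sentences that share no terms have similarity 0, and two identical sentences have similarity 1.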

In this section, the cosine similarity measure is explained first. After that, the different criteria used for the

Methodology

This section presents the datasets involved in the experiments, the evaluation metrics used, the statistical methods performed, and the algorithm used for the experiments.
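
To give an idea of what a ROUGE score measures, the sketch below computes a simplified ROUGE-N recall: the fraction of the n-grams of a reference summary that also appear in the generated summary, with clipped counts. It is only an approximation under stated assumptions (a single reference summary, whitespace tokenization, no stemming or stop-word removal) and is not the official ROUGE toolkit used for the experiments. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N recall against a single reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by the number of times the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat sat"
r1 = rouge_n_recall(candidate, reference, n=1)  # 3 of 6 reference unigrams matched
r2 = rouge_n_recall(candidate, reference, n=2)  # 2 of 5 reference bigrams matched
```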

Experimental results

This section presents the results separated by number of objective functions and a discussion on the best combinations of objective functions.

Conclusions

Generic extractive multi-document text summarization is a problem whose nature is multi-objective. However, this problem has been traditionally solved from a single-objective perspective. When multi-objective optimization has been considered to address this problem, only two criteria have been used, i.e., content coverage and redundancy reduction. In other cases, criteria such as relevance and coherence have been used independently. Evaluating and comparing the different combinations of the

CRediT authorship contribution statement

Jesus M. Sanchez-Gomez: Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization. Miguel A. Vega-Rodríguez: Conceptualization, Methodology, Formal analysis, Resources, Writing - review & editing, Supervision, Project administration, Funding acquisition. Carlos J. Pérez: Conceptualization, Methodology, Formal analysis, Resources, Writing - review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank the referees for comments and suggestions which have improved both the content and the readability of this paper. This research has been supported by Ministerio de Economía y Competitividad (Centro para el Desarrollo Tecnológico Industrial, contract IDI-20161039; Agencia Estatal de Investigación, projects TIN2016-76259-P and MTM2017-86875-C3-2-R), Junta de Extremadura (Contract AA-16-0017-1, and projects GR18090 and GR18108), Cátedra/Aula ASPgems, and European Union (European Regional

References (35)

  • H. Hashimi et al. Selection criteria for text mining approaches. Computers in Human Behavior (2015)

  • G. Salton et al. Term-weighting approaches in automatic text retrieval. Information Processing & Management (1988)

  • J.M. Sanchez-Gomez et al. Extractive multi-document text summarization using a multi-objective artificial bee colony optimization approach. Knowledge-Based Systems (2018)

  • P. Verma et al. MCRMR: Maximum coverage and relevancy with minimal redundancy based multi-document summarization. Expert Systems with Applications (2019)

  • D.M. Zajic et al. Single-document and multi-document summarization techniques for email threads using sentence compression. Information Processing & Management (2008)

  • R.M. Alguliev et al. Quadratic boolean programming model and binary differential evolution algorithm for text summarization. Problems of Information Technology (2012)

  • R.M. Alguliev et al. An unsupervised approach to generating generic summaries of documents. Applied Soft Computing (2015)