DOI: 10.1145/3511808.3557821

Rank-Aware Gain-Based Evaluation of Extractive Summarization

Published: 17 October 2022

ABSTRACT

ROUGE has long been a popular metric for evaluating text summarization tasks, as it eliminates time-consuming and costly human evaluations. However, ROUGE is not a fair evaluation metric for the extractive summarization task, as it is based entirely on lexical overlap. Additionally, ROUGE ignores the quality of the ranker that performs the actual sentence/phrase extraction in extractive summarization. The main focus of the thesis is to design an nCG (normalized cumulative gain)-based evaluation metric for extractive summarization that is both rank-aware and semantic-aware (called Sem-nCG). One fundamental contribution of the work is that it demonstrates how more reliable semantic-aware ground truths can be generated for evaluating extractive summarization without any additional human intervention. To the best of our knowledge, this work is the first of its kind. Preliminary experimental results demonstrate that the new Sem-nCG metric is indeed semantic-aware and also exhibits higher correlation with human judgement for single-document summarization when a single reference is considered.
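As a rough illustration of the idea, the sketch below shows how an nCG-style score could be computed for an extractive summary. It is a minimal sketch under assumptions, not the metric as defined in the paper: it assumes gains are assigned to document sentences by ranking them on similarity to the reference summary, uses a hypothetical linear rank-to-gain mapping, and substitutes a crude token-overlap placeholder where a semantic-aware setup would use sentence embeddings. All function and variable names are illustrative.

# Illustrative sketch of an nCG-style score for extractive summarization.
# Assumptions (not from the paper): gains come from ranking document
# sentences by similarity to the reference, with a simple linear
# rank-to-gain mapping; the similarity function is a crude placeholder.

from typing import List


def similarity(sentence: str, reference: str) -> float:
    """Placeholder similarity. A semantic-aware setup would use sentence
    embeddings; token overlap keeps this sketch self-contained."""
    a, b = set(sentence.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def sentence_gains(doc_sentences: List[str], reference: str) -> List[float]:
    """Rank document sentences by similarity to the reference and map
    ranks to gains: the top-ranked sentence receives the largest gain."""
    n = len(doc_sentences)
    order = sorted(range(n),
                   key=lambda i: similarity(doc_sentences[i], reference),
                   reverse=True)
    gains = [0.0] * n
    for rank, idx in enumerate(order):
        gains[idx] = (n - rank) / n  # hypothetical linear rank-to-gain map
    return gains


def sem_ncg_at_k(extracted: List[int], gains: List[float], k: int) -> float:
    """Cumulative gain of the k extracted sentences, normalized by the
    ideal cumulative gain of the k highest-gain sentences."""
    cg = sum(gains[i] for i in extracted[:k])
    icg = sum(sorted(gains, reverse=True)[:k])
    return cg / icg if icg > 0 else 0.0


if __name__ == "__main__":
    doc = ["The cat sat on the mat.",
           "Stocks fell sharply on Monday.",
           "Analysts blamed rising interest rates for the sell-off."]
    ref = "Markets dropped on Monday as interest rates rose."
    gains = sentence_gains(doc, ref)
    print(sem_ncg_at_k([1, 2], gains, k=2))  # score the two extracted sentences

Because the score is normalized by the ideal cumulative gain, it rewards an extractor both for picking the right sentences and for ranking them ahead of less relevant ones, which is the rank-aware behaviour ROUGE lacks.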


Published in

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022
5274 pages
ISBN: 9781450392365
DOI: 10.1145/3511808
General Chairs: Mohammad Al Hasan, Li Xiong

        Copyright © 2022 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 17 October 2022

        Qualifiers

        • short-paper

        Acceptance Rates

CIKM '22 paper acceptance rate: 621 of 2,257 submissions (28%). Overall acceptance rate: 1,861 of 8,427 submissions (22%).

