DOI: 10.1145/3340531.3412078

DistilSum: Distilling the Knowledge for Extractive Summarization

Published: 19 October 2020

Abstract

A popular choice for extractive summarization is to conceptualize it as sentence-level classification supervised by binary labels. However, the common metric ROUGE measures text similarity rather than the performance of the classifier. For example, BERTSUMEXT, the best extractive classifier so far, achieves a precision of only 32.9% on the top 3 extracted sentences (P@3) on the CNN/DM dataset. This suggests that current approaches cannot exactly model the complex relationships among sentences with 0/1 targets. In this paper, we introduce DistilSum, which consists of a teacher mechanism and a student model. The teacher mechanism produces high-entropy soft targets at a high temperature. Our student model is trained with the same temperature to match these informative soft targets and is tested with a temperature of 1 to predict the ground-truth labels. Compared with the large version of BERTSUMEXT, our experimental results on CNN/DM show a substantial improvement of 0.99 ROUGE-L (text similarity) and 3.95 P@3 (classifier performance). Our source code will be available on GitHub.
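To make the temperature-based training objective concrete, the following is a minimal sketch of distillation for sentence-level extractive classification in the spirit of Hinton et al. (2014). It is an illustrative assumption, not the authors' released code: the function name, the hyperparameter values (temperature, alpha), and the PyTorch formulation are all hypothetical.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=4.0, alpha=0.5):
    """Combine soft-target matching with the usual hard-label loss.

    student_logits, teacher_logits: [num_sentences, 2] scores per sentence
    hard_labels: [num_sentences] binary 0/1 extraction labels
    temperature, alpha: assumed hyperparameters, not taken from the paper
    """
    # Soft targets: teacher probabilities at a high temperature (high entropy).
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    # The student is trained at the same temperature to match the soft targets.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean")
    # Standard cross-entropy against the binary 0/1 labels.
    hard_loss = F.cross_entropy(student_logits, hard_labels)
    # The T^2 factor keeps soft-target gradients on a comparable scale.
    return alpha * (temperature ** 2) * soft_loss + (1 - alpha) * hard_loss

At test time the student scores sentences at a temperature of 1, and the top-3 sentences by predicted probability would form the extracted summary (matching the P@3 metric above).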

Supplementary Material

MP4 File (3340531.3412078.mp4)
Soft labels with high entropy provide much more information than binary labels. In this paper, we propose DistilSum, which contains a teacher algorithm that generates soft labels and a student model trained with these soft labels to extract summary sentences.


Published In

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN:9781450368599
DOI:10.1145/3340531

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. knowledge distillation
  2. neural networks
  3. summarization

Qualifiers

  • Short-paper

Conference

CIKM '20

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
