DOI: 10.1145/3340531.3412078

DistilSum: Distilling the Knowledge for Extractive Summarization

Published: 19 October 2020

Abstract

A popular choice for extractive summarization is to conceptualize it as sentence-level classification supervised by binary labels. However, the common metric ROUGE measures text similarity rather than the performance of the classifier. For example, BERTSUMEXT, the best extractive classifier so far, achieves a precision of only 32.9% on the top 3 extracted sentences (P@3) on the CNN/DM dataset. This suggests that current approaches cannot exactly model the complex relationships among sentences with 0/1 targets. In this paper, we introduce DistilSum, which consists of a teacher mechanism and a student model. The teacher mechanism produces high-entropy soft targets at a high temperature. Our student model is trained with the same temperature to match these informative soft targets and is tested with a temperature of 1 to predict the ground-truth labels. Compared with the large version of BERTSUMEXT, our experimental results on CNN/DM show a substantial improvement of 0.99 ROUGE-L (text similarity) and 3.95 P@3 (classifier performance). Our source code will be available on GitHub.
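To make the temperature-based training objective concrete, the following is a minimal sketch of distillation for sentence-level extractive classification in the spirit of Hinton et al. (2014). It is an illustrative assumption, not the authors' released code: the function name, the hyperparameter values (temperature, alpha), and the PyTorch formulation are all hypothetical.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=4.0, alpha=0.5):
    """Combine soft-target matching with the usual hard-label loss.

    student_logits, teacher_logits: [num_sentences, 2] scores per sentence
    hard_labels: [num_sentences] binary 0/1 extraction labels
    temperature, alpha: assumed hyperparameters, not taken from the paper
    """
    # Soft targets: teacher probabilities at a high temperature (high entropy).
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    # The student is trained at the same temperature to match the soft targets.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean")
    # Standard cross-entropy against the binary 0/1 labels.
    hard_loss = F.cross_entropy(student_logits, hard_labels)
    # The T^2 factor keeps soft-target gradients on a comparable scale.
    return alpha * (temperature ** 2) * soft_loss + (1 - alpha) * hard_loss

At test time the student scores sentences at a temperature of 1, and the top-3 sentences by predicted probability would form the extracted summary (matching the P@3 metric above).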

Supplementary Material

MP4 File (3340531.3412078.mp4)
Soft labels with high entropy provide much more information than binary labels. In this paper, we propose DistilSum, which contains a teacher algorithm that generates soft labels and a student model trained with these soft labels to extract summary sentences.


Published In

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN:9781450368599
DOI:10.1145/3340531

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. knowledge distillation
  2. neural networks
  3. summarization

Qualifiers

  • Short-paper

Conference

CIKM '20

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
