Abstract
Generating a concise and relevant summary with respect to a specific query can broadly meet a user's information needs in many areas. Within a summarization system, the extractive technique is attractive because it is simple, fast, and produces reliable outputs. Salience and relevance are two key requirements for extractive summarization. Most existing approaches pursue them by augmenting input features, incorporating additional attention, or enlarging the training scale. Yet much unsupervised but query-related knowledge remains underexplored. To this end, in this paper we frame query-focused document summarization as a combination of salience prediction and relevance prediction. Concretely, in addition to the oracle summary set for the salience task, we further create a pseudo-summary set for user-specific queries (i.e., titles or image captions as queries) for the relevance task. Then, building on a modified BERT fine-tuned summarization model, we propose two methods, called guidance and distillation. Specifically, guidance training shares salient information to reinforce useful contextual representations in a two-stage training scheme with a salience-and-relevance objective. For distillation, we propose a new "guide-student" learning paradigm in which the relevance knowledge of the query is distilled and transferred from a guide model to a salience-oriented student model. Experimental results demonstrate that guidance training excels at improving the salience of the summary, while distillation training is significantly better at relevance learning. Both achieve state-of-the-art results in unsupervised query-focused settings on the CNN and DailyMail datasets.
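The paper's exact distillation objective is not reproduced here; as a hedged illustration of the "guide-student" idea, the following minimal sketch applies standard knowledge distillation (Hinton et al., 2015) to per-sentence extraction scores: the guide model's relevance-aware scores are softened with a temperature and the student is penalized by the KL divergence from that target distribution. All function names and the temperature value are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(scores, temperature=1.0):
    """Temperature-scaled softmax over a list of sentence scores."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_scores, guide_scores, temperature=2.0):
    """KL divergence KL(guide || student) between softened sentence
    distributions; the guide plays the role of the relevance-aware
    teacher, the student is the salience-oriented model."""
    p = softmax(guide_scores, temperature)    # guide (teacher) targets
    q = softmax(student_scores, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice this term would be added to the student's supervised extraction loss, so the student keeps its salience objective while absorbing the guide's relevance signal.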
Data availability
The datasets generated during the current study are available from the corresponding author on reasonable request.
Notes
The ROUGE evaluation options are -m -n 2.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
About this article
Cite this article
Yue, Y., Li, Y., Zhan, Ja. et al. Query focused summarization via relevance distillation. Neural Comput & Applic 35, 16543–16557 (2023). https://doi.org/10.1007/s00521-023-08525-w