Query focused summarization via relevance distillation

  • Original Article
  • Journal: Neural Computing and Applications

Abstract

Creating a concise and relevant summary with respect to a specific query can meet users' information needs in many areas. Among summarization systems, the extractive technique is attractive because it is simple, fast, and produces reliable outputs. Salience and relevance are two key requirements for extractive summarization. Most existing approaches pursue them by augmenting input features, incorporating additional attention, or expanding the training scale. Yet there is much unsupervised, query-related knowledge that remains underexplored. To this end, in this paper we frame query-focused document summarization as a combination of salience prediction and relevance prediction. Concretely, in addition to the oracle summary set for the salience task, we further create a pseudo-summary set for user-specific queries (i.e., the title or image captions serve as the query) for the relevance task. Then, building on a modified BERT fine-tuned summarization model, we propose two methods, called guidance and distillation. Specifically, guidance training shares salient information to reinforce useful contextual representations through a two-stage training with a salience-and-relevance objective. For distillation, we propose a new "guide-student" learning paradigm in which the relevance knowledge of the query is distilled and transferred from a guide model to a salience-oriented student model. Experimental results demonstrate that guidance training excels at improving the salience of the summary, while distillation training is markedly better at relevance learning. Both achieve state-of-the-art results in the unsupervised query-focused setting of the CNN/DailyMail dataset.
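
To make the "guide-student" paradigm concrete, the sketch below shows one way such a relevance-distillation objective could look. It is a minimal illustration under our own assumptions (PyTorch, a binary cross-entropy salience term, a temperature-scaled KL relevance term, and the blending weight alpha); it is not the paper's exact formulation.

```python
# Minimal sketch of a guide-student distillation objective for extractive
# summarization. The loss form, temperature, and weighting are illustrative
# assumptions, not the formulation reported in the paper.
import torch.nn.functional as F


def guide_student_loss(student_logits, guide_logits, salience_labels,
                       alpha=0.5, temperature=2.0):
    """student_logits / guide_logits: (batch, num_sentences) sentence scores;
    salience_labels: (batch, num_sentences) 0/1 oracle extractive labels."""
    # Supervised salience term against the oracle extractive labels.
    salience_loss = F.binary_cross_entropy_with_logits(
        student_logits, salience_labels.float())

    # Relevance term: pull the student's sentence distribution toward the
    # query-relevance distribution produced by the guide model.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    guide_probs = F.softmax(guide_logits / temperature, dim=-1)
    relevance_loss = F.kl_div(student_log_probs, guide_probs,
                              reduction="batchmean")

    # Blend the two objectives; T^2 compensates for the softened gradients.
    return (1 - alpha) * salience_loss + alpha * (temperature ** 2) * relevance_loss
```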

Data availability

The datasets generated during the current study are available from the corresponding author on reasonable request.

Notes

  1. https://catalog.ldc.upenn.edu/LDC2008T19.

  2. https://github.com/shashiongithub/Refresh.

  3. https://github.com/google-research/bert.

  4. The ROUGE evaluation options are -m -n 2 (illustrated below).
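
As an illustration of how these options are typically used, the sketch below invokes the standard ROUGE-1.5.5 Perl script through a small Python wrapper. The script path and the settings file name are placeholders (not the authors' setup), and a real run may need further options such as -e for the ROUGE data directory.

```python
# Illustrative wrapper around the ROUGE-1.5.5 Perl script using the options
# from the note above; paths and the settings file are placeholders.
import subprocess

subprocess.run(
    ["perl", "ROUGE-1.5.5.pl",
     "-m",                    # apply Porter stemming before matching
     "-n", "2",               # report ROUGE-N up to bigrams (ROUGE-1/ROUGE-2)
     "rouge_settings.xml"],   # config listing peer (system) and model summaries
    check=True,
)
```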

References

  1. Pugoy RA, Kao H-Y (2021) Unsupervised extractive summarization-based representations for accurate and explainable collaborative filtering. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (Volume 1: Long Papers), pp 2981–2990

  2. Wan X, Xiao J (2009) Graph-based multi-modality learning for topic-focused multi-document summarization. In: IJCAI, pp 1586–1591

  3. Yih WT, Goodman J, Vanderwende L, Suzuki H (2007) Multi-document summarization by maximizing informative content-words. In: Proceedings of IJCAI’07, pp 1776–1782

  4. Ouyang Y, Li W, Li S, Lu Q (2011) Applying regression models to query-focused multi-document summarization. Inf Process Manag 47(2):227–237

  5. Lin CY, Hovy E (2000) The automated acquisition of topic signatures for text summarization. In: Proceedings of COLING’00, pp 495–501

  6. Zhou Q, Yang N, Wei F, Huang S, Zhou M, Zhao T (2018) Neural document summarization by jointly learning to score and select sentences. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Vol. 1: Long Papers), pp 654–663

  7. Liu Y, Lapata M (2019) Text summarization with pretrained encoders. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 3721–3731

  8. Nallapati R, Zhai F, Zhou B (2017) Summarunner: a recurrent neural network based sequence model for extractive summarization of documents. In: Thirty-first AAAI conference on artificial intelligence

  9. Zhang X, Wei F, Zhou M (2019) HIBERT: document level pre-training of hierarchical bidirectional transformers for document summarization. In: Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Vol. 1: Long Papers, pp 5059–5069

  10. Narayan S, Cohen SB, Lapata M (2018) Ranking sentences for extractive summarization with reinforcement learning. In: Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long Papers), pp 1747–1759

  11. Cao Z, Li W, Li S, Wei F, Li Y (2016) Attsum: joint learning of focusing and summarization with neural attention. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, pp 547–556

  12. Narayan S, Cardenas R, Papasarantopoulos N, Cohen SB, Lapata M, Yu J, Chang Y (2018) Document modeling with external attention for sentence extraction. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 2020–2030

  13. Ren P, Chen Z, Ren Z, Wei F, Ma J, de Rijke M (2017) Leveraging contextual sentence relations for extractive summarization using a neural attention model. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, pp 95–104. ACM

  14. Zhu H, Dong L, Wei F, Qin B, Liu T (2019) Transforming wikipedia into augmented data for query-focused summarization. arXiv:1911.03324

  15. Devlin J, Chang M-W, Lee K, Toutanova K (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), pp 4171–4186

  16. Furlanello T, Lipton Z, Tschannen M, Itti L, Anandkumar A (2018) Born-again neural networks. In: International conference on machine learning, pp 1602–1611

  17. Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. arXiv:1802.05365

  18. Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training

  19. Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized bert pretraining approach. arXiv:1907.11692

  20. Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2019) Albert: a lite bert for self-supervised learning of language representations. arXiv:1909.11942

  21. Dong L, Yang N, Wang W, Wei F, Liu X, Wang Y, Gao J, Zhou M, Hon H-W (2019) Unified language model pre-training for natural language understanding and generation. arXiv preprint arXiv:1905.03197

  22. Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2020) Bart: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 7871–7880

  23. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ et al (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(140):1–67

  24. Harabagiu S, Lacatusu F (2005) Topic themes for multi-document summarization. In: Proceedings of SIGIR’05, pp 202–209

  25. Baumel T, Cohen R, Elhadad M (2016) Topic concentration in query focused summarization datasets. In: Thirtieth AAAI conference on artificial intelligence

  26. Zhang J, Zhao Y, Saleh M, Liu P (2020) Pegasus: pre-training with extracted gap-sentences for abstractive summarization. In: International conference on machine learning. PMLR, pp 11328–11339

  27. Su D, Xu Y, Yu T, Siddique FB, Barezi EJ, Fung P (2020) Caire-covid: a question answering and query-focused multi-document summarization system for covid-19 scholarly information management. arXiv preprint arXiv:2005.03975

  28. Du J, Gao Y (2021) Query-focused abstractive summarization via question-answering model. In: 2021 IEEE international conference on big knowledge (ICBK). IEEE, pp 440–447

  29. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv:1503.02531

  30. Zhang Y, Xiang T, Hospedales TM, Lu H (2018) Deep mutual learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4320–4328

  31. Phuong M, Lampert C (2019) Towards understanding knowledge distillation. In: International conference on machine learning, pp 5142–5151

  32. Mirzadeh SI, Farajtabar M, Li A, Levine N, Matsukawa A, Ghasemzadeh H (2020) Improved knowledge distillation via teacher assistant. Proc AAAI Conf Artif Intell 34:5191–5198

  33. He J, Gu J, Shen J, Ranzato M (2019) Revisiting self-training for neural sequence generation. arXiv:1909.13788

  34. Liu X, He P, Chen W, Gao J (2019) Improving multi-task deep neural networks via knowledge distillation for natural language understanding. arXiv:1904.09482

  35. Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2014) Fitnets: hints for thin deep nets. arXiv:1412.6550

  36. Polino A, Pascanu R, Alistarh D (2018) Model compression via distillation and quantization. arXiv:1802.05668

  37. Clark K, Luong M-T, Khandelwal U, Manning CD, Le Q (2019) Bam! Born-again multi-task networks for natural language understanding. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 5931–5937

  38. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008

  39. Hermann KM, Kocisky T, Grefenstette E, Espeholt L, Kay W, Suleyman M, Blunsom P (2015) Teaching machines to read and comprehend. In: Advances in neural information processing systems, pp 1693–1701

  40. Sandhaus E (2008) The New York times annotated corpus. Linguist Data Consort Phila 6(12):26752

  41. Durrett G, Berg-Kirkpatrick T, Klein D (2016) Learning-based single-document summarization with compression and anaphoricity constraints. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Vol. 1: Long Papers), pp 1998–2008

  42. See A, Liu PJ, Manning CD (2017) Get to the point: Summarization with pointer-generator networks. In: Proceedings of the 55th annual meeting of the association for computational linguistics, ACL 2017, Vol. 1: Long Papers, pp 1073–1083

  43. Manning CD, Surdeanu M, Bauer J, Finkel J, Bethard SJ, McClosky D (2014) The Stanford CoreNLP natural language processing toolkit. In: Association for computational linguistics (ACL) system demonstrations, pp 55–60

  44. Lin C-Y (2004) Rouge: a package for automatic evaluation of summaries. In: Text summarization branches out, pp 74–81

  45. Mihalcea R, Tarau P (2004) TextRank: bringing order into text. In: Proceedings of the 2004 conference on empirical methods in natural language processing, pp 404–411

  46. Zhang X, Lapata M, Wei F, Zhou M (2018) Neural latent extractive document summarization. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp 779–784

  47. Dong Y, Shen Y, Crawford E, van Hoof H, Cheung JCK (2018) Banditsum: extractive summarization as a contextual bandit. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 3739–3748

  48. Bae S, Kim T, Kim J, Lee S-g (2019) Summary level training of sentence rewriting for abstractive summarization. In: Proceedings of the 2nd workshop on new frontiers in summarization, pp 10–20. Association for Computational Linguistics, Hong Kong, China

  49. Clarke J, Lapata M (2010) Discourse constraints for document compression. Comput Linguist 36(3):411–441

Author information

Corresponding author

Correspondence to Yang Gao.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

About this article

Cite this article

Yue, Y., Li, Y., Zhan, Ja. et al. Query focused summarization via relevance distillation. Neural Comput & Applic 35, 16543–16557 (2023). https://doi.org/10.1007/s00521-023-08525-w
