Abstractive document summarization via multi-template decoding

Published in Applied Intelligence

Abstract

Most previous abstractive summarization models generate the summary in a left-to-right manner without fully exploiting target-side global information. Recently, many researchers have sought to alleviate this issue by retrieving target-side templates from a large-scale training corpus, but these approaches remain limited by template quality. To overcome the problem of template selection bias, one promising direction is to obtain better target-side global information from multiple high-quality templates. Hence, this paper extends the encoder-decoder framework with a multi-template decoding mechanism that utilizes multiple templates retrieved from the training corpus based on semantic distance. In addition, we introduce a multi-granular attention mechanism that simultaneously accounts for the importance of individual words within each template and the relative importance of the templates themselves. Extensive experimental results on CNN/Daily Mail and English Gigaword show that our proposed model significantly outperforms several state-of-the-art abstractive and extractive baseline models.
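
To make the multi-granular attention described above concrete, here is a minimal NumPy sketch of two-level attention over K retrieved templates: word-level weights produce one context vector per template, and template-level weights then combine those per-template contexts into a single vector. The dot-product scoring, the dimensions, and all names here are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_granular_attention(decoder_state, templates):
    # decoder_state: (d,) current decoder hidden state.
    # templates: list of K arrays, each (T_k, d) of template word states.
    # Returns a single (d,) template context vector.
    contexts, scores = [], []
    for words in templates:
        # Word level: weigh the words inside one template by their
        # relevance to the decoder state (dot-product scoring assumed).
        word_weights = softmax(words @ decoder_state)   # (T_k,)
        context = word_weights @ words                  # (d,)
        contexts.append(context)
        # Template level: one relevance score per template as a whole.
        scores.append(context @ decoder_state)
    template_weights = softmax(np.array(scores))        # (K,)
    return template_weights @ np.stack(contexts)        # (d,)

# Toy usage: three retrieved templates of different lengths, d = 8.
rng = np.random.default_rng(0)
templates = [rng.normal(size=(t, 8)) for t in (5, 7, 4)]
ctx = multi_granular_attention(rng.normal(size=8), templates)
print(ctx.shape)  # (8,)

In a full model, decoder_state would come from the decoder at each generation step, and the resulting context vector would feed into next-token prediction alongside the source-side context.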




Acknowledgements

We would like to thank the anonymous reviewers for their constructive comments. This work was supported by the National Key Research and Development Program of China (Grant Nos. 2018YFC0830105, 2018YFC0830100); the National Natural Science Foundation of China (Grant Nos. 61972186, 61762056, 61472168); the Yunnan Provincial Major Science and Technology Special Plan Projects (Grant No. 202002AD080001); and the General Projects of Basic Research in Yunnan Province (Grant Nos. 202001AT070047, 202001AT070046).

Author information

Corresponding author

Correspondence to Zhengtao Yu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Huang, Y., Yu, Z., Guo, J. et al. Abstractive document summarization via multi-template decoding. Appl Intell 52, 9650–9663 (2022). https://doi.org/10.1007/s10489-021-02607-9

