Abstract
Most previous abstractive summarization models generate the summary left to right without making full use of target-side global information. Recently, many researchers have sought to alleviate this issue by retrieving target-side templates from a large-scale training corpus, but these approaches are limited by template quality. To overcome the problem of template selection bias, one promising direction is to obtain better target-side global information from multiple high-quality templates. Hence, this paper extends the encoder-decoder framework by introducing a multi-template decoding mechanism, which utilizes multiple templates retrieved from the training corpus based on semantic distance. In addition, we introduce a multi-granular attention mechanism that simultaneously takes into account the importance of words within templates and the importance of different templates. Extensive experimental results on CNN/Daily Mail and English Gigaword show that our proposed model significantly outperforms several state-of-the-art abstractive and extractive baseline models.
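The two levels of attention described above can be illustrated with a minimal sketch. This is not the model's actual formulation: the dot-product scoring, the function name `multi_granular_attention`, and the plain-list vector representation are all assumptions made for illustration. The sketch first attends over the words inside each retrieved template, then attends over the resulting per-template summaries.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def multi_granular_attention(query, templates):
    """Hypothetical two-level attention (assumed dot-product scoring).

    query:     decoder state, a list of floats
    templates: list of templates, each a list of word vectors
    Returns (context, template_weights, word_weights).
    """
    word_weights = []
    template_contexts = []
    for words in templates:
        # Word level: how important is each word within this template?
        alpha = softmax([dot(query, w) for w in words])
        word_weights.append(alpha)
        template_contexts.append(weighted_sum(alpha, words))
    # Template level: how important is each template as a whole?
    beta = softmax([dot(query, c) for c in template_contexts])
    context = weighted_sum(beta, template_contexts)
    return context, beta, word_weights


# Toy example: two templates with two 2-dimensional word vectors each.
query = [1.0, 0.0]
templates = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
context, beta, alphas = multi_granular_attention(query, templates)
```

In this toy input the first template contains a word aligned with the query, so it receives the larger template-level weight. A trained model would presumably use learned projections rather than raw dot products.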
Notes
The BERT-BASE model can be downloaded at https://github.com/google-research/bert#pre-trained-models
Acknowledgements
We would like to thank the anonymous reviewers for their constructive comments. This work was supported by the National Key Research and Development Program of China (Grant Nos. 2018YFC0830105, 2018YFC0830100); the National Natural Science Foundation of China (Grant Nos. 61972186, 61762056, 61472168); the Yunnan Provincial Major Science and Technology Special Plan Projects (Grant No. 202002AD080001); and the General Projects of Basic Research in Yunnan Province (Grant Nos. 202001AT070047, 202001AT070046).
About this article
Cite this article
Huang, Y., Yu, Z., Guo, J. et al. Abstractive document summarization via multi-template decoding. Appl Intell 52, 9650–9663 (2022). https://doi.org/10.1007/s10489-021-02607-9
DOI: https://doi.org/10.1007/s10489-021-02607-9