
Incorporating word attention with convolutional neural networks for abstractive summarization


Abstract

Neural sequence-to-sequence (seq2seq) models have been widely used in abstractive summarization tasks. One challenge of this task is that redundant content in the input document often confuses the model and leads to poor performance. An efficient way to address this problem is to select salient information from the input document. In this paper, we propose an approach that incorporates word attention with multilayer convolutional neural networks (CNNs) to extend a standard seq2seq model for abstractive summarization. First, by concentrating on a subset of source words while encoding an input sentence, word attention extracts informative keywords from the input, which gives us the ability to interpret generated summaries. Second, these keywords are further distilled by multilayer CNNs to capture the coarse-grained contextual features of the input sentence. The combined word attention and multilayer CNN modules thus provide a better-learned representation of the input document, which helps the model generate interpretable, coherent and informative summaries. We evaluate our model on the English Gigaword and DUC2004 datasets and on the Chinese summarization dataset LCSTS; experimental results demonstrate the effectiveness of our approach.
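
To make the architecture concrete, here is a minimal, illustrative sketch (not the authors' released code) of the encoder described above, written against TensorFlow's Keras API since the implementation uses TensorFlow (note 4). Word attention re-weights the source word embeddings, a stack of 1D convolutions distills the attended words into coarse-grained contextual features, and the result is concatenated with a standard recurrent encoder's states; all layer sizes, the GRU cell, and the concatenation scheme are assumptions made for illustration. A standard attention decoder would then generate the summary from the combined representation.

    import tensorflow as tf

    # Hypothetical sizes; the paper's actual hyperparameters may differ.
    VOCAB, EMB, HID, LAYERS, KERNEL = 30000, 128, 256, 3, 3

    src = tf.keras.Input(shape=(None,), dtype=tf.int32)    # source word ids
    emb = tf.keras.layers.Embedding(VOCAB, EMB)(src)       # (batch, T, EMB)

    # Word attention: score each source word, normalize over the sentence,
    # and re-weight the embeddings so that salient keywords dominate.
    scores = tf.keras.layers.Dense(1)(emb)                 # (batch, T, 1)
    alpha = tf.keras.layers.Softmax(axis=1)(scores)        # attention weights
    attended = emb * alpha                                 # weighted embeddings

    # Multilayer CNN: distill the attended keywords into coarse-grained
    # contextual features of the input sentence.
    feat = attended
    for _ in range(LAYERS):
        feat = tf.keras.layers.Conv1D(HID, KERNEL, padding="same",
                                      activation="relu")(feat)

    # Standard seq2seq encoder path; its hidden states are concatenated with
    # the CNN features to give the decoder a richer source representation.
    rnn = tf.keras.layers.GRU(HID, return_sequences=True)(emb)
    encoded = tf.keras.layers.Concatenate(axis=-1)([rnn, feat])

    encoder = tf.keras.Model(src, encoded)
    print(encoder(tf.constant([[5, 42, 7, 19]])).shape)    # (1, 4, 512)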

Notes

  1. https://catalog.ldc.upenn.edu/ldc2012t21

  2. https://duc.nist.gov/duc2004/

  3. http://www.weibo.com

  4. https://www.tensorflow.org/

  5. We use RG-1, RG-2, and RG-L to denote ROUGE-1, ROUGE-2, and ROUGE-L; a scoring sketch follows these notes.

  6. https://pypi.org/project/pyrouge/0.1.3/
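
Since the experiments are scored with ROUGE via the pyrouge package linked in note 6, here is a minimal scoring sketch following pyrouge's documented usage. The directory names and filename patterns are placeholders, and pyrouge additionally requires a local installation of the underlying ROUGE-1.5.5 Perl toolkit.

    from pyrouge import Rouge155

    r = Rouge155()
    r.system_dir = "generated_summaries/"   # one file per generated summary
    r.model_dir = "reference_summaries/"    # one file per gold summary
    r.system_filename_pattern = r"summary.(\d+).txt"
    r.model_filename_pattern = "summary.#ID#.txt"

    output = r.convert_and_evaluate()
    scores = r.output_to_dict(output)
    # RG-1, RG-2, and RG-L correspond to the rouge_1_*, rouge_2_*, and
    # rouge_l_* entries of the result dictionary.
    print(scores["rouge_1_f_score"],
          scores["rouge_2_f_score"],
          scores["rouge_l_f_score"])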

Acknowledgements

This work was supported by the China Scholarship Council (CSC) under award No. 201706750031, NSFC (Nos. 61772211, 61728204, 91646204), ARC (DP170102726, DP180102050), the Special Project on the Integration of Industry and Academia and Synergy of Research of Guangzhou, China (No. 201704020203), the Special Fund for the Applied Program of Science and Technology of Guangdong Province, China (No. 2016B010124008), and the Innovation Project of the Graduate School of South China Normal University. Zhifeng Bao is a recipient of a Google Faculty Award.

Author information

Corresponding author

Correspondence to Yong Tang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yuan, C., Bao, Z., Sanderson, M. et al. Incorporating word attention with convolutional neural networks for abstractive summarization. World Wide Web 23, 267–287 (2020). https://doi.org/10.1007/s11280-019-00709-6
