
Abstractive Summarization Model with Adaptive Sparsemax

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13551)

Abstract

Abstractive summarization models mostly rely on sequence-to-sequence architectures, in which the softmax function is widely used to map the model output onto the probability simplex. However, the distribution produced by softmax often exhibits a long-tail effect, especially when the vocabulary is large: many irrelevant tokens receive non-negligible probability mass, which reduces training efficiency and effectiveness. Recent work has therefore designed mapping functions that yield sparse output probabilities so that such irrelevant tokens can be ignored. In this paper, we propose Adaptive Sparsemax, which adaptively controls the sparsity of the model's output. Our method combines sparsemax with a temperature mechanism in which the temperature is learned by the neural network, so it requires no additional hyperparameters. Experimental results on the CNN/Daily Mail and LCSTS datasets show that our method outperforms baseline models on the abstractive summarization task.
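Below is a minimal sketch, in PyTorch, of the idea the abstract describes: sparsemax applied to logits divided by a temperature that the network itself predicts at each decoding step. The module and layer names (AdaptiveSparsemax, temp_proj) and the choice of a single linear projection followed by softplus are illustrative assumptions, not the authors' implementation; the paper's exact parameterization is given in the full text (see also the entmax repository in the Notes).

```python
# Sketch of sparsemax with a learned (adaptive) temperature.
# Assumptions: one temperature per decoding step, predicted from the
# decoder hidden state by a linear layer + softplus.
import torch
import torch.nn as nn


def sparsemax(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Project logits z onto the probability simplex; output is sparse."""
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    k = k.view([-1 if i == (dim % z.dim()) else 1 for i in range(z.dim())])
    cumsum = z_sorted.cumsum(dim)
    # support: coordinates that remain non-zero after the projection
    support = 1 + k * z_sorted > cumsum
    k_z = support.to(z.dtype).sum(dim=dim, keepdim=True)
    # threshold tau such that the clipped values sum to 1
    tau = (cumsum.gather(dim, k_z.long() - 1) - 1) / k_z
    return torch.clamp(z - tau, min=0.0)


class AdaptiveSparsemax(nn.Module):
    """Sparsemax whose temperature is predicted from the decoder state."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.temp_proj = nn.Linear(hidden_size, 1)  # assumed parameterization

    def forward(self, logits: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # softplus keeps the learned temperature strictly positive
        temperature = nn.functional.softplus(self.temp_proj(hidden)) + 1e-6
        return sparsemax(logits / temperature, dim=-1)
```

In a Seq2Seq decoder this module would replace the final softmax: the hidden state at each step yields both the vocabulary logits and the temperature, and tokens whose shifted logits fall below the threshold receive exactly zero probability, which is what removes the long tail.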


Notes

  1. https://github.com/deep-spin/entmax.


Acknowledgements

This work was supported by the NSFC Project 62006078 and STCSM Project 22ZR1421700.

Author information

Correspondence to Jing Zhao.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Guo, S., Si, Y., Zhao, J. (2022). Abstractive Summarization Model with Adaptive Sparsemax. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_62


  • DOI: https://doi.org/10.1007/978-3-031-17120-8_62

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17119-2

  • Online ISBN: 978-3-031-17120-8

  • eBook Packages: Computer Science, Computer Science (R0)
