Abstract
Abstractive summarization models mostly rely on sequence-to-sequence architectures, in which the softmax function is widely used to map the model output onto the probability simplex. However, the distribution produced by softmax often exhibits a long-tail effect, especially when the vocabulary is large: many irrelevant tokens receive non-negligible probability mass, which reduces training efficiency and hurts performance. Recent work has therefore designed mapping functions that produce sparse output probabilities so that these irrelevant tokens can be ignored. In this paper, we propose Adaptive Sparsemax, which self-adaptively controls the sparsity of the model's output. Our method combines sparsemax with a temperature mechanism in which the temperature is learned by the neural network, so it requires no additional hyperparameter. Experimental results on the CNN/Daily Mail and LCSTS datasets show that our method outperforms baseline models on the abstractive summarization task.
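To make the idea concrete, the sketch below (PyTorch) shows a standard sparsemax projection combined with a temperature predicted from the decoder hidden state. The module and parameter names, and the softplus parameterisation of the temperature, are illustrative assumptions rather than the paper's exact design; this is a minimal sketch of how a learned temperature can modulate output sparsity, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def sparsemax(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Sparsemax (Martins & Astudillo, 2016): Euclidean projection of z onto the simplex.
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    cumsum = z_sorted.cumsum(dim)
    rng = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    view = [1] * z.dim()
    view[dim] = -1
    rng = rng.view(view)                                   # broadcast 1..K along `dim`
    support = (1 + rng * z_sorted) > cumsum                # sorted entries that stay positive
    k = support.sum(dim=dim, keepdim=True)                 # size of the support set
    tau = (cumsum.gather(dim, k - 1) - 1) / k.to(z.dtype)  # threshold subtracted from z
    return torch.clamp(z - tau, min=0.0)


class AdaptiveSparsemaxHead(nn.Module):
    # Hypothetical output head: projects the decoder state to vocabulary logits and
    # predicts a per-step temperature; smaller temperatures yield sparser distributions.
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)
        self.temp = nn.Linear(hidden_size, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        logits = self.proj(h)
        t = F.softplus(self.temp(h)) + 1e-6                # keep the learned temperature positive
        return sparsemax(logits / t, dim=-1)               # probabilities, many exactly zero
```

For instance, with logits [1.0, 0.8, -1.0, -3.0], sparsemax returns [0.6, 0.4, 0.0, 0.0]; dividing the logits by a smaller temperature pushes the output toward a one-hot vector, while a larger temperature spreads mass over more tokens.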
Acknowledgements
This work was supported by the NSFC Project 62006078 and STCSM Project 22ZR1421700.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Guo, S., Si, Y., Zhao, J. (2022). Abstractive Summarization Model with Adaptive Sparsemax. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol. 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_62
Print ISBN: 978-3-031-17119-2
Online ISBN: 978-3-031-17120-8