
Abstractive Summarization Model with Adaptive Sparsemax

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13551)

Abstract

Abstractive summarization models mostly rely on sequence-to-sequence architectures, in which the softmax function is widely used to map the model output onto the probability simplex. However, the distribution produced by softmax often exhibits a long-tail effect, especially when the vocabulary is large: many irrelevant tokens receive non-negligible probability mass, which reduces training efficiency and effectiveness. Recent work has therefore designed mapping functions that yield sparse output probabilities so that such irrelevant tokens can be ignored. In this paper, we propose Adaptive Sparsemax, which adaptively controls the sparsity of the model's output. Our method combines sparsemax with a temperature mechanism in which the temperature is learned by the neural network, so it requires no additional hyperparameters. Experimental results on the CNN/Daily Mail and LCSTS datasets show that our method outperforms baseline models on the abstractive summarization task.
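Below is a minimal sketch, in PyTorch, of the idea the abstract describes: sparsemax applied to logits divided by a temperature that the network itself predicts at each decoding step. The module and layer names (AdaptiveSparsemax, temp_proj) and the choice of a single linear projection followed by softplus are illustrative assumptions, not the authors' implementation; the paper's exact parameterization is given in the full text (see also the entmax repository in the Notes).

```python
# Sketch of sparsemax with a learned (adaptive) temperature.
# Assumptions: one temperature per decoding step, predicted from the
# decoder hidden state by a linear layer + softplus.
import torch
import torch.nn as nn


def sparsemax(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Project logits z onto the probability simplex; output is sparse."""
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    k = k.view([-1 if i == (dim % z.dim()) else 1 for i in range(z.dim())])
    cumsum = z_sorted.cumsum(dim)
    # support: coordinates that remain non-zero after the projection
    support = 1 + k * z_sorted > cumsum
    k_z = support.to(z.dtype).sum(dim=dim, keepdim=True)
    # threshold tau such that the clipped values sum to 1
    tau = (cumsum.gather(dim, k_z.long() - 1) - 1) / k_z
    return torch.clamp(z - tau, min=0.0)


class AdaptiveSparsemax(nn.Module):
    """Sparsemax whose temperature is predicted from the decoder state."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.temp_proj = nn.Linear(hidden_size, 1)  # assumed parameterization

    def forward(self, logits: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # softplus keeps the learned temperature strictly positive
        temperature = nn.functional.softplus(self.temp_proj(hidden)) + 1e-6
        return sparsemax(logits / temperature, dim=-1)
```

In a Seq2Seq decoder this module would replace the final softmax: the hidden state at each step yields both the vocabulary logits and the temperature, and tokens whose shifted logits fall below the threshold receive exactly zero probability, which is what removes the long tail.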


Notes

  1. https://github.com/deep-spin/entmax.


Acknowledgements

This work was supported by the NSFC Project 62006078 and STCSM Project 22ZR1421700.

Author information

Correspondence to Jing Zhao.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Guo, S., Si, Y., Zhao, J. (2022). Abstractive Summarization Model with Adaptive Sparsemax. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_62


  • DOI: https://doi.org/10.1007/978-3-031-17120-8_62

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17119-2

  • Online ISBN: 978-3-031-17120-8

  • eBook Packages: Computer Science, Computer Science (R0)
