
OTSummarizer an Optimal Transport Based Approach for Extractive Text Summarization

  • Conference paper
Advances in Computational Collective Intelligence (ICCCI 2023)

Abstract

Automatic text summarization (ATS) consists of automatically generating a coherent and concise summary of an original document. It is a fundamental task in Natural Language Processing (NLP), with applications that include news aggregation and social media analysis. The most recent approaches are based on the transformer architecture, such as BERT and its descendants. However, these promising approaches are limited in input length, for example to 512 tokens for the BERT-base model. To alleviate this issue, we propose an Optimal Transport (OT) based approach for ATS called OTSummarizer. It represents each sentence as a distribution over words and then applies an OT solver to compute the similarity between the original document and a candidate summary. We design a Beam Search (BS) strategy to efficiently explore the summary search space and approximate the optimal summary. We develop theoretical results to justify the use of OT in ATS. Empirically, we evaluate the model on the CNN/Daily Mail and PubMed datasets, where it achieves a ROUGE score of 41.66%. The experimental results show that OTSummarizer outperforms previous state-of-the-art extractive summarization approaches in terms of ROUGE-1, ROUGE-2 and ROUGE-L scores.
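The abstract does not specify the word representation, ground cost, solver settings, or beam-scoring rule, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that a set of sentences is represented as a normalized bag-of-words distribution; that the ground cost between two words is a hypothetical character-bigram distance standing in for an embedding distance; that the OT solver is Cuturi's entropy-regularized Sinkhorn iteration with a fixed number of iterations; and that a candidate summary is scored by its OT cost to the whole document (lower is better), with a fixed-width beam grown one sentence at a time. All function names (`word_distribution`, `word_cost`, `sinkhorn_distance`, `beam_search_summary`) are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simple lowercase regex tokenizer (an assumption; the paper's preprocessing is not given).
    return re.findall(r"[a-z0-9]+", text.lower())


def word_distribution(sentences):
    # Normalized bag-of-words distribution over all tokens of the given sentences.
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sentences contain no tokens")
    return {w: c / total for w, c in counts.items()}


def word_cost(w1, w2):
    # Hypothetical stand-in for an embedding distance: 1 - Jaccard overlap of character bigrams.
    if w1 == w2:
        return 0.0
    b1 = {w1[i:i + 2] for i in range(max(len(w1) - 1, 1))}
    b2 = {w2[i:i + 2] for i in range(max(len(w2) - 1, 1))}
    return 1.0 - len(b1 & b2) / len(b1 | b2)


def sinkhorn_distance(p, q, eps=0.1, n_iter=200):
    # Entropy-regularized OT cost <P, C> between two word distributions
    # (Sinkhorn iterations, fixed iteration count, no convergence check).
    xs, ys = list(p), list(q)
    a = [p[w] for w in xs]
    b = [q[w] for w in ys]
    C = [[word_cost(x, y) for y in ys] for x in xs]
    K = [[math.exp(-c / eps) for c in row] for row in C]  # Gibbs kernel
    u, v = [1.0] * len(xs), [1.0] * len(ys)
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan matches marginals a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(ys))) for i in range(len(xs))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(xs))) for j in range(len(ys))]
    # Transport plan P = diag(u) K diag(v); return its total transport cost.
    return sum(u[i] * K[i][j] * v[j] * C[i][j]
               for i in range(len(xs)) for j in range(len(ys)))


def beam_search_summary(sentences, k=2, beam_width=3, eps=0.1):
    # Grow candidate summaries one sentence at a time, keeping the beam_width
    # candidates whose word distribution is closest (in OT cost) to the document's.
    doc = word_distribution(sentences)
    k = min(k, len(sentences))
    beams = [()]  # each beam is a sorted tuple of selected sentence indices

    def score(idx):
        return sinkhorn_distance(doc, word_distribution([sentences[i] for i in idx]), eps)

    for _ in range(k):
        candidates = {tuple(sorted(beam + (i,)))
                      for beam in beams
                      for i in range(len(sentences)) if i not in beam}
        # Sort by OT cost, breaking ties by index tuple for determinism.
        beams = sorted(candidates, key=lambda idx: (score(idx), idx))[:beam_width]
    return beams[0]


document = [
    "Optimal transport compares two distributions over words.",
    "The weather was sunny on Tuesday.",
    "Beam search keeps the best partial summaries at each step.",
    "Optimal transport and beam search together select summary sentences.",
]
print(beam_search_summary(document, k=2))
```

A practical system would replace `word_cost` with distances between pretrained word embeddings and vectorize the Sinkhorn loop; the pure-Python loops here only keep the sketch dependency-free. Because the beam keeps only `beam_width` candidates per step, it approximates rather than guarantees the minimum-cost summary.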



Author information

Correspondence to Fethi Jarray.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Tanfouri, I., Jarray, F. (2023). OTSummarizer an Optimal Transport Based Approach for Extractive Text Summarization. In: Nguyen, N.T., et al. Advances in Computational Collective Intelligence. ICCCI 2023. Communications in Computer and Information Science, vol 1864. Springer, Cham. https://doi.org/10.1007/978-3-031-41774-0_20


  • DOI: https://doi.org/10.1007/978-3-031-41774-0_20


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41773-3

  • Online ISBN: 978-3-031-41774-0

  • eBook Packages: Computer Science, Computer Science (R0)
