Abstract
Automatic text summarization (ATS) is the task of automatically generating a coherent, concise summary of an original document. It is a fundamental task in Natural Language Processing (NLP) with applications including news aggregation and social media analysis. The most recent approaches are based on the transformer architecture, such as BERT and its descendants. However, these promising approaches face input-length limitations, such as the 512-token limit of the BERT-base model. To alleviate this issue, we propose an Optimal Transport (OT) based approach for ATS called OTSummarizer. It represents each sentence as a distribution over words and then applies an OT solver to compute the similarity between the original document and a candidate summary. We design a Beam Search (BS) strategy to efficiently explore the summary search space and find the optimal summary. We develop theoretical results that justify the use of OT in ATS. Empirically, we evaluate the model on the CNN/Daily Mail and PubMed datasets, achieving a ROUGE score of 41.66%. The experimental results show that OTSummarizer outperforms previous state-of-the-art extractive summarization approaches in terms of ROUGE-1, ROUGE-2, and ROUGE-L scores.
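The pipeline the abstract describes — sentences as word distributions, an OT solver for document/summary similarity, and beam search over candidate sentence subsets — can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the 0/1 word-mismatch cost matrix, the Sinkhorn parameters, and the beam-extension scheme are all simplifying assumptions introduced here for the sketch.

```python
import numpy as np
from itertools import chain

def sinkhorn_cost(a, b, C, reg=0.1, n_iter=200):
    """Entropic-regularized OT cost <P, C> between histograms a and b,
    via Cuturi-style Sinkhorn iterations."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]      # approximate transport plan
    return float((P * C).sum())

def histogram(sentences, vocab):
    """Normalized bag-of-words distribution over a shared vocabulary."""
    counts = np.zeros(len(vocab))
    for w in chain.from_iterable(sentences):
        counts[vocab[w]] += 1.0
    return counts / counts.sum()

def beam_search_summary(doc, k=2, beam_width=3):
    """Select k sentences whose word distribution is OT-closest to the document's."""
    vocab = {w: i for i, w in enumerate(sorted(set(chain.from_iterable(doc))))}
    C = 1.0 - np.eye(len(vocab))         # illustrative 0/1 word-mismatch cost
    doc_hist = histogram(doc, vocab)

    def score(idxs):
        return sinkhorn_cost(histogram([doc[i] for i in idxs], vocab), doc_hist, C)

    # Start from single sentences, then grow each beam one sentence at a time.
    beams = sorted([((i,), score((i,))) for i in range(len(doc))],
                   key=lambda x: x[1])[:beam_width]
    for _ in range(k - 1):
        seen, cand = set(), []
        for idxs, _ in beams:
            for j in range(len(doc)):
                if j in idxs:
                    continue
                new = tuple(sorted(idxs + (j,)))
                if new in seen:          # skip permutation duplicates
                    continue
                seen.add(new)
                cand.append((new, score(new)))
        beams = sorted(cand, key=lambda x: x[1])[:beam_width]
    return list(beams[0][0])
```

With the 0/1 ground cost, the OT distance behaves like a total-variation gap between word distributions, so the search favors sentence subsets that cover the document's vocabulary. Replacing that cost with distances between pretrained word embeddings would give a Word Mover's-style similarity closer to what the paper presumably uses.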
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tanfouri, I., Jarray, F. (2023). OTSummarizer an Optimal Transport Based Approach for Extractive Text Summarization. In: Nguyen, N.T., et al. Advances in Computational Collective Intelligence. ICCCI 2023. Communications in Computer and Information Science, vol 1864. Springer, Cham. https://doi.org/10.1007/978-3-031-41774-0_20
Print ISBN: 978-3-031-41773-3
Online ISBN: 978-3-031-41774-0