Abstract
This paper presents an empirical study that harnesses Positional Language Models (PLMs) as the core of an effective methodology for capturing the gist of a discursive text via extractive summarization. We introduce an unsupervised, adaptive, and cost-efficient approach that integrates semantic information into the process. Texts are first linguistically analyzed; then semantic information, specifically synsets and named entities, is integrated into the PLM, enabling an understanding of the text that is in line with its discursive structure. The proposed unsupervised approach is evaluated on different summarization tasks over standard benchmarks. The results are highly competitive with the state of the art, proving the effectiveness of this approach, which requires neither training data nor high-performance computing resources.
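The abstract does not spell out the PLM formulation, but under the standard positional language model of Lv and Zhai (term counts propagated to every position through a Gaussian kernel), position-aware sentence scoring can be sketched roughly as follows. This is an illustrative sketch only, not the authors' method: the function names, the flat query-term relevance signal, and the default kernel width are all assumptions.

```python
import math
from collections import defaultdict

def plm_counts(tokens, sigma=25.0):
    """Propagated count c'(w, i) = sum_j c(w, j) * k(i, j),
    with a Gaussian kernel k(i, j) = exp(-(i - j)^2 / (2 * sigma^2))."""
    counts = [defaultdict(float) for _ in tokens]
    for j, w in enumerate(tokens):
        for i in range(len(tokens)):
            counts[i][w] += math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
    return counts

def sentence_scores(sentences, query_terms, sigma=25.0):
    """Score each sentence by the normalized PLM mass that the
    query/topic terms accumulate at the sentence's positions."""
    tokens = [t for s in sentences for t in s]
    counts = plm_counts(tokens, sigma)
    scores, pos = [], 0
    for s in sentences:
        span = range(pos, pos + len(s))
        norm = sum(sum(counts[i].values()) for i in span) or 1.0
        score = sum(counts[i][t] for i in span for t in query_terms) / norm
        scores.append(score)
        pos += len(s)
    return scores
```

A small kernel width (sigma) keeps relevance local to where a term actually occurs; a large one spreads it across the document. Integrating synsets or named entities, as the paper proposes, would amount to propagating counts for those semantic units rather than for surface tokens alone.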
Notes
- 1. NLP-progress is a repository that tracks progress in NLP, including the datasets and the current state of the art for the most common NLP tasks (nlpprogress.com, last accessed July 2020).
Acknowledgments
This research work has been funded by the Spanish Government through the projects “Modelang: Modeling the behavior of digital entities by Human Language Technologies” (RTI2018-094653-B-C22) and “INTEGER Intelligent Text Generation” (RTI2018-094649-B-I00), and by the Generalitat Valenciana through “SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” (PROMETEU/2018/089). This paper is also based upon work from COST Action CA18231 “Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation”.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Vicente, M., Lloret, E. (2020). A Discourse-Informed Approach for Cost-Effective Extractive Summarization. In: Espinosa-Anke, L., Martín-Vide, C., Spasić, I. (eds.) Statistical Language and Speech Processing. SLSP 2020. Lecture Notes in Computer Science, vol. 12379. Springer, Cham. https://doi.org/10.1007/978-3-030-59430-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59429-9
Online ISBN: 978-3-030-59430-5
eBook Packages: Computer Science (R0)