Abstract
This paper presents an empirical study that harnesses Positional Language Models (PLMs) as the core of an effective methodology for capturing the gist of a discursive text via extractive summarization. We introduce an unsupervised, adaptive, and cost-efficient approach that integrates semantic information into the process. Texts are first linguistically analyzed; then semantic information, specifically synsets and named entities, is integrated into the PLM, enabling an understanding of the text that is in line with its discursive structure. The proposed unsupervised approach is evaluated on different summarization tasks over standard benchmarks. The results are highly competitive with the state of the art, proving the effectiveness of this approach, which requires neither training data nor high-performance computing resources.
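The abstract does not spell out the PLM formulation, but under the standard positional language model of Lv and Zhai (term counts propagated to every position through a Gaussian kernel), position-aware sentence scoring can be sketched roughly as follows. This is an illustrative sketch only, not the authors' method: the function names, the flat query-term relevance signal, and the default kernel width are all assumptions.

```python
import math
from collections import defaultdict

def plm_counts(tokens, sigma=25.0):
    """Propagated count c'(w, i) = sum_j c(w, j) * k(i, j),
    with a Gaussian kernel k(i, j) = exp(-(i - j)^2 / (2 * sigma^2))."""
    counts = [defaultdict(float) for _ in tokens]
    for j, w in enumerate(tokens):
        for i in range(len(tokens)):
            counts[i][w] += math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
    return counts

def sentence_scores(sentences, query_terms, sigma=25.0):
    """Score each sentence by the normalized PLM mass that the
    query/topic terms accumulate at the sentence's positions."""
    tokens = [t for s in sentences for t in s]
    counts = plm_counts(tokens, sigma)
    scores, pos = [], 0
    for s in sentences:
        span = range(pos, pos + len(s))
        norm = sum(sum(counts[i].values()) for i in span) or 1.0
        score = sum(counts[i][t] for i in span for t in query_terms) / norm
        scores.append(score)
        pos += len(s)
    return scores
```

A small kernel width (sigma) keeps relevance local to where a term actually occurs; a large one spreads it across the document. Integrating synsets or named entities, as the paper proposes, would amount to propagating counts for those semantic units rather than for surface tokens alone.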
Notes
- 1. NLP-progress is a repository that tracks progress in NLP, including the datasets and the current state of the art for the most common NLP tasks (nlpprogress.com, last accessed July 2020).
Acknowledgments
This research work has been funded by the Spanish Government through the projects “Modelang: Modeling the behavior of digital entities by Human Language Technologies” (RTI2018-094653-B-C22) and “INTEGER Intelligent Text Generation” (RTI2018-094649-B-I00), and by the Generalitat Valenciana through “SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” (PROMETEU/2018/089). This paper is also based upon work from COST Action CA18231 “Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation”.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Vicente, M., Lloret, E. (2020). A Discourse-Informed Approach for Cost-Effective Extractive Summarization. In: Espinosa-Anke, L., Martín-Vide, C., Spasić, I. (eds.) Statistical Language and Speech Processing. SLSP 2020. Lecture Notes in Computer Science, vol. 12379. Springer, Cham. https://doi.org/10.1007/978-3-030-59430-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59429-9
Online ISBN: 978-3-030-59430-5
eBook Packages: Computer Science (R0)