A Discourse-Informed Approach for Cost-Effective Extractive Summarization

  • Conference paper
Statistical Language and Speech Processing (SLSP 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12379)

Abstract

This paper presents an empirical study that harnesses the benefits of Positional Language Models (PLMs) as the key component of an effective methodology for capturing the gist of a discursive text via extractive summarization. We introduce an unsupervised, adaptive, and cost-efficient approach that integrates semantic information into the process. Texts are linguistically analyzed, and semantic information, specifically synsets and named entities, is then integrated into the PLM, enabling the text to be understood in line with its discursive structure. The proposed unsupervised approach is tested on different summarization tasks within standard benchmarks. The results obtained are very competitive with the state of the art, proving the effectiveness of an approach that requires neither training data nor high-performance computing resources.
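To make the idea concrete, the following is a minimal, hypothetical sketch of PLM-based sentence scoring in the spirit described above (positional language models propagate each term count to every position through a distance kernel, so terms "spread" over the text). The function names, the Gaussian kernel width `sigma`, and the sentence-scoring rule are illustrative assumptions, not the authors' actual method, which additionally integrates synsets and named entities from linguistic analysis.

```python
import math
from collections import Counter

def gaussian_kernel(i, j, sigma):
    # Discounted propagation of the count observed at position j to position i.
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))

def positional_lm(tokens, i, sigma):
    # p(w | D, i): every occurrence of w contributes a "virtual count" at
    # position i, discounted by its distance; counts are then normalized
    # into a probability distribution over the vocabulary.
    virtual = Counter()
    for j, w in enumerate(tokens):
        virtual[w] += gaussian_kernel(i, j, sigma)
    total = sum(virtual.values())
    return {w: c / total for w, c in virtual.items()}

def summarize(sentences, sigma=3.0, top_k=1):
    # Flatten the document, remembering each sentence's token span.
    tokens, spans = [], []
    for sent in sentences:
        toks = sent.lower().split()
        spans.append((len(tokens), toks))
        tokens.extend(toks)
    # Score each sentence with the PLM built at its centre position:
    # sentences whose terms also dominate their positional neighbourhood
    # are treated as salient and selected extractively.
    scores = []
    for start, toks in spans:
        centre = start + (len(toks) - 1) / 2.0
        plm = positional_lm(tokens, centre, sigma)
        scores.append(sum(plm.get(w, 0.0) for w in toks) / max(len(toks), 1))
    ranked = sorted(range(len(sentences)), key=lambda k: scores[k], reverse=True)
    chosen = sorted(ranked[:top_k])  # keep original document order
    return [sentences[k] for k in chosen]
```

In this toy formulation, a sentence scores highly when its own terms carry most of the probability mass of the PLM centred on it, i.e. when salient terms concentrate around its position; no training data is involved, matching the unsupervised, low-cost setting the paper targets.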


Notes

  1. NLP-progress is a repository that tracks progress in NLP, including the datasets and the current state of the art for the most common NLP tasks (nlpprogress.com, last accessed July 2020).


Acknowledgments

This research work has been funded by the Spanish Government through the projects “Modelang: Modeling the behavior of digital entities by Human Language Technologies” (RTI2018-094653-B-C22) and “INTEGER Intelligent Text Generation” (RTI2018-094649-B-I00), and by the Generalitat Valenciana through “SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria y accesible” (PROMETEU/2018/089). This paper is also based upon work from COST Action CA18231 “Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation”.

Author information

Correspondence to Marta Vicente.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Vicente, M., Lloret, E. (2020). A Discourse-Informed Approach for Cost-Effective Extractive Summarization. In: Espinosa-Anke, L., Martín-Vide, C., Spasić, I. (eds) Statistical Language and Speech Processing. SLSP 2020. Lecture Notes in Computer Science(), vol 12379. Springer, Cham. https://doi.org/10.1007/978-3-030-59430-5_9

  • DOI: https://doi.org/10.1007/978-3-030-59430-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59429-9

  • Online ISBN: 978-3-030-59430-5

  • eBook Packages: Computer Science, Computer Science (R0)
