
Domain-Driven and Discourse-Guided Scientific Summarisation

  • Conference paper
  • In: Advances in Information Retrieval (ECIR 2023)

Abstract

Scientific articles tend to follow a standardised discourse that enables a reader to quickly identify and extract useful or important information. We hypothesise that such structural conventions are strongly influenced by the scientific domain (e.g., Computer Science, Chemistry, etc.) and explore this through a novel extractive algorithm that utilises domain-specific discourse information for the task of abstract generation. In addition to being both simple and lightweight, the proposed algorithm constructs summaries in a structured and interpretable manner. In spite of these factors, we show that our approach outperforms strong baselines on the arXiv scientific summarisation dataset in both automatic and human evaluations, confirming that a scientific article’s domain strongly influences its discourse structure and can be leveraged to effectively improve its summarisation. Our code can be found at: https://github.com/TGoldsack1/DodoRank.

T. Goldsack and Z. Zhang—Equal contribution
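
The abstract describes the approach only at a high level. As a purely illustrative sketch (not the published DodoRank algorithm), domain-guided extractive selection can be pictured as filling a per-domain quota of discourse categories with the highest-scoring sentences from each category; the discourse labels, quotas, and salience scores below are hypothetical stand-ins.

from typing import Dict, List

def summarise(sentences: List[str], labels: List[str],
              domain_quota: Dict[str, int], scores: List[float]) -> List[str]:
    # Toy domain-guided extractive selection (illustrative only):
    # for each discourse category, keep the top-scoring sentences up to
    # the quota that a given domain assigns to that category.
    chosen = []
    for category, k in domain_quota.items():
        candidates = [i for i, lab in enumerate(labels) if lab == category]
        candidates.sort(key=lambda i: scores[i], reverse=True)
        chosen.extend(candidates[:k])
    # Restore original article order so the extract reads coherently.
    return [sentences[i] for i in sorted(set(chosen))]

# Hypothetical inputs: discourse labels and salience scores would come from upstream models.
sents = ["Prior work studies X.", "We propose method Y.", "Y improves over X."]
labs = ["background", "method", "result"]
quota = {"method": 1, "result": 1}  # e.g. a domain profile that favours methods and results
sal = [0.3, 0.9, 0.7]
print(summarise(sents, labs, quota, sal))

Nothing here beyond the general idea stated in the abstract is taken from the paper; the actual method is available at the repository linked above.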


Notes

  1. This sample set is also used in §3.2, as indicated in Fig. 2.

  2. Note that we also experimented with both rounding up and rounding to the nearest integer value for Eq. (1), but found that rounding down gave the best performance.

  3. The domain names retrieved correspond to the highest-level categories defined in the arXiv category taxonomy: https://arxiv.org/category_taxonomy.

  4. Increasing or decreasing K (which directly influences the number of sentences in the summaries produced by DodoRank) invariably led to worse average performance, as measured by the metrics described in this section.

  5. All ROUGE calculations are performed using the rouge-score Python package (a minimal usage sketch is given after these notes).

  6. Judges are native English speakers holding bachelor's degrees in scientific disciplines.
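
For reference, a minimal sketch of computing ROUGE with the rouge-score package follows; the example texts are invented and the exact metric variants reported in the paper are not restated here.

from rouge_score import rouge_scorer

# ROUGE-1/2/Lsum with stemming; rougeLsum expects newline-separated sentences.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeLsum"], use_stemmer=True)
reference = "We propose a domain-driven summariser.\nIt outperforms strong baselines."
candidate = "A domain-driven summariser is proposed.\nIt beats strong baselines."
scores = scorer.score(reference, candidate)  # target first, then prediction
for name, result in scores.items():
    print(name, round(result.fmeasure, 4))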


Acknowledgements

This work was supported by the Centre for Doctoral Training in Speech and Language Technologies (SLT) and their Applications funded by UK Research and Innovation [grant number EP/S023062/1].

Author information


Corresponding author

Correspondence to Chenghua Lin.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Goldsack, T., Zhang, Z., Lin, C., Scarton, C. (2023). Domain-Driven and Discourse-Guided Scientific Summarisation. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13980. Springer, Cham. https://doi.org/10.1007/978-3-031-28244-7_23


  • DOI: https://doi.org/10.1007/978-3-031-28244-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28243-0

  • Online ISBN: 978-3-031-28244-7

  • eBook Packages: Computer Science, Computer Science (R0)
