
Comparative Analysis of Evaluation Measures for Scientific Text Simplification

  • Conference paper
  • In: Linking Theory and Practice of Digital Libraries (TPDL 2024)

Abstract

Access to reliable scientific knowledge is crucial for both policymakers and citizens to make informed decisions. However, scientific documents are inherently difficult to read due to their complex terminology and specialized vernacular. Automatic text simplification aims to remove some of these barriers. Evaluation frameworks, which include collections and evaluation measures, are designed to assess the generated text simplifications. In this paper, we perform a comparative analysis of current text simplification evaluation measures on both scientific text and a generic corpus based on Wikipedia. Our main finding is that existing measures tend to perform worse on scientific texts and on longer texts consisting of several sentences. More generally, our analysis informs the development of suitable text simplification evaluation measures for scientific texts.
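To illustrate the kind of evaluation measure the paper analyzes, the sketch below computes the Flesch reading ease score, a classic reference-less readability measure (Kincaid et al., 1975). This is a minimal illustration only, not the paper's own evaluation setup: the syllable counter is a crude vowel-group heuristic, and the sentence/word tokenization is a simplifying assumption.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count contiguous vowel groups; every word
    # is assumed to have at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Flesch reading ease: 206.835 - 1.015 * (words per sentence)
    #                              - 84.6  * (syllables per word).
    # Higher scores indicate easier text.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

complex_text = "Photosynthesis converts electromagnetic radiation into chemical energy."
simple_text = "Plants use light to make food."
# The simplified sentence should receive the higher (easier) score.
print(flesch_reading_ease(simple_text) > flesch_reading_ease(complex_text))
```

Reference-less surface measures like this capture lexical and syntactic simplicity but, as the paper's comparison of measures suggests, say nothing about whether the simplification preserves the source's meaning.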


Change history

  • 16 November 2024: A correction has been published.

Notes

  1. https://www.wikipedia.org/.

  2. https://simple.wikipedia.org/.

  3. http://simpletext-project.com.


Acknowledgments

This research was funded in part by the French National Research Agency (ANR) under the project ANR-22-CE23-0019-01. We would like to thank our colleagues and the students from the University of Brest (France) who participated in data construction and evaluation.

Author information

Corresponding author: Liana Ermakova.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Davari, D., Ermakova, L., Krestel, R. (2024). Comparative Analysis of Evaluation Measures for Scientific Text Simplification. In: Antonacopoulos, A., et al. Linking Theory and Practice of Digital Libraries. TPDL 2024. Lecture Notes in Computer Science, vol 15177. Springer, Cham. https://doi.org/10.1007/978-3-031-72437-4_5


  • DOI: https://doi.org/10.1007/978-3-031-72437-4_5


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72436-7

  • Online ISBN: 978-3-031-72437-4

  • eBook Packages: Computer Science (R0)
