
Ensembling and Score-Based Filtering in Sentence Alignment for Automatic Simplification of German Texts

  • Conference paper
Universal Access in Human-Computer Interaction. Novel Design Approaches and Technologies (HCII 2022)

Abstract

Among the well-known accessibility services for audiovisual media are subtitling for the deaf and hard-of-hearing, audio description, and sign language interpreting. More recently, automatic text simplification has emerged as a topic in the context of media accessibility, with research often approaching the task as a case of (sentence-based) monolingual machine translation. This approach relies on large amounts of high-quality parallel data, which is why monolingual sentence alignment has gained momentum. Alignment for text simplification is a complex task, with alignments often taking the form of n:m (in contrast to the standard case of 1:1 in machine translation). In this contribution, we evaluate the performance of different alignment methods against a human-created gold standard of standard German/simplified German sentence alignments drawn from a number of parallel corpora, two of which contain multiple levels of simplification.
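To make the n:m case concrete, the following is a minimal sketch of how predicted alignments could be compared against a gold standard. It assumes that each n:m alignment group is flattened into pairwise sentence links and that precision, recall, and F1 are computed over those links; the abstract does not state the evaluation measure actually used, and the names `expand` and `precision_recall_f1` as well as the toy data are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A link pairs the index of a standard-German sentence with the index
# of a simplified-German sentence (illustrative representation only).
Link = Tuple[int, int]


def expand(groups: Iterable[Tuple[Iterable[int], Iterable[int]]]) -> Set[Link]:
    """Flatten n:m alignment groups into a set of pairwise links."""
    return {(s, t) for src, tgt in groups for s in src for t in tgt}


def precision_recall_f1(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Link-level precision, recall, and F1 against the gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy gold standard: one 1:2 alignment and one 2:1 alignment.
gold = expand([([0], [0, 1]), ([1, 2], [2])])
# Toy system output: misses link (0, 1) and adds a spurious link (3, 3).
pred = expand([([0], [0]), ([1, 2], [2]), ([3], [3])])

precision, recall, f1 = precision_recall_f1(pred, gold)
print(precision, recall, f1)
```

Scoring at the link level gives partial credit when a method recovers only part of an n:m group; scoring whole groups as units would be a stricter alternative.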

We employ a variety of alignment methods developed for monolingual tasks and bilingual sentence alignment. We explore strategies such as ensembling and score-based filtering to further improve the performance over these baselines. We show that combining multiple alignment methods with various hard voting strategies can outperform even the best individual methods and that we achieve similar results with score-based filtering of extracted alignments to find the most promising candidates. Our results motivate the notion that the overall task of sentence alignment for automatic simplification of German should be viewed as a two-step process that goes beyond the application of individual alignment methods.
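The two strategies named above can be illustrated with a short sketch. It assumes that each alignment method outputs a set of sentence-pair links, that hard voting keeps a link once a minimum number of methods propose it, and that score-based filtering keeps a link once some similarity score reaches a threshold. The voting thresholds, the scores, and the function names `hard_vote` and `filter_by_score` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (standard sentence index, simplified sentence index)


def hard_vote(outputs: List[Set[Link]], min_votes: int) -> Set[Link]:
    """Keep every link proposed by at least `min_votes` alignment methods."""
    # Each method's output is a set, so a method votes at most once per link.
    votes = Counter(link for output in outputs for link in output)
    return {link for link, count in votes.items() if count >= min_votes}


def filter_by_score(scored: Dict[Link, float], threshold: float) -> Set[Link]:
    """Keep every extracted link whose score reaches the threshold."""
    return {link for link, score in scored.items() if score >= threshold}


# Toy outputs of three hypothetical alignment methods.
method_a = {(0, 0), (1, 1), (2, 3)}
method_b = {(0, 0), (1, 1), (2, 2)}
method_c = {(0, 0), (1, 2), (2, 2)}

majority = hard_vote([method_a, method_b, method_c], min_votes=2)
unanimous = hard_vote([method_a, method_b, method_c], min_votes=3)

# Toy similarity scores for the links extracted by method_a.
scores = {(0, 0): 0.91, (1, 1): 0.64, (2, 3): 0.22}
kept = filter_by_score(scores, threshold=0.5)

print(sorted(majority), sorted(unanimous), sorted(kept))
```

In this sketch, raising `min_votes` or `threshold` trades recall for precision, which is one way to read the two-step view of alignment: first extract candidates, then select among them.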

Funded by the Austrian Research Promotion Agency (Österreichische Forschungsförderungsgesellschaft, FFG) General Programme under grant agreement number 881202.

Notes

  1. The term “simplified language” is used to denote the sum of all “comprehensibility-enhanced varieties of natural languages” [6, p. 52], i.e., what is commonly termed “Easy Language” (German leichte Sprache) and “Plain Language” (German einfache Sprache). “Easy-to-understand language” has been mentioned as an umbrella term subsuming these varieties [6, p. 52]. However, in this contribution, we prefer the term “simplified language” to emphasize the notion of the result of a simplification process.

  2. https://www.capito.eu/

  3. https://www.deepl.com

  4. https://github.com/kostrzmar/SATEF

  5. https://github.com/thompsonb/vecalign

References

  1. Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72. Association for Computational Linguistics, Ann Arbor, Michigan, June 2005. https://aclanthology.org/W05-0909

  2. Battisti, A., Pfütze, D., Säuberli, A., Kostrzewa, M., Ebling, S.: A corpus for automatic readability assessment and text simplification of German. In: Proceedings of The 12th Language Resources and Evaluation Conference, pp. 3295–3304. European Language Resources Association, Marseille, France, May 2020. https://www.aclweb.org/anthology/2020.lrec-1.403

  3. Council of Europe: Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge University Press, Cambridge (2009)


  4. Hwang, W., Hajishirzi, H., Ostendorf, M., Wu, W.: Aligning sentences from standard Wikipedia to simple Wikipedia. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 211–217. Association for Computational Linguistics, Denver, Colorado, May–June 2015. https://doi.org/10.3115/v1/N15-1022. https://aclanthology.org/N15-1022

  5. Jiang, C., Maddela, M., Lan, W., Zhong, Y., Xu, W.: Neural CRF model for sentence alignment in text simplification (2021)


  6. Maaß, C.: Easy Language–Plain Language–Easy Language Plus. Balancing Comprehensibility and Acceptability, Easy–Plain–Accessible, vol. 3. Frank & Timme (2020)


  7. Nikolov, N., Hahnloser, R.: Large-scale hierarchical alignment for data-driven text rewriting. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2019 (2019)


  8. Paetzold, G., Alva-Manchego, F., Specia, L.: MassAlign: alignment and annotation of comparable documents. In: Park, S., Supnithi, T. (eds.) Proceedings of the IJCNLP 2017, Taipei, Taiwan, 27 November–1 December 2017, System Demonstrations, pp. 1–4. Association for Computational Linguistics (2017). https://aclanthology.info/papers/I17-3001/i17-3001

  9. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 311–318, Philadelphia, PA (2002)


  10. Pfütze, D.: Sentence alignment gold standards for neural text simplification, University of Zurich (2020)


  11. Pfütze, D., Ebling, S.: Sentence alignment in the context of automatic text simplification. Poster Presented at KLAARA 2021–2nd Conference on Easy-to-Read Language Research, Switzerland (Online), August 2021


  12. Reimers, N., Gurevych, I.: Making monolingual sentence embeddings multilingual using knowledge distillation. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, November 2020. https://arxiv.org/abs/2004.09813

  13. Schwenk, H., Douze, M.: Learning joint multilingual sentence representations with neural machine translation. In: Proceedings of the 2nd Workshop on Representation Learning for NLP, pp. 157–167. Association for Computational Linguistics, Vancouver, Canada, August 2017. https://www.aclweb.org/anthology/W17-2619

  14. Spring, N., Pfütze, D., Kostrzewa, M., Battisti, A., Rios, A., Ebling, S.: Comparing sentence alignment methods for automatic simplification of German texts. Presentation Given at the 1st International Easy Language Day Conference (IELD), Germersheim, Germany (2021)


  15. Štajner, S., Franco-Salvador, M., Rosso, P., Ponzetto, S.: CATS: a tool for customized alignment of text simplification corpora. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pp. 3895–3903, Miyazaki, Japan (2018)


  16. Xu, W., Callison-Burch, C., Napoles, C.: Problems in current text simplification research: new data can help. Trans. Assoc. Comput. Linguist. 3, 283–297 (2015)


  17. Xu, W., Napoles, C., Pavlick, E., Chen, Q., Callison-Burch, C.: Optimizing statistical machine translation for text simplification. Trans. Assoc. Comput. Linguist. 4, 401–415 (2016)


  18. Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: BERTScore: evaluating text generation with BERT. In: International Conference on Learning Representations (2020)


  19. Zhu, Z., Bernhard, D., Gurevych, I.: A monolingual tree-based translation model for sentence simplification. In: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 1353–1361. Coling 2010 Organizing Committee, Beijing, China, August 2010. https://aclanthology.org/C10-1152


Acknowledgements

The authors would like to thank the Austria Presse Agentur and CFS GmbH for providing the data for two of the parallel corpora of standard German documents with their simplified counterparts.

Author information

Corresponding author

Correspondence to Nicolas Spring.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Spring, N., Kostrzewa, M., Rios, A., Ebling, S. (2022). Ensembling and Score-Based Filtering in Sentence Alignment for Automatic Simplification of German Texts. In: Antona, M., Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Novel Design Approaches and Technologies. HCII 2022. Lecture Notes in Computer Science, vol 13308. Springer, Cham. https://doi.org/10.1007/978-3-031-05028-2_8

  • DOI: https://doi.org/10.1007/978-3-031-05028-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05027-5

  • Online ISBN: 978-3-031-05028-2

  • eBook Packages: Computer Science, Computer Science (R0)
