Abstract
Well-known accessibility services for audiovisual media include subtitling for the deaf and hard-of-hearing, audio description, and sign language interpreting. More recently, automatic text simplification has emerged as a topic in the context of media accessibility, with research often approaching the task as a case of (sentence-based) monolingual machine translation. This approach relies on large amounts of high-quality parallel data, which is why monolingual sentence alignment has gained momentum. Alignment for text simplification is a complex task because alignments often take the form n:m, with n source sentences corresponding to m target sentences (in contrast to the standard 1:1 case in machine translation). In this contribution, we evaluate different alignment methods against a manually created gold standard of standard German/simplified German sentence alignments drawn from several parallel corpora, two of which contain multiple levels of simplification.
We employ a variety of alignment methods developed for monolingual tasks and for bilingual sentence alignment, and we explore strategies such as ensembling and score-based filtering to improve performance over these baselines. We show that combining multiple alignment methods through hard voting strategies can outperform even the best individual method, and that score-based filtering of extracted alignments, which selects the most promising candidates, achieves similar results. Our results suggest that sentence alignment for automatic simplification of German should be viewed as a two-step process that goes beyond the application of individual alignment methods.
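To make the two post-processing strategies concrete, the following is a minimal Python sketch and not the implementation evaluated in this chapter. It assumes that each alignment method outputs a set of n:m alignments over sentence indices and that a confidence score per alignment is available. The names `make_alignment`, `hard_vote`, and `filter_by_score`, as well as the vote counts, scores, and threshold, are hypothetical and chosen for illustration only.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# An alignment links a group of source sentence indices to a group of
# target sentence indices, so 1:1, 1:n, n:1 and n:m cases share one type.
Alignment = Tuple[FrozenSet[int], FrozenSet[int]]


def make_alignment(src: Iterable[int], tgt: Iterable[int]) -> Alignment:
    """Build a hashable alignment from source and target sentence indices."""
    return (frozenset(src), frozenset(tgt))


def hard_vote(system_outputs: List[Set[Alignment]], min_votes: int) -> Set[Alignment]:
    """Keep every alignment proposed by at least `min_votes` methods.

    min_votes=1 yields the union, min_votes=len(system_outputs) the
    intersection, and values in between a majority-style vote.
    """
    votes = Counter(a for output in system_outputs for a in output)
    return {a for a, n in votes.items() if n >= min_votes}


def filter_by_score(scored: Dict[Alignment, float], threshold: float) -> Set[Alignment]:
    """Keep alignments whose (assumed) confidence score reaches `threshold`."""
    return {a for a, score in scored.items() if score >= threshold}


# Toy data: three hypothetical alignment methods over one document pair.
a = make_alignment([0], [0])        # 1:1
b = make_alignment([1, 2], [1])     # 2:1
c = make_alignment([3], [2, 3])     # 1:2

outputs = [{a, b}, {a, c}, {a, b}]

union = hard_vote(outputs, min_votes=1)      # any method proposes it
majority = hard_vote(outputs, min_votes=2)   # at least two methods agree
unanimous = hard_vote(outputs, min_votes=3)  # all methods agree

# Illustrative scores, e.g. a sentence similarity assigned by one method.
scores = {a: 0.91, b: 0.62, c: 0.35}
kept = filter_by_score(scores, threshold=0.6)
```

In this toy example, both the majority vote and the score threshold keep alignments `a` and `b` and discard `c`. Either strategy acts as a second step applied to the output of the individual alignment methods.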
Funded by the Austrian Research Promotion Agency (Österreichische Forschungsförderungsgesellschaft, FFG) General Programme under grant agreement number 881202.
Notes
- 1. The term “simplified language” is used to denote the sum of all “comprehensibility-enhanced varieties of natural languages” [6, p. 52], i.e., what is commonly termed “Easy Language” (German leichte Sprache) and “Plain Language” (German einfache Sprache). “Easy-to-understand language” has been mentioned as an umbrella term subsuming these varieties [6, p. 52]. However, in this contribution, we prefer the term “simplified language” to emphasize the notion of the result of a simplification process.
References
Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72. Association for Computational Linguistics, Ann Arbor, Michigan, June 2005. https://aclanthology.org/W05-0909
Battisti, A., Pfütze, D., Säuberli, A., Kostrzewa, M., Ebling, S.: A corpus for automatic readability assessment and text simplification of German. In: Proceedings of The 12th Language Resources and Evaluation Conference, pp. 3295–3304. European Language Resources Association, Marseille, France, May 2020. https://www.aclweb.org/anthology/2020.lrec-1.403
Council of Europe: Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge University Press, Cambridge (2009)
Hwang, W., Hajishirzi, H., Ostendorf, M., Wu, W.: Aligning sentences from standard Wikipedia to simple Wikipedia. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 211–217. Association for Computational Linguistics, Denver, Colorado, May–June 2015. https://doi.org/10.3115/v1/N15-1022. https://aclanthology.org/N15-1022
Jiang, C., Maddela, M., Lan, W., Zhong, Y., Xu, W.: Neural CRF model for sentence alignment in text simplification (2021)
Maaß, C.: Easy Language–Plain Language–Easy Language Plus. Balancing Comprehensibility and Acceptability, Easy–Plain–Accessible, vol. 3. Frank & Timme (2020)
Nikolov, N., Hahnloser, R.: Large-scale hierarchical alignment for data-driven text rewriting. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2019 (2019)
Paetzold, G., Alva-Manchego, F., Specia, L.: MassAlign: alignment and annotation of comparable documents. In: Park, S., Supnithi, T. (eds.) Proceedings of the IJCNLP 2017, Taipei, Taiwan, 27 November–1 December 2017, System Demonstrations, pp. 1–4. Association for Computational Linguistics (2017). https://aclanthology.info/papers/I17-3001/i17-3001
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 311–318, Philadelphia, PA (2002)
Pfütze, D.: Sentence alignment gold standards for neural text simplification, University of Zurich (2020)
Pfütze, D., Ebling, S.: Sentence alignment in the context of automatic text simplification. Poster Presented at KLAARA 2021–2nd Conference on Easy-to-Read Language Research, Switzerland (Online), August 2021
Reimers, N., Gurevych, I.: Making monolingual sentence embeddings multilingual using knowledge distillation. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, November 2020. https://arxiv.org/abs/2004.09813
Schwenk, H., Douze, M.: Learning joint multilingual sentence representations with neural machine translation. In: Proceedings of the 2nd Workshop on Representation Learning for NLP, pp. 157–167. Association for Computational Linguistics, Vancouver, Canada, August 2017. https://www.aclweb.org/anthology/W17-2619
Spring, N., Pfütze, D., Kostrzewa, M., Battisti, A., Rios, A., Ebling, S.: Comparing sentence alignment methods for automatic simplification of German texts. Presentation Given at the 1st International Easy Language Day Conference (IELD), Germersheim, Germany (2021)
Štajner, S., Franco-Salvador, M., Rosso, P., Ponzetto, S.: CATS: a tool for customized alignment of text simplification corpora. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pp. 3895–3903, Miyazaki, Japan (2018)
Xu, W., Callison-Burch, C., Napoles, C.: Problems in current text simplification research: new data can help. Trans. Assoc. Comput. Linguist. 3, 283–297 (2015)
Xu, W., Napoles, C., Pavlick, E., Chen, Q., Callison-Burch, C.: Optimizing statistical machine translation for text simplification. Trans. Assoc. Comput. Linguist. 4, 401–415 (2016)
Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: BERTScore: evaluating text generation with BERT. In: International Conference on Learning Representations (2020)
Zhu, Z., Bernhard, D., Gurevych, I.: A monolingual tree-based translation model for sentence simplification. In: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 1353–1361. Coling 2010 Organizing Committee, Beijing, China, August 2010. https://aclanthology.org/C10-1152
Acknowledgements
The authors would like to thank the Austria Presse Agentur and CFS GmbH for providing the data for two of the parallel corpora of standard German documents with their simplified counterparts.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Spring, N., Kostrzewa, M., Rios, A., Ebling, S. (2022). Ensembling and Score-Based Filtering in Sentence Alignment for Automatic Simplification of German Texts. In: Antona, M., Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Novel Design Approaches and Technologies. HCII 2022. Lecture Notes in Computer Science, vol 13308. Springer, Cham. https://doi.org/10.1007/978-3-031-05028-2_8
DOI: https://doi.org/10.1007/978-3-031-05028-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05027-5
Online ISBN: 978-3-031-05028-2
eBook Packages: Computer Science, Computer Science (R0)