Ensembling and Score-Based Filtering in Sentence Alignment for Automatic Simplification of German Texts

Spring, Nicolas; Kostrzewa, Marek; Rios, Annette; Ebling, Sarah

doi:10.1007/978-3-031-05028-2_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13308)

Included in the following conference series:

International Conference on Human-Computer Interaction


Abstract

Among the well-known accessibility services for audiovisual media are subtitling for the deaf and hard-of-hearing, audio description, and sign language interpreting. More recently, automatic text simplification has emerged as a topic in the context of media accessibility, with research often approaching the task as a case of (sentence-based) monolingual machine translation. This approach relies on large amounts of high-quality parallel data, which is why monolingual sentence alignment has gained momentum. Alignment for text simplification is a complex task: alignments often take the form of n:m, in contrast to the standard 1:1 case in machine translation. In this contribution, we evaluate the performance of different alignment methods against a human-created gold standard of standard German/simplified German sentence alignments drawn from several parallel corpora. Two of the corpora contain multiple levels of simplification.
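To make the n:m notion concrete, here is a minimal Python sketch of one way to represent alignments and score them against a gold standard. It is not the authors' evaluation code. Several choices are hypothetical and made only for illustration: the representation (a pair of sentence-index sets), the decomposition of each n:m alignment into 1:1 sentence links, and the names `to_pairs` and `precision_recall_f1`. The evaluation protocol used in the paper may differ.

```python
from itertools import product
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical representation of one n:m alignment: the indices of the
# standard-German sentences and the indices of the simplified-German
# sentences they are aligned to.
Alignment = Tuple[FrozenSet[int], FrozenSet[int]]
Link = Tuple[int, int]


def to_pairs(alignments: Iterable[Alignment]) -> Set[Link]:
    """Decompose each n:m alignment into all of its 1:1 sentence links."""
    links: Set[Link] = set()
    for src, tgt in alignments:
        links.update(product(sorted(src), sorted(tgt)))
    return links


def precision_recall_f1(
    predicted: Iterable[Alignment], gold: Iterable[Alignment]
) -> Tuple[float, float, float]:
    """Link-level precision, recall and F1 of predicted vs. gold alignments."""
    pred_links, gold_links = to_pairs(predicted), to_pairs(gold)
    true_pos = len(pred_links & gold_links)
    precision = true_pos / len(pred_links) if pred_links else 0.0
    recall = true_pos / len(gold_links) if gold_links else 0.0
    # F1 is defined as 0 when there are no correct links (avoids 0/0).
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: a 1:2 split and a 2:1 merge in the gold standard.
    gold = [
        (frozenset({0}), frozenset({0, 1})),
        (frozenset({2, 3}), frozenset({2})),
    ]
    predicted = [
        (frozenset({0}), frozenset({0})),
        (frozenset({2}), frozenset({2})),
        (frozenset({4}), frozenset({3})),
    ]
    p, r, f = precision_recall_f1(predicted, gold)
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

Decomposing into links gives partial credit to a method that recovers only part of an n:m alignment. Requiring an exact match of whole alignment units would be a stricter alternative.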

We employ a variety of alignment methods developed for monolingual tasks and for bilingual sentence alignment, and we explore strategies such as ensembling and score-based filtering to improve performance over these baselines. We show that combining multiple alignment methods with various hard voting strategies can outperform even the best individual methods. We also show that score-based filtering of extracted alignments, which selects the most promising candidates, achieves similar results. Our results motivate the notion that sentence alignment for automatic simplification of German should be viewed as a two-step process that goes beyond the application of individual alignment methods.
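The two strategies named above can be sketched as follows. Again, this is a hypothetical illustration rather than the authors' implementation, and it rests on three assumptions:

- Each method's output has been decomposed into 1:1 sentence links.
- "Hard voting" means counting how many methods propose a link, with `min_votes` selecting union, majority, or unanimity.
- Each extracted link carries a numeric alignment score.

The function names, threshold, and scores are all illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# (standard-German sentence index, simplified-German sentence index)
Link = Tuple[int, int]


def hard_vote(method_outputs: Iterable[Set[Link]], min_votes: int) -> Set[Link]:
    """Keep a link if at least `min_votes` alignment methods proposed it."""
    votes: Counter = Counter()
    for links in method_outputs:
        votes.update(set(links))  # each method votes at most once per link
    return {link for link, count in votes.items() if count >= min_votes}


def filter_by_score(scored: Dict[Link, float], threshold: float) -> Set[Link]:
    """Keep only links whose alignment score reaches `threshold`."""
    return {link for link, score in scored.items() if score >= threshold}


if __name__ == "__main__":
    # Hypothetical outputs of three alignment methods on one document pair.
    methods = [
        {(0, 0), (1, 1), (2, 3)},
        {(0, 0), (1, 1), (2, 2)},
        {(0, 0), (2, 2), (3, 3)},
    ]
    print(sorted(hard_vote(methods, min_votes=2)))  # [(0, 0), (1, 1), (2, 2)]
    print(sorted(hard_vote(methods, min_votes=3)))  # [(0, 0)]

    # Hypothetical similarity scores for the links extracted by one method.
    scored = {(0, 0): 0.91, (1, 1): 0.62, (2, 2): 0.48}
    print(sorted(filter_by_score(scored, threshold=0.6)))  # [(0, 0), (1, 1)]
```

With three methods, `min_votes=1` yields the union of their outputs, `min_votes=2` a majority vote, and `min_votes=3` only the links all methods agree on. Raising `min_votes` or `threshold` shrinks the set of kept links. This can only lower recall, in exchange for typically higher precision.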

Funded by the Austrian Research Promotion Agency (Österreichische Forschungsförderungsgesellschaft, FFG) General Programme under grant agreement number 881202.

Notes

1. The term “simplified language” is used to denote the sum of all “comprehensibility-enhanced varieties of natural languages” [6, p. 52], i.e., what is commonly termed “Easy Language” (German leichte Sprache) and “Plain Language” (German einfache Sprache). “Easy-to-understand language” has been mentioned as an umbrella term subsuming these varieties [6, p. 52]. However, in this contribution, we prefer the term “simplified language” to emphasize the notion of the result of a simplification process.
2. https://www.capito.eu/.
3. https://www.deepl.com.
4. https://github.com/kostrzmar/SATEF.
5. https://github.com/thompsonb/vecalign.

References

[1] Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72. Association for Computational Linguistics, Ann Arbor, Michigan, June 2005. https://aclanthology.org/W05-0909
[2] Battisti, A., Pfütze, D., Säuberli, A., Kostrzewa, M., Ebling, S.: A corpus for automatic readability assessment and text simplification of German. In: Proceedings of The 12th Language Resources and Evaluation Conference, pp. 3295–3304. European Language Resources Association, Marseille, France, May 2020. https://www.aclweb.org/anthology/2020.lrec-1.403
[3] Council of Europe: Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge University Press, Cambridge (2009)
[4] Hwang, W., Hajishirzi, H., Ostendorf, M., Wu, W.: Aligning sentences from standard Wikipedia to simple Wikipedia. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 211–217. Association for Computational Linguistics, Denver, Colorado, May–June 2015. https://doi.org/10.3115/v1/N15-1022. https://aclanthology.org/N15-1022
[5] Jiang, C., Maddela, M., Lan, W., Zhong, Y., Xu, W.: Neural CRF model for sentence alignment in text simplification (2021)
[6] Maaß, C.: Easy Language–Plain Language–Easy Language Plus. Balancing Comprehensibility and Acceptability, Easy–Plain–Accessible, vol. 3. Frank & Timme (2020)
[7] Nikolov, N., Hahnloser, R.: Large-scale hierarchical alignment for data-driven text rewriting. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2019 (2019)
[8] Paetzold, G., Alva-Manchego, F., Specia, L.: MassAlign: alignment and annotation of comparable documents. In: Park, S., Supnithi, T. (eds.) Proceedings of the IJCNLP 2017, Taipei, Taiwan, 27 November–1 December 2017, System Demonstrations, pp. 1–4. Association for Computational Linguistics (2017). https://aclanthology.info/papers/I17-3001/i17-3001
[9] Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 311–318, Philadelphia, PA (2002)
[10] Pfütze, D.: Sentence alignment gold standards for neural text simplification, University of Zurich (2020)
[11] Pfütze, D., Ebling, S.: Sentence alignment in the context of automatic text simplification. Poster Presented at KLAARA 2021–2nd Conference on Easy-to-Read Language Research, Switzerland (Online), August 2021
[12] Reimers, N., Gurevych, I.: Making monolingual sentence embeddings multilingual using knowledge distillation. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, November 2020. https://arxiv.org/abs/2004.09813
[13] Schwenk, H., Douze, M.: Learning joint multilingual sentence representations with neural machine translation. In: Proceedings of the 2nd Workshop on Representation Learning for NLP, pp. 157–167. Association for Computational Linguistics, Vancouver, Canada, August 2017. https://www.aclweb.org/anthology/W17-2619
[14] Spring, N., Pfütze, D., Kostrzewa, M., Battisti, A., Rios, A., Ebling, S.: Comparing sentence alignment methods for automatic simplification of German texts. Presentation Given at the 1st International Easy Language Day Conference (IELD), Germersheim, Germany (2021)
[15] Štajner, S., Franco-Salvador, M., Rosso, P., Ponzetto, S.: CATS: a tool for customized alignment of text simplification corpora. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pp. 3895–3903, Miyazaki, Japan (2018)
[16] Xu, W., Callison-Burch, C., Napoles, C.: Problems in current text simplification research: new data can help. Trans. Assoc. Comput. Linguist. 3, 283–297 (2015)
[17] Xu, W., Napoles, C., Pavlick, E., Chen, Q., Callison-Burch, C.: Optimizing statistical machine translation for text simplification. Trans. Assoc. Comput. Linguist. 4, 401–415 (2016)
[18] Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: BERTScore: evaluating text generation with BERT. In: International Conference on Learning Representations (2020)
[19] Zhu, Z., Bernhard, D., Gurevych, I.: A monolingual tree-based translation model for sentence simplification. In: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 1353–1361. Coling 2010 Organizing Committee, Beijing, China, August 2010. https://aclanthology.org/C10-1152

Acknowledgements

The authors would like to thank the Austria Presse Agentur and CFS GmbH for providing the data for two of the parallel corpora of standard German documents with their simplified counterparts.

Author information

Authors and Affiliations

Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
Nicolas Spring, Marek Kostrzewa, Annette Rios & Sarah Ebling

Corresponding author

Correspondence to Nicolas Spring.

Editor information

Editors and Affiliations

Foundation for Research and Technology - Hellas (FORTH), Heraklion, Crete, Greece
Margherita Antona
University of Crete and Foundation for Research and Technology - Hellas (FORTH), Heraklion, Crete, Greece
Constantine Stephanidis

About this paper

Cite this paper

Spring, N., Kostrzewa, M., Rios, A., Ebling, S. (2022). Ensembling and Score-Based Filtering in Sentence Alignment for Automatic Simplification of German Texts. In: Antona, M., Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Novel Design Approaches and Technologies. HCII 2022. Lecture Notes in Computer Science, vol 13308. Springer, Cham. https://doi.org/10.1007/978-3-031-05028-2_8

DOI: https://doi.org/10.1007/978-3-031-05028-2_8
Published: 16 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05027-5
Online ISBN: 978-3-031-05028-2
eBook Packages: Computer Science, Computer Science (R0)
