Abstract
Significant breakthroughs in machine translation (MT) seem possible only if human translators are brought into the loop. While automatic evaluation and scoring mechanisms such as BLEU have enabled fast system development, it remains unclear how systems can meet real-world quality requirements in industrial translation scenarios today. The taraXÜ project has paved the way for wider use of multiple MT outputs through various feedback loops in system development, integrating human translators into the development process and thereby collecting feedback for possible improvements. This paper describes the results of a detailed human evaluation in which the performance of different types of translation systems was compared and analysed via ranking, error analysis and post-editing.
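For readers unfamiliar with the automatic metric the abstract contrasts with human evaluation, a minimal sketch of sentence-level BLEU (clipped n-gram precision combined with a brevity penalty, in the spirit of Papineni et al. 2001) might look as follows. The function name and the smoothing constant are illustrative assumptions, not part of the project's tooling, and real toolkits compute BLEU at the corpus level with more careful smoothing.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Tiny floor avoids log(0) for short sentences (illustrative smoothing).
        log_prec_sum += math.log(max(clipped, 1e-9) / total)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / max_n)
```

A perfect match scores 1.0; any divergence in word choice, order or length lowers the score, which is precisely why the paper complements BLEU with human ranking, error analysis and post-editing.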
Notes
For reasons of required anonymisation.
Note that this must be seen as an experiment: it was done in order to simulate the use of translation memories (TMs), although it does not mirror their exact use in the translation industry.
More publications can be found online: http://taraxu.dfki.de/publications
References
Alonso, J. A., & Thurmair, G. (2003). The comprendium translator system. In: Proceedings of the Ninth Machine Translation Summit.
Avramidis, E., Popović, M., Vilar, D., & Burchardt, A. (2011). Evaluate with confidence estimation: Machine ranking of translation outputs using grammatical features. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, Edinburgh, Scotland. Association for Computational Linguistics, pp. 65–70.
Burchardt, A., Tscherwinka, C., & Avramidis, E. (2013). Machine translation at work, studies in computational intelligence (241–261) (Vol. 458). Berlin: Springer.
Callison-Burch, C., Koehn, P., Monz, C., Peterson, K., Przybocki, M., & Zaidan, O. (2010). Findings of the 2010 joint workshop on statistical machine translation and metrics for machine translation. In: Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Association for Computational Linguistics, Uppsala, Sweden, pp. 17–53, revised August 2010.
Eisele, A., & Chen, Y. (2010). MultiUN: A multilingual corpus from United Nations documents. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), La Valletta, Malta, pp. 2868–2872.
Farzindar, A., & Lapalme, G. (2009). Machine translation of legal information and its evaluation. In: Proceedings of the 22nd Canadian Conference on Artificial Intelligence (Canadian AI 09), Kelowna, BC, pp. 64–73.
Federmann, C. (2010). Appraise: An open-source toolkit for manual phrase-based evaluation of translations. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC 2010), La Valletta, Malta.
He, Y., Ma, Y., Roturier, J., Way, A., & van Genabith, J. (2010). Improving the post-editing experience using translation recommendation: A user study. In: Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas (AMTA 2010), Denver, Colorado.
Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., et al. (2007). Moses: Open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions. Association for Computational Linguistics, Stroudsburg, PA, USA, ACL ’07, pp. 177–180.
Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2001). Bleu: A method for automatic evaluation of machine translation. IBM Research Report RC22176(W0109–022), IBM.
Popović, M. (2011). Hjerson: An open source tool for automatic error classification of machine translation output. The Prague Bulletin of Mathematical Linguistics, 96, 59–68.
Specia, L., & Farzindar, A. (2010). Estimating machine translation post-editing effort with HTER. In: Proceedings of AMTA-2010 Workshop Bringing MT to the User. MT Research and the Translation Industry, Denver, Colorado.
Tiedemann, J. (2009). News from OPUS—A collection of multilingual parallel corpora with tools and interfaces. In: N. Nicolov, K. Bontcheva, G. Angelova & R. Mitkov (Eds.), Advances in natural language processing (Vol. V, pp. 237–248). Borovets, Bulgaria.
Vilar, D., Xu, J., D’Haro, L. F., & Ney, H. (2006). Error analysis of machine translation output. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, pp. 697–702.
Vilar, D., Stein, D., Huck, M., & Ney, H. (2010). Jane: Open source hierarchical translation, extended with reordering and lexicon models. In: ACL 2010 Joint Fifth Workshop on Statistical Machine Translation and Metrics MATR, Uppsala, Sweden, pp. 262–270.
Acknowledgments
This work has been developed within the taraXÜ project, financed by TSB Technologiestiftung Berlin (Zukunftsfonds Berlin) and co-financed by the European Union (European Regional Development Fund). Thanks to our colleague Christian Federmann for helping with the Appraise system.
Cite this article
Popović, M., Avramidis, E., Burchardt, A. et al. Involving language professionals in the evaluation of machine translation. Lang Resources & Evaluation 48, 541–559 (2014). https://doi.org/10.1007/s10579-014-9286-z