Self-selection bias of similarity metrics in translation memory evaluation

Machine Translation

Abstract

A translation memory system attempts to retrieve useful suggestions from previous translations to assist a translator in a new translation task. While assisting the translator with a specific segment, some similarity metric is usually employed to select the best matches from previously translated segments to present to the translator. Automated methods for evaluating a translation memory system usually use reference translations and some similarity metric. Such evaluation methods might be expected to assist in choosing between competing systems. No single evaluation method has gained widespread use, and the similarity metric used within these methods is not standardised either. This paper investigates the consequences of substituting the similarity metric in such an evaluation method, and finds that the similarity metrics exhibit a strong bias towards the system that uses the same metric for retrieval. Consequently, the choice of similarity metric in the evaluation of translation memory systems should be carefully reconsidered.
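
The retrieval and evaluation steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' experimental setup: the function names (`similarity`, `retrieve`, `evaluate`), the toy memory, and the use of Python's `difflib.SequenceMatcher` as a stand-in metric are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not a metric from the paper."""
    return SequenceMatcher(None, a, b).ratio()


def retrieve(memory, source, metric=similarity):
    """Return the (source, target) pair whose source side is most similar to `source`."""
    return max(memory, key=lambda pair: metric(pair[0], source))


def evaluate(memory, test_set, retrieval_metric, evaluation_metric):
    """Mean score of the retrieved target sides against the reference translations."""
    scores = []
    for source, reference in test_set:
        _, suggestion = retrieve(memory, source, retrieval_metric)
        scores.append(evaluation_metric(suggestion, reference))
    return sum(scores) / len(scores)


# Hypothetical toy data: previously translated segments and one test segment.
memory = [
    ("Open the file", "Ouvrir le fichier"),
    ("Save the file", "Enregistrer le fichier"),
]
test_set = [("Open the files", "Ouvrir les fichiers")]

print(retrieve(memory, "Open the files"))
print(evaluate(memory, test_set, similarity, similarity))
```

Passing different functions as `retrieval_metric` and `evaluation_metric` is the kind of substitution the paper investigates. The sketch only shows where the two metrics enter the process and makes no claim about the size of any bias.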

Notes

  1. Also see http://hlt-evaluation.org.

  2. This definition assumes unit cost, in other words a distance of 1 for each of the operations except identity. Different weights can be assigned to different operations, but this is not considered further in this paper.

  3. http://omegat.org/.

  4. http://okapi.opentag.com/.

  5. http://virtaal.translatehouse.org/.

  6. http://amagama.translatehouse.org/.

  7. http://site.icu-project.org/.

  8. http://snowball.tartarus.org/.

  9. http://amagama.translatehouse.org/.

  10. Available from https://l10n.gnome.org/releases/gnome-3-8/.

  11. \(F_1 = 2\times \frac{P^w_f\times F^w_f}{(P^w_f + F^w_f)}\).
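
The unit-cost assumption in Note 2 can be illustrated with a short sketch. It assumes the plain Levenshtein operations (insertion, deletion, substitution); the paper's own definition is not reproduced in this excerpt and may differ, for example by also allowing transpositions. The function names and the length normalisation in `normalised_similarity` are hypothetical choices made for the example.

```python
def levenshtein(a, b):
    """Edit distance with unit cost: each insertion, deletion, or substitution
    costs 1, and matching items (identity) cost 0.
    Works on any two sequences, e.g. strings or lists of words."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, item_a in enumerate(a, start=1):
        current = [i]  # deleting the first i items of `a`
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (item_a != item_b),  # substitution or identity
            ))
        previous = current
    return previous[-1]


def normalised_similarity(a, b):
    """Map the distance to [0, 1]; dividing by the longer length is an
    assumed normalisation, not one taken from the paper."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
print(levenshtein("open the file".split(), "open a file".split()))
```

Because the function accepts any sequence, the same code gives a character-level distance for strings and a word-level distance for token lists. Weighted variants would replace the constant 1 in each branch with an operation-specific cost.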

References

  • Azzano D (2011) Placeable and localizable elements in translation memory systems. PhD thesis, Ludwig-Maximilians-Universität München

  • Baldwin T (2009) The hare and the tortoise: speed and accuracy in translation retrieval. Mach Trans 23:195–240. doi:10.1007/s10590-009-9064-7

  • Bloodgood M, Strauss B (2014) Translation memory retrieval methods. In: Proceedings of the 14th conference of the European chapter of the association for computational linguistics, Association for Computational Linguistics, Gothenburg, pp 202–210, http://www.aclweb.org/anthology/E14-1022

  • Damerau FJ (1964) A technique for computer detection and correction of spelling errors. Commun ACM 7(3):171–176. doi:10.1145/363958.363994

  • Levenshtein V (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys Dokl 10:707–710

  • Mapelli V, Arranz V, Mazo H, Choukri K (2008) Latest developments in ELRA’s services. In: Calzolari N, Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, Tapias D (eds) Proceedings of the sixth international conference on language resources and evaluation (LREC’08), European Language Resources Association (ELRA), Marrakech, http://www.lrec-conf.org/proceedings/lrec2008/

  • O’Brien S (2007) Eye-tracking and translation memory matches. Perspectives 14(3):185–205

  • Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, Association for Computational Linguistics, Stroudsburg, ACL ’02, pp 311–318, doi:10.3115/1073083.1073135

  • Servan C, Schwenk H (2011) Optimising multiple metrics with MERT. The Prague Bulletin of Mathematical Linguistics (PBML). http://www-lium.univ-lemans.fr/~servan/publications/Servan_PBML_2011.pdf

  • Simard M, Fujita A (2012) A poor man's translation memory using machine translation evaluation metrics. In: Proceedings of the tenth conference of the association for machine translation in the Americas

  • Steinberger R, Eisele A, Klocek S, Pilos S, Schlüter P (2012) DGT-TM: A freely available translation memory in 22 languages. In: Calzolari N, Choukri K, Declerck T, Doğan MU, Maegaard B, Mariani J, Odijk J, Piperidis S (eds) Proceedings of the eight international conference on language resources and evaluation (LREC’12), European Language Resources Association (ELRA), Istanbul, http://www.lrec-conf.org/proceedings/lrec2012/index.html

  • Vanallemeersch T, Vandeghinste V (2015) Assessing linguistically aware fuzzy matching in translation memories. In: Proceedings of the 18th annual conference of the European association for machine translation, Antalya, EAMT, https://lirias.kuleuven.be/handle/123456789/499781

  • Whyman E, Somers H (1999) Evaluation metrics for a translation memory system. Softw-Pract Exp 29(14):1265–1284

  • Wolff F, Pretorius L, Dugast L, Buitelaar P (2016) Methodological pitfalls in automated translation memory evaluation. In: Proceedings of the 2nd workshop on natural language processing for translation memories (NLP4TM 2016), Portorož, LREC 2016, http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-NLP4TM_Proceedings.pdf

Acknowledgements

This research was supported in part by funding from the Science Foundation Ireland under Grant Number SFI/12/RC/2289 (Insight) and the Academy of African Languages and Science Strategic Project of the University of South Africa.

Author information

Corresponding author

Correspondence to Friedel Wolff.

Additional information

This paper is an extended version of Wolff et al. (2016).

About this article

Cite this article

Wolff, F., Pretorius, L., Dugast, L. et al. Self-selection bias of similarity metrics in translation memory evaluation. Machine Translation 30, 129–144 (2016). https://doi.org/10.1007/s10590-016-9185-8

  • DOI: https://doi.org/10.1007/s10590-016-9185-8
