Abstract
A translation memory system attempts to retrieve useful suggestions from previous translations to assist a translator in a new translation task. When assisting the translator with a specific segment, the system usually employs a similarity metric to select the best matches among previously translated segments to present to the translator. Automated methods for evaluating a translation memory system usually rely on reference translations and a similarity metric. Such evaluation methods might be expected to assist in choosing between competing systems. However, no single evaluation method has gained widespread use, and the similarity metric used in each of these methods is not standardised either. This paper investigates the consequences of substituting the similarity metric in such an evaluation method, and finds that the similarity metrics exhibit a strong bias towards the system that uses the same metric for retrieval. Consequently, the choice of similarity metric in the evaluation of translation memory systems should be carefully reconsidered.
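To make the retrieve-then-evaluate setup described above concrete, the following sketch plugs one similarity function into both steps. Retrieval compares the new source segment with each stored source segment, and evaluation compares the retrieved target with a reference translation. This is a minimal illustration under stated assumptions: segments are pre-tokenised lists of words, the translation memory is a plain list of (source, target) pairs, and the token-set Dice and Jaccard metrics are hypothetical stand-ins chosen for brevity, not the metrics examined in the paper.

```python
from typing import Callable, List, Tuple

# A similarity metric maps two token lists to a score in [0, 1].
Similarity = Callable[[List[str], List[str]], float]
# Hypothetical TM representation: a list of (source tokens, target tokens) pairs.
TranslationMemory = List[Tuple[List[str], List[str]]]


def dice(a: List[str], b: List[str]) -> float:
    """Dice coefficient over token sets (illustrative stand-in metric)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard index over token sets (illustrative stand-in metric)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def retrieve(query: List[str], tm: TranslationMemory, sim: Similarity) -> List[str]:
    """Return the target side of the TM entry whose source is most similar to the query.

    Ties are broken in favour of the earlier TM entry (behaviour of max()).
    """
    return max(tm, key=lambda pair: sim(query, pair[0]))[1]


def evaluate(suggestions: List[List[str]], references: List[List[str]], sim: Similarity) -> float:
    """Mean similarity between each retrieved suggestion and its reference translation."""
    scores = [sim(s, r) for s, r in zip(suggestions, references)]
    if not scores:
        raise ValueError("need at least one (suggestion, reference) pair")
    return sum(scores) / len(scores)
```

Passing the same function (say `dice`) to both `retrieve` and `evaluate` corresponds to the matched-metric configuration; changing only the metric given to `evaluate` corresponds to the kind of substitution the paper investigates.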
Notes
Also see http://hlt-evaluation.org.
This definition assumes unit cost, that is, a distance of 1 for each operation except identity. Different weights can be assigned to different operations, but this is not considered further in this paper.
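Assuming the definition this note qualifies is the Levenshtein edit distance (Levenshtein 1966 is cited in the references), the unit-cost version can be computed with the standard dynamic-programming recurrence. The sketch below is illustrative only; it accepts any sequence, so it can be applied to characters or to word tokens, and the granularity used in the paper is not stated in this section.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Unit-cost edit distance between two sequences.

    Insertion, deletion and substitution each cost 1; identity
    (keeping a matching element) costs 0.
    """
    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitution (1) or identity (0)
            ))
        prev = curr
    return prev[-1]
```

A weighted variant, which the note sets aside, would replace the constant costs of 1 with per-operation weights.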
Available from https://l10n.gnome.org/releases/gnome-3-8/.
\(F_1 = 2\times \frac{P^w_f\times R^w_f}{P^w_f + R^w_f}\).
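The formula in this note is the standard F1 score, the harmonic mean of a precision value and a recall value. The sketch below computes it, together with one possible way of obtaining word-level precision and recall for a suggestion against a reference. The clipped bag-of-words overlap used here is an assumption for illustration, not necessarily how the paper defines its quantities.

```python
from collections import Counter
from typing import List, Tuple


def word_precision_recall(candidate: List[str], reference: List[str]) -> Tuple[float, float]:
    """Word-level precision and recall from clipped bag-of-words overlap (assumed definition)."""
    # Counter & Counter keeps the minimum count of each shared word.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """F1 = 2PR / (P + R), the harmonic mean; taken as 0 when P + R = 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high.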
References
Azzano D (2011) Placeable and localizable elements in translation memory systems. PhD thesis, Ludwig-Maximilians-Universität München
Baldwin T (2009) The hare and the tortoise: speed and accuracy in translation retrieval. Mach Trans 23:195–240. doi:10.1007/s10590-009-9064-7
Bloodgood M, Strauss B (2014) Translation memory retrieval methods. In: Proceedings of the 14th conference of the European chapter of the association for computational linguistics, Association for Computational Linguistics, Gothenburg, pp 202–210, http://www.aclweb.org/anthology/E14-1022
Damerau FJ (1964) A technique for computer detection and correction of spelling errors. Commun ACM 7(3):171–176. doi:10.1145/363958.363994
Levenshtein V (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys Dokl 10:707–710
Mapelli V, Arranz V, Mazo H, Choukri K (2008) Latest developments in ELRA’s services. In: Calzolari N, Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, Tapias D (eds) Proceedings of the sixth international conference on language resources and evaluation (LREC’08), European Language Resources Association (ELRA), Marrakech, http://www.lrec-conf.org/proceedings/lrec2008/
O’Brien S (2007) Eye-tracking and translation memory matches. Perspectives 14(3):185–205
Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, Association for Computational Linguistics, Stroudsburg, ACL ’02, pp 311–318, doi:10.3115/1073083.1073135
Servan C, Schwenk H (2011) Optimising multiple metrics with MERT. The Prague Bulletin of Mathematical Linguistics (PBML). http://www-lium.univ-lemans.fr/~servan/publications/Servan_PBML_2011.pdf
Simard M, Fujita A (2012) A poor man's translation memory using machine translation evaluation metrics. In: Proceedings of the tenth conference of the association for machine translation in the Americas
Steinberger R, Eisele A, Klocek S, Pilos S, Schlüter P (2012) DGT-TM: A freely available translation memory in 22 languages. In: Calzolari N, Choukri K, Declerck T, Doǧan MU, Maegaard B, Mariani J, Odijk J, Piperidis S (eds) Proceedings of the eighth international conference on language resources and evaluation (LREC’12), European Language Resources Association (ELRA), Istanbul, http://www.lrec-conf.org/proceedings/lrec2012/index.html
Vanallemeersch T, Vandeghinste V (2015) Assessing linguistically aware fuzzy matching in translation memories. In: Proceedings of the 18th annual conference of the European association for machine translation, Antalya, EAMT, https://lirias.kuleuven.be/handle/123456789/499781
Whyman E, Somers H (1999) Evaluation metrics for a translation memory system. Softw-Pract Exp 29(14):1265–1284
Wolff F, Pretorius L, Dugast L, Buitelaar P (2016) Methodological pitfalls in automated translation memory evaluation. In: Proceedings of the 2nd workshop on natural language processing for translation memories (NLP4TM 2016), Portorož, LREC 2016, http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-NLP4TM_Proceedings.pdf
Acknowledgements
This research was supported in part by funding from the Science Foundation Ireland under Grant Number SFI/12/RC/2289 (Insight) and the Academy of African Languages and Science Strategic Project of the University of South Africa.
Additional information
This paper is an extended version of Wolff et al. (2016).
About this article
Cite this article
Wolff, F., Pretorius, L., Dugast, L. et al. Self-selection bias of similarity metrics in translation memory evaluation. Machine Translation 30, 129–144 (2016). https://doi.org/10.1007/s10590-016-9185-8
DOI: https://doi.org/10.1007/s10590-016-9185-8