Abstract
Interactive Machine Translation (IMT) systems produce translations by combining the knowledge of professional translators with the generation speed of Machine Translation (MT) models. The two interact with the aim of generating error-free translations. The main goal of research in the IMT field is to reduce the effort that professional translators have to make during an IMT session. Techniques for reducing this effort vary widely, from changing the display used to perform the corrections to changing the feedback signal that the user sends to the MT model. This article proposes a method that reduces this effort by applying Confidence Measures (CMs), which assign a score to each translation, and asking the user to correct only the translations that obtain a low score. We have trained four Recurrent Neural Network (RNN) models to approximate the scores of four of the most widely used metrics in MT: BLEU, METEOR, chrF, and TER. We have simulated the user interaction with an Interactive-Predictive Neural Machine Translation (IPNMT) system to study the effort reduction that can be obtained while still getting high-quality translations from the system. We have tested different threshold values for deciding that a translation has a low score, which gives a transition from a conventional IPNMT system, where the user has to correct all the translations, to an unsupervised MT system. The results show that this method obtains very good translations (70 BLEU points) and reduces the human effort by 60%.
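The threshold mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` class, the function names, and the assumption that every confidence score is normalized to [0, 1] with higher meaning better (which would require inverting an error-rate metric such as TER) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    """A machine translation together with its estimated confidence.

    Assumption: confidence is in [0, 1] and higher means better.
    """
    source: str
    translation: str
    confidence: float


def split_by_threshold(
    hyps: List[Hypothesis], threshold: float
) -> Tuple[List[Hypothesis], List[Hypothesis]]:
    """Return (to_review, accepted).

    Hypotheses scoring strictly below the threshold are sent to the
    user for correction; the rest are accepted without supervision.
    """
    to_review = [h for h in hyps if h.confidence < threshold]
    accepted = [h for h in hyps if h.confidence >= threshold]
    return to_review, accepted


def supervised_fraction(hyps: List[Hypothesis], threshold: float) -> float:
    """Fraction of hypotheses that the user would have to review."""
    if not hyps:
        return 0.0
    to_review, _ = split_by_threshold(hyps, threshold)
    return len(to_review) / len(hyps)


# Toy data with made-up confidence values (not results from the paper).
hyps = [
    Hypothesis("src 1", "hyp 1", 0.2),
    Hypothesis("src 2", "hyp 2", 0.5),
    Hypothesis("src 3", "hyp 3", 0.9),
    Hypothesis("src 4", "hyp 4", 0.7),
]

for t in (0.0, 0.6, 1.1):
    print(f"threshold={t}: review {supervised_fraction(hyps, t):.0%}")
```

Under these assumptions, a threshold of 0.0 sends nothing to the user (the unsupervised MT extreme), while a threshold above the maximum possible score sends every translation to the user (the conventional IPNMT extreme). Intermediate values trade human effort against final translation quality.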
Acknowledgements
This work received funds from the Comunitat Valenciana under project EU-FEDER (IDIFEDER/2018/025), Generalitat Valenciana under project ALMAMATER (PrometeoII/2014/030), Ministerio de Ciencia e Investigación/Agencia Estatal de Investigación/10.13039/501100011033/ and “FEDER Una manera de hacer Europa” under project MIRANDA-DocTIUM (RTI2018-095645-B-C22), and Universitat Politècnica de València under the program (PAID-01-21).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Navarro Martínez, Á., Casacuberta Nolla, F. (2022). Turning Machine Translation Metrics into Confidence Measures. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_44
DOI: https://doi.org/10.1007/978-3-031-08473-7_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08472-0
Online ISBN: 978-3-031-08473-7
eBook Packages: Computer Science, Computer Science (R0)