Abstract
A central problem in machine translation is the automatic evaluation of its output. A translation should be as close as possible to a human reference, but variations in word order and choice of synonyms have to be taken into account when measuring this similarity. Conventional methods tend to rely on many external resources, such as synonym dictionaries, paraphrase tables, and textual-entailment data. To make the evaluation model both accurate and concise, this paper explores evaluation that uses only the part-of-speech (POS) information of the words, i.e. a method based solely on the agreement between the POS sequences of the hypothesis translation and the reference. In this method, POS tags play a role similar to that of synonyms, in addition to capturing the syntactic and morphological behaviour of the lexical items in question. Measures of similarity between a machine translation and a human reference also depend on the language pair, since, for instance, word order and the number of synonyms may vary. The proposed measure addresses this problem to a certain extent by introducing weights for the different sources of information. In experiments on English, German, and French, the measure correlates on average better with human judgments than some existing measures such as BLEU, AMBER, and MP4IBM1.
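To make the core idea concrete, the following is a minimal sketch of scoring a hypothesis against a reference by POS n-gram overlap. It is an illustrative assumption, not the paper's exact metric: the function names, the precision/recall weights alpha and beta, and the uniform averaging over n-gram orders are all hypothetical choices; the published measure introduces additional weighted factors.

```python
from collections import Counter

def pos_ngrams(tags, n):
    """Return the multiset of POS n-grams in a tag sequence."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def pos_ngram_fscore(hyp_tags, ref_tags, max_n=4, alpha=1.0, beta=1.0):
    """Average weighted F-score over POS n-gram overlap for n = 1..max_n.

    alpha and beta trade off precision against recall in the harmonic
    mean; alpha = beta = 1 reduces to the standard F1 score.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp = pos_ngrams(hyp_tags, n)
        ref = pos_ngrams(ref_tags, n)
        if not hyp or not ref:
            continue  # sentence shorter than n
        overlap = sum((hyp & ref).values())  # clipped match count
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
        else:
            scores.append((alpha + beta) * precision * recall
                          / (alpha * precision + beta * recall))
    return sum(scores) / len(scores) if scores else 0.0

# Example: hypothesis and reference already POS-tagged
# (e.g. by an external tagger such as TreeTagger).
hyp = ["DT", "NN", "VBZ", "JJ"]
ref = ["DT", "JJ", "NN", "VBZ"]
print(pos_ngram_fscore(hyp, ref))
```

Because the score is computed over tag sequences rather than surface words, two translations that use different synonyms with the same grammatical function can still match, which is the intuition behind using POS information as a lightweight substitute for synonym resources.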
References
Koehn, P.: Statistical Machine Translation. Cambridge University Press (2010)
Su, K.-Y., Wu, M.-W., Chang, J.-S.: A New Quantitative Quality Measure for Machine Translation Systems. In: Proceedings of the 14th International Conference on Computational Linguistics, Nantes, France, pp. 433–439 (July 1992)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the ACL 2002, Philadelphia, PA, USA, pp. 311–318 (2002)
Doddington, G.: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: Proceedings of the Second International Conference on Human Language Technology Research, San Diego, California, USA, pp. 138–145 (2002)
Banerjee, S., Lavie, A.: Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of ACL-WMT, Prague, Czech Republic, pp. 65–72 (2005)
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of the Conference of the Association for Machine Translation in the Americas, Boston, USA, pp. 223–231 (2006)
Chen, B., Kuhn, R.: Amber: A modified bleu, enhanced ranking metric. In: Proceedings of ACL-WMT, Edinburgh, Scotland, UK, pp. 71–77 (2011)
Bicici, E., Yuret, D.: RegMT system for machine translation, system combination, and evaluation. In: Proceedings ACL-WMT, Edinburgh, Scotland, UK, pp. 323–329 (2011)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press (2004)
Wong, B.T.-M., Kit, C.: Word choice and word position for automatic MT evaluation. In: Workshop: MetricsMATR of the Association for Machine Translation in the Americas, Waikiki, Hawaii, USA (2008)
Isozaki, H., Hirao, T., Duh, K., Sudoh, K., Tsukada, H.: Automatic evaluation of translation quality for distant language pairs. In: Proceedings of the 2010 Conference on EMNLP, Cambridge, MA, pp. 944–952 (2010)
Talbot, D., Kazawa, H., Ichikawa, H., Katz-Brown, J., Seno, M., Och, F.: A Lightweight Evaluation Framework for Machine Translation Reordering. In: Proceedings of the Sixth ACL-WMT, Edinburgh, Scotland, UK, pp. 12–21 (2011)
Song, X., Cohn, T.: Regression and ranking based optimisation for sentence level MT evaluation. In: Proceedings of the ACL-WMT, Edinburgh, Scotland, UK, pp. 123–129 (2011)
Popovic, M.: Morphemes and POS tags for n-gram based evaluation metrics. In: Proceedings of ACL-WMT, Edinburgh, Scotland, UK, pp. 104–107 (2011)
Popovic, M., Vilar, D., Avramidis, E., Burchardt, A.: Evaluation without references: IBM1 scores as evaluation metrics. In: Proceedings of the ACL-WMT, Edinburgh, Scotland, UK, pp. 99–103 (2011)
Petrov, S., Barrett, L., Thibaux, R., Klein, D.: Learning accurate, compact, and interpretable tree annotation. In: Proceedings of the 21st ACL, Sydney, pp. 433–440 (July 2006)
Callison-Burch, C., Koehn, P., Monz, C., Zaidan, O.F.: Findings of the 2011 Workshop on Statistical Machine Translation. In: Proceedings of ACL-WMT, Edinburgh, Scotland, UK, pp. 22–64 (2011)
Han, A.L.-F., Wong, D.F., Chao, L.S.: LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors. In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, Mumbai, India, pp. 441–450 (2012)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Han, A.L.F., Wong, D.F., Chao, L.S., He, L. (2013). Automatic Machine Translation Evaluation with Part-of-Speech Information. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_16
DOI: https://doi.org/10.1007/978-3-642-40585-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3