Abstract
This paper studies the contribution of some basic linguistic features to machine translation evaluation with Arabic as the target language. The metric used is AL-TERp, which is designed and tuned specifically for Arabic. Experiments on a medium-sized corpus show that linguistic knowledge improves the correlation of the metric's results with human assessments. A detailed qualitative analysis of the results also highlights a number of issues that the use of linguistic features resolves.
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
El Marouani, M., Boudaa, T., Enneya, N. (2018). Incorporation of Linguistic Features in Machine Translation Evaluation of Arabic. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_39
DOI: https://doi.org/10.1007/978-3-319-96292-4_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96291-7
Online ISBN: 978-3-319-96292-4
eBook Packages: Computer Science, Computer Science (R0)