Evaluation of 2-way Iraqi Arabic–English speech translation systems using automated metrics

Condon, Sherri; Arehart, Mark; Parvaz, Dan; Sanders, Gregory; Doran, Christy; Aberdeen, John

doi:10.1007/s10590-011-9105-x

Published: 22 September 2011

Volume 26, pages 159–176, (2012)

Machine Translation

Abstract

The Defense Advanced Research Projects Agency (DARPA) Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program (http://1.usa.gov/transtac) faced many challenges in applying automated measures of translation quality to Iraqi Arabic–English speech translation dialogues. Features of speech data in general and of Iraqi Arabic data in particular undermine basic assumptions of automated measures that depend on matching system outputs to reference translations. These features are described along with the challenges they present for evaluating machine translation quality using automated metrics. We show that scores for translation into Iraqi Arabic exhibit higher correlations with human judgments when they are computed from normalized system outputs and reference translations. Orthographic normalization, lexical normalization, and operations involving light stemming resulted in higher correlations with human judgments.
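
The kind of normalization the abstract refers to can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' rule set: the character classes, the affix lists (loosely modeled on published Arabic light stemmers), the minimum stem length, and the function names `normalize_orthography`, `light_stem`, and `normalize` are all hypothetical. Lexical normalization is represented only by a caller-supplied lookup table, since the article's variant-spelling lists are not given here.

```python
import re

# Assumed rules for illustration only; the article's actual rules are not shown here.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")  # short vowels, shadda, sukun, tatweel
ALEF_FORMS = re.compile("[\u0622\u0623\u0625]")   # alef with madda or hamza

# Hypothetical affix selection, loosely modeled on published light stemmers.
PREFIXES = ("وال", "بال", "كال", "فال", "ال")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")
MIN_STEM = 2  # assumed minimum number of characters left after stripping


def normalize_orthography(token):
    """Collapse spelling variants that differ only in diacritics or letter form."""
    token = DIACRITICS.sub("", token)
    token = ALEF_FORMS.sub("\u0627", token)      # any hamzated alef -> bare alef
    return token.replace("\u0649", "\u064A")     # alef maqsura -> ya


def light_stem(token):
    """Strip at most one prefix and one suffix, keeping at least MIN_STEM characters."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= MIN_STEM:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM:
            token = token[:-len(suffix)]
            break
    return token


def normalize(text, lexicon=None):
    """Apply orthographic, lexical (lookup-table), and light-stemming steps per token."""
    lexicon = lexicon or {}  # caller-supplied map from variant spelling to canonical form
    tokens = []
    for token in text.split():
        token = normalize_orthography(token)
        token = lexicon.get(token, token)
        tokens.append(light_stem(token))
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize("والكتاب"))
```

In a setup like this, the same `normalize` call would be applied to both system outputs and reference translations before a matching-based metric is computed, so that spelling variants no longer count as mismatches.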

Author information

Authors and Affiliations

The MITRE Corporation, McLean, VA, USA
Sherri Condon & Mark Arehart
The MITRE Corporation, Orlando, FL, USA
Dan Parvaz
National Institute of Standards and Technology, Gaithersburg, MD, USA
Gregory Sanders
The MITRE Corporation, Bedford, MA, USA
Christy Doran & John Aberdeen

Corresponding author

Correspondence to Sherri Condon.

Additional information

Approved for Public Release: 11-0118. Distribution Unlimited. The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government. Some of the material in this article was originally presented at the Language Resources and Evaluation Conference (LREC) 2008 in Marrakesh, Morocco and at the 2009 MT Summit XII in Ottawa, Canada.

About this article

Cite this article

Condon, S., Arehart, M., Parvaz, D. et al. Evaluation of 2-way Iraqi Arabic–English speech translation systems using automated metrics. Machine Translation 26, 159–176 (2012). https://doi.org/10.1007/s10590-011-9105-x

Received: 12 July 2010
Accepted: 09 August 2011
Published: 22 September 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s10590-011-9105-x
