
Methods for integrating rule-based and statistical systems for Arabic to English machine translation

Machine Translation

Abstract

This article presents several techniques for integrating information from a rule-based machine translation (RBMT) system into a statistical machine translation (SMT) framework. The techniques are grouped into three parts according to the type of information integrated: the morphological, lexical, and system levels. The first part presents techniques that use information from a rule-based morphological tagger to perform morpheme splitting of the Arabic source text; we also compare these techniques with the results of using a statistical morphological tagger. The second part presents two ways of using Arabic diacritics to improve SMT results, both based on binary decision trees. The third part presents a system combination method that combines the outputs of the RBMT and SMT systems, leveraging the strengths of each. Overall, the article shows how language-specific information obtained through a deterministic, rule-based process can be used to improve SMT, which is mostly language-independent.
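To make "morpheme splitting" concrete, here is a minimal sketch of the generic preprocessing step, not the authors' implementation. It assumes a hypothetical tagger that returns, for each Arabic word, a list of proclitics, a stem, and a list of enclitics. The `Analysis` class, the `split_morphemes` function, and the convention of marking clitics with `+` are all illustrative assumptions. Words are shown in Buckwalter transliteration.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """One word's segmentation, in a HYPOTHETICAL tagger output format."""
    proclitics: list[str]
    stem: str
    enclitics: list[str]


def split_morphemes(analyses: list[Analysis]) -> list[str]:
    """Flatten per-word analyses into the token sequence fed to SMT.

    Assumed convention: proclitics get a trailing '+' and enclitics a
    leading '+', so the split can be undone deterministically later.
    """
    tokens: list[str] = []
    for a in analyses:
        tokens.extend(p + "+" for p in a.proclitics)  # e.g. "w+" (and)
        tokens.append(a.stem)                         # the stem is unmarked
        tokens.extend("+" + e for e in a.enclitics)   # e.g. "+hA" (it/her)
    return tokens


if __name__ == "__main__":
    # wsyktbhA, "and he will write it": w+ (and), s+ (future), yktb, +hA (it)
    word = Analysis(proclitics=["w", "s"], stem="yktb", enclitics=["hA"])
    print(split_morphemes([word]))
```

Splitting clitics in this way shrinks the source vocabulary and lets each clitic align to its own English word (e.g., "and", "will", "it"). Which clitics to split is a design choice, and different segmentation schemes make it differently.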



Author information


Correspondence to Rabih Zbib.


About this article

Cite this article

Zbib, R., Kayser, M., Matsoukas, S. et al. Methods for integrating rule-based and statistical systems for Arabic to English machine translation. Machine Translation 26, 67–83 (2012). https://doi.org/10.1007/s10590-011-9106-9
