
Efficient accurate syntactic direct translation models: one tree at a time

Machine Translation

Abstract

A challenging aspect of statistical machine translation from Arabic to English lies in bringing the morpho-syntax of the Arabic source to bear on both the lexical and the word-order choices of the English target string. In this article, we extend the feature-rich discriminative Direct Translation Model 2 (DTM2) with a novel linear-time parsing algorithm based on an eager, incremental interpretation of Combinatory Categorial Grammar (CCG). This lets us reap the benefits of target-side syntax, which leads to more grammatical output, while still enabling dynamic decoding without the risk of blowing up decoding space and time requirements. Our model combines two kinds of parameters: some involve the DTM2 source morpho-syntactic features, while others are novel target-side syntactic features. Alongside translation features extracted from the derived parse tree, we explore syntactic features extracted from the incremental derivation process itself. Our experiments show that our model significantly outperforms the state-of-the-art DTM2 system.
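The core idea of attaching each target word to a single partial CCG derivation as soon as it arrives can be illustrated with a toy parser. This is a minimal sketch, not the authors' algorithm: it assumes every word already carries one supertag, it uses only forward and backward application, forward composition, and type-raising followed by composition, and it simply fails where a real system would need a trained model to choose among operators. The names `Atom`, `Functor`, `combine`, and `parse_incrementally` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Atom:
    """An atomic CCG category such as S, NP or N."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functor:
    """A complex category: X/Y seeks Y to the right, X\\Y seeks Y to the left."""
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Cat = Union[Atom, Functor]


def combine(left: Cat, right: Cat) -> Optional[Tuple[Cat, str]]:
    """Try a fixed set of combinators; return (new category, rule label) or None."""
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result, ">"
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result, "<"
    # Forward composition: X/Y  Y/Z  =>  X/Z
    if (isinstance(left, Functor) and left.slash == "/"
            and isinstance(right, Functor) and right.slash == "/"
            and left.arg == right.result):
        return Functor(left.result, "/", right.arg), ">B"
    # Type-raise X to T/(T\X), then compose with (T\X)/Z  =>  T/Z
    if (isinstance(right, Functor) and right.slash == "/"
            and isinstance(right.result, Functor)
            and right.result.slash == "\\" and right.result.arg == left):
        return Functor(right.result.result, "/", right.arg), ">T>B"
    return None


def parse_incrementally(supertags: List[Cat]) -> Tuple[Cat, List[str]]:
    """Left-to-right pass keeping exactly one partial derivation (one state)."""
    state = supertags[0]
    trace = [f"shift {state}"]
    for cat in supertags[1:]:
        combined = combine(state, cat)
        if combined is None:
            # A real parser would need a recovery strategy here.
            raise ValueError(f"cannot attach {cat} to {state}")
        state, rule = combined
        trace.append(f"{rule}: {state}")
    return state, trace


if __name__ == "__main__":
    S, NP, N = Atom("S"), Atom("NP"), Atom("N")
    likes = Functor(Functor(S, "\\", NP), "/", NP)  # (S\NP)/NP
    the = Functor(NP, "/", N)                        # NP/N
    # "John likes the dog", one supertag per word
    cat, trace = parse_incrementally([NP, likes, the, N])
    print(cat)  # S
    for step in trace:
        print(step)
```

Each word triggers a constant number of combinator checks against a single state (assuming bounded category size), so the pass is linear in sentence length. The operator sequence recorded in `trace` (here `>T>B`, then `>B`, then `>`) is one plausible kind of derivation-level signal of the sort the abstract mentions, though the paper's actual feature set is not reproduced here.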



Author information


Corresponding author

Correspondence to Hany Hassan.

Additional information

This work was done while the first author was working at IBM.


About this article

Cite this article

Hassan, H., Sima’an, K. & Way, A. Efficient accurate syntactic direct translation models: one tree at a time. Machine Translation 26, 121–136 (2012). https://doi.org/10.1007/s10590-011-9116-7


  • DOI: https://doi.org/10.1007/s10590-011-9116-7
