Minimum description length inference of phrase-based translation models

  • IBPRIA 2015
Neural Computing and Applications

Abstract

This work explores the application of minimum description length (MDL) inference to estimate the parameters of phrase-based statistical machine translation (SMT) models. Current inference techniques rely on a long, decoupled pipeline with multiple heuristic steps. MDL, in contrast, is a well-founded, theoretically sound approach, but its empirical results fall below those of the heuristically motivated state-of-the-art training pipeline. We identify potential limitations of MDL inference when applied to natural language and propose practical approaches to overcome them when inferring SMT models. An evaluation on a Spanish-to-English translation task shows that MDL inference can be adapted to yield performance close to the state of the art.

Acknowledgments

This work was supported by the EU 7th Framework Programme (FP7/2007–2013) under the CasMaCat project (Grant Agreement No. 287576), and by the Generalitat Valenciana under Grant ALMAPATER (PrometeoII/2014/030).

Author information

Corresponding author

Correspondence to Jesús González-Rubio.

About this article

Cite this article

González-Rubio, J., Casacuberta, F. Minimum description length inference of phrase-based translation models. Neural Comput & Applic 28, 2403–2413 (2017). https://doi.org/10.1007/s00521-016-2257-0

  • DOI: https://doi.org/10.1007/s00521-016-2257-0
