Jane: an advanced freely available hierarchical machine translation toolkit

Machine Translation

Abstract

This article describes the design and implementation of Jane, an efficient hierarchical phrase-based (HPB) toolkit developed at RWTH Aachen University. RWTH has used the system in several international evaluation campaigns, including the WMT and NIST evaluations, and it is now freely available for non-commercial use. The main features of Jane include, among others, support for different search strategies and language model formats, syntax-based enhancements to the HPB machine translation paradigm, string-to-dependency translation, extended lexicon models, different methods for minimum-error-rate training, and distributed operation on a computer cluster. Special attention has been paid to the efficiency of the decoder, clean code, and quality assurance through unit and regression testing. Results reported on current machine translation tasks show that the system obtains state-of-the-art performance.

Author information

Corresponding author

Correspondence to David Vilar.

About this article

Cite this article

Vilar, D., Stein, D., Huck, M. et al. Jane: an advanced freely available hierarchical machine translation toolkit. Machine Translation 26, 197–216 (2012). https://doi.org/10.1007/s10590-011-9120-y

  • DOI: https://doi.org/10.1007/s10590-011-9120-y
