
A vector-space dynamic feature for phrase-based statistical machine translation

Journal of Intelligent Information Systems

Abstract

In this paper, we propose and evaluate a novel dynamic feature function for log-linear model combinations in phrase-based statistical machine translation. The feature function is inspired by the well-known vector-space model, which is typically used in information retrieval and text mining applications. It aims to improve translation unit selection at decoding time by incorporating context information from the source language. Significant improvements on an English-Spanish experimental corpus are presented and discussed.
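The abstract does not spell out how the feature is computed, so the following is only a minimal sketch of the general idea: score a candidate translation unit by how similar the input sentence is, in a vector space, to the training sentences in which that unit was observed. The tf-idf weighting, the cosine similarity, the choice of the maximum over occurrences, and all names (`tfidf_vectors`, `cosine`, `context_feature`, the toy sentences) are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build sparse tf-idf vectors (dicts) for a list of tokenized sentences."""
    n = len(sentences)
    # Document frequency: number of sentences containing each token.
    df = Counter(tok for sent in sentences for tok in set(sent))
    # Smoothed idf (an assumed weighting scheme, chosen for simplicity).
    idf = {tok: math.log((1 + n) / (1 + cnt)) + 1.0 for tok, cnt in df.items()}
    vectors = [
        {tok: tf * idf[tok] for tok, tf in Counter(sent).items()}
        for sent in sentences
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_feature(input_tokens, occurrence_ids, train_vectors, idf):
    """Hypothetical dynamic feature: the highest similarity between the input
    sentence and any training sentence where the translation unit occurred."""
    query = {
        tok: tf * idf[tok]
        for tok, tf in Counter(input_tokens).items()
        if tok in idf  # tokens unseen in training are ignored
    }
    return max(
        (cosine(query, train_vectors[i]) for i in occurrence_ids),
        default=0.0,
    )


# Toy source-side training corpus (invented for this example).
train = [
    "the bank approved the loan".split(),
    "we sat on the river bank".split(),
]
train_vectors, idf = tfidf_vectors(train)

# Suppose one translation unit for "bank" was extracted from sentence 0
# (financial sense) and another from sentence 1 (riverside sense).
input_sentence = "the bank rejected my loan".split()
score_financial = context_feature(input_sentence, [0], train_vectors, idf)
score_riverside = context_feature(input_sentence, [1], train_vectors, idf)
print(score_financial > score_riverside)
```

In this toy setting the unit whose training context resembles the input sentence receives the higher score. Because the score depends on the sentence being translated, it changes from input to input, which is one reading of what "dynamic" means here. In a log-linear system such a score would enter as one more weighted feature alongside the usual translation and language model scores.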


Notes

  1. But it still falls within the range of acceptability, given that other corpora of similar size are in use. See, for instance, the IWSLT International Evaluation Campaign (http://mastarpj.nict.go.jp/IWSLT2009/).


Acknowledgements

The authors would like to thank Barcelona Media Innovation Center and Institute for Infocomm Research for their support and permission to publish this research. We would also like to thank Bart Mellebeek for his helpful contribution, and the anonymous reviewers of this paper for their valuable suggestions.

This work has been partially funded by the Spanish Department of Education and Science through the Juan de la Cierva fellowship program and the Spanish Government under the BUCEADOR project (TEC2009-14094-C04-01).

Author information

Corresponding author

Correspondence to Marta R. Costa-jussà.

About this article

Cite this article

Costa-jussà, M.R., Banchs, R.E. A vector-space dynamic feature for phrase-based statistical machine translation. J Intell Inf Syst 37, 139–154 (2011). https://doi.org/10.1007/s10844-010-0130-7

  • DOI: https://doi.org/10.1007/s10844-010-0130-7
