A kernel regression framework for SMT

Machine Translation

Abstract

This paper presents a novel regression framework that models both the translational equivalence problem and the parameter estimation problem in statistical machine translation (SMT). The proposed method kernelizes the training process by formulating translation as a linear mapping between source and target word chunks (word n-grams of various lengths), which yields a regression problem with vector outputs. A kernel ridge regression model and a one-class classifier called maximum margin regression are compared, and the former is shown to perform better on this task. The experimental results demonstrate, at a conceptual level, the framework's advantage of handling very high-dimensional features implicitly and flexibly. However, it shares the common drawback of kernel methods, namely a lack of scalability. For real-world applications, a more practical solution is proposed, based on a locally linear approximation of the regression hyperplane that uses subsets of relevant training examples selected online. In addition, we introduce a novel way to integrate language models into this machine translation framework: the language model serves as a penalty term in the objective function of the regression model, since its n-gram representation exactly matches the definition of our feature space.
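The core idea of the abstract, a regression from source n-gram features to target n-gram features trained through a kernel, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`ngram_counts`, `ngram_kernel`, `fit_krr`, `predict_target_features`), the plain n-gram count kernel, the regularisation value `lam` and the toy sentence pairs are all assumptions made for illustration. The sketch fits standard kernel ridge regression with vector outputs and predicts a score for each target n-gram. It leaves out decoding (recovering a target sentence from the predicted feature vector), the language model penalty and the online selection of training subsets.

```python
from collections import Counter

import numpy as np


def ngram_counts(sentence, max_n=2):
    """Count all word n-grams of length 1..max_n in a sentence."""
    words = sentence.split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def ngram_kernel(a, b):
    """Inner product of two n-gram count vectors, computed implicitly."""
    return float(sum(c * b[g] for g, c in a.items() if g in b))


def fit_krr(sources, targets, lam=0.1, max_n=2):
    """Kernel ridge regression from source sentences to target n-gram vectors."""
    src = [ngram_counts(s, max_n) for s in sources]
    tgt = [ngram_counts(t, max_n) for t in targets]

    # Explicit target feature matrix Y (one row per training sentence).
    vocab = sorted({g for c in tgt for g in c})
    index = {g: j for j, g in enumerate(vocab)}
    Y = np.zeros((len(tgt), len(vocab)))
    for i, c in enumerate(tgt):
        for g, v in c.items():
            Y[i, index[g]] = v

    # Gram matrix on the source side; source features stay implicit.
    K = np.array([[ngram_kernel(a, b) for b in src] for a in src])

    # Dual coefficients: solve (K + lam * I) alpha = Y.
    alpha = np.linalg.solve(K + lam * np.eye(len(src)), Y)
    return src, vocab, alpha


def predict_target_features(model, sentence, max_n=2):
    """Predict a score for every target n-gram seen in training."""
    src, vocab, alpha = model
    q = ngram_counts(sentence, max_n)
    k = np.array([ngram_kernel(q, s) for s in src])
    return dict(zip(vocab, k @ alpha))


if __name__ == "__main__":
    # Hypothetical toy parallel corpus, for illustration only.
    model = fit_krr(
        ["we return", "we go", "they return"],
        ["nous revenons", "nous allons", "ils reviennent"],
    )
    scores = predict_target_features(model, "they go")
    for gram, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(" ".join(gram), round(float(score), 3))
```

Solving against the n-by-n Gram matrix is what makes very high-dimensional source features tractable, and it is also the source of the scalability limit mentioned above: training cost grows with the number of training sentences rather than with the number of features.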



Author information

Correspondence to Zhuoran Wang.

About this article

Cite this article

Wang, Z., Shawe-Taylor, J. A kernel regression framework for SMT. Machine Translation 24, 87–102 (2010). https://doi.org/10.1007/s10590-010-9079-0

  • DOI: https://doi.org/10.1007/s10590-010-9079-0
