A kernel regression framework for SMT

Wang, Zhuoran; Shawe-Taylor, John

doi:10.1007/s10590-010-9079-0


Machine Translation, Volume 24, pages 87–102 (2010)

Published: 12 June 2010


Abstract

This paper presents a novel regression framework that models both the translational equivalence problem and the parameter estimation problem in statistical machine translation (SMT). The proposed method kernelizes the training process by formulating translation as a linear mapping between source and target word chunks (word n-grams of various lengths), which yields a regression problem with vector outputs. A kernel ridge regression model and a one-class classifier called maximum margin regression are compared, and the former is shown to perform better on this task. The experimental results demonstrate, as a proof of concept, the method's advantage of handling very high-dimensional features implicitly and flexibly. However, it shares the common drawback of kernel methods, namely a lack of scalability. For real-world applications, we propose a more practical solution that approximates the regression hyperplane locally, using a subset of training examples selected online for their relevance to the input. In addition, we introduce a novel way to integrate language models into this machine translation framework: the language model is used as a penalty term in the objective function of the regression model, since its n-gram representation exactly matches the definition of our feature space.
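The kernel ridge regression idea in the abstract can be illustrated with a minimal sketch. It assumes a standard kernel ridge regression with vector outputs: learn $W$ minimizing $\|W\Phi(X) - \Phi(Y)\|_F^2 + \nu\|W\|_F^2$, which gives $W\phi(x) = \Phi(Y)(K_x + \nu I)^{-1}k_x(x)$, and rank candidate translations $y$ by $\|W\phi(x) - \phi(y)\|^2$ computed through kernels only. The feature map (blended n-gram counts for n = 1..`max_n`), the explicit candidate list in place of a real decoder, and the optional `subset` argument (keep the training pairs whose sources are most similar to the input, as a stand-in for relevant-example subsetting) are all assumptions. The language-model penalty is not reproduced. All function names are hypothetical.

```python
from collections import Counter


def ngram_features(sentence, max_n=3):
    """Hypothetical feature map: counts of all word n-grams with n = 1..max_n."""
    tokens = sentence.split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[tuple(tokens[i:i + n])] += 1
    return feats


def kernel(a, b, max_n=3):
    """Inner product of two n-gram count vectors (a blended spectrum kernel)."""
    fa, fb = ngram_features(a, max_n), ngram_features(b, max_n)
    return sum(count * fb[g] for g, count in fa.items())


def solve(A, b):
    """Gaussian elimination with partial pivoting for the system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def krr_scores(src, candidates, train_src, train_tgt, nu=0.1, max_n=3, subset=None):
    """Score candidates by ||W phi(src) - phi(y)||^2 (lower is better),
    dropping the y-independent term ||W phi(src)||^2."""
    pairs = list(zip(train_src, train_tgt))
    if subset is not None:
        # Assumed subsetting rule: keep the pairs whose source is closest to src.
        pairs.sort(key=lambda p: kernel(src, p[0], max_n), reverse=True)
        pairs = pairs[:subset]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    m = len(xs)
    # Regularized source Gram matrix K_x + nu * I.
    K = [[kernel(xs[i], xs[j], max_n) + (nu if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    kx = [kernel(x, src, max_n) for x in xs]
    alpha = solve(K, kx)  # (K_x + nu I)^{-1} k_x(src)
    scores = {}
    for y in candidates:
        cross = sum(a * kernel(yi, y, max_n) for a, yi in zip(alpha, ys))
        scores[y] = kernel(y, y, max_n) - 2.0 * cross
    return scores


if __name__ == "__main__":
    train_src = ["le chat", "le chien", "un chat"]
    train_tgt = ["the cat", "the dog", "a cat"]
    scores = krr_scores("le chat", ["the cat", "the dog", "a dog"],
                        train_src, train_tgt, nu=0.1, max_n=2)
    print(min(scores, key=scores.get))  # → the cat
```

Because every quantity is an inner product of n-gram count vectors, the feature space is never materialized. This is the implicit high-dimensional representation the abstract refers to. The cost is solving an m-by-m system over the training pairs, which is why restricting training to a small relevant subset helps scalability.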


Author information

Authors and Affiliations

Department of Computer Science, Centre for Computational Statistics and Machine Learning, University College London, Gower Street, London, WC1E 6BT, UK
Zhuoran Wang & John Shawe-Taylor

Corresponding author

Correspondence to Zhuoran Wang.

About this article

Cite this article

Wang, Z., Shawe-Taylor, J. A kernel regression framework for SMT. Machine Translation 24, 87–102 (2010). https://doi.org/10.1007/s10590-010-9079-0

Received: 26 October 2009
Accepted: 25 May 2010
Published: 12 June 2010
Issue Date: June 2010
DOI: https://doi.org/10.1007/s10590-010-9079-0
