Abstract
This paper presents a novel regression framework that models both the translational equivalence problem and the parameter estimation problem in statistical machine translation (SMT). The proposed method kernelizes the training process by formulating translation as a linear mapping between source and target word chunks (word n-grams of various lengths), which yields a regression problem with vector outputs. A kernel ridge regression model and a one-class classifier called maximum margin regression are compared, and the former is shown to perform better on this task. The experimental results demonstrate, at a conceptual level, the framework's advantage in handling very high-dimensional features implicitly and flexibly. However, it shares the common drawback of kernel methods, namely the lack of scalability. For real-world applications, we propose a more practical solution that approximates the regression hyperplane locally, fitting it online on a subset of training examples relevant to the current input. In addition, we introduce a novel way to integrate language models into this machine translation framework: the language model acts as a penalty term in the objective function of the regression model, which is possible because its n-gram representation exactly matches the definition of our feature space.
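A minimal sketch of the regression view, under several assumptions: each sentence is mapped to counts of its word n-grams of length 1 to 3 (the abstract says only "word n-grams of various length"), kernels are cosine-normalised, and ν = 0.1. Kernel ridge regression with vector outputs gives W = Φ_Y (K_X + νI)⁻¹ Φ_X^T, so Wφ(x) = Σ_i α_i φ(y_i) with α = (K_X + νI)⁻¹ k(x), and a candidate translation y can be scored by ⟨φ(y), Wφ(x)⟩ using only target-side kernel evaluations. Scoring a fixed candidate list stands in for the paper's decoder, the `select_relevant` rule (keep the k training pairs whose source side is most similar to the input) is a hypothetical stand-in for the online subsetting step, and the language-model penalty is not shown. All function names are illustrative.

```python
from collections import Counter
import math


def ngram_features(tokens, max_n=3):
    """Counts of all word n-grams of length 1..max_n (assumed feature map)."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[tuple(tokens[i:i + n])] += 1
    return feats


def ngram_kernel(a, b, max_n=3):
    """Cosine-normalised inner product of n-gram count vectors (hypothetical choice)."""
    fa, fb = ngram_features(a, max_n), ngram_features(b, max_n)
    dot = sum(v * fb[g] for g, v in fa.items())
    norm = math.sqrt(sum(v * v for v in fa.values()) * sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def select_relevant(train_pairs, source, k, max_n=3):
    """Hypothetical online subsetting: keep the k pairs whose source side is closest to the input."""
    return sorted(train_pairs, key=lambda p: ngram_kernel(p[0], source, max_n), reverse=True)[:k]


def krr_scores(train_pairs, source, candidates, nu=0.1, max_n=3):
    """Score each candidate y by <phi(y), W phi(x)>, with W fitted by kernel ridge regression."""
    xs = [x for x, _ in train_pairs]
    ys = [y for _, y in train_pairs]
    m = len(xs)
    # Regularised source Gram matrix K_X + nu * I (symmetric positive definite).
    K = [[ngram_kernel(xs[i], xs[j], max_n) + (nu if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    # alpha = (K_X + nu I)^{-1} k(x), so W phi(x) = sum_i alpha_i phi(y_i).
    alpha = solve(K, [ngram_kernel(xi, source, max_n) for xi in xs])
    # The score therefore needs only target-side kernel evaluations.
    return {" ".join(y): sum(a * ngram_kernel(yi, y, max_n) for a, yi in zip(alpha, ys))
            for y in candidates}


# Toy parallel data, invented purely for illustration.
train = [
    ("the house".split(), "la maison".split()),
    ("the cat".split(), "le chat".split()),
    ("a small house".split(), "une petite maison".split()),
]
source = "the small house".split()
candidates = ["la petite maison".split(), "le chat".split()]

global_scores = krr_scores(train, source, candidates)
local_scores = krr_scores(select_relevant(train, source, k=2), source, candidates)
print(global_scores)
print(local_scores)
```

The toy data only illustrate the mechanics: both with all three training pairs and with the two-pair subset chosen by `select_relevant`, the candidate that shares n-grams with the relevant target sides ("la petite maison") outscores "le chat". Fitting on a small input-specific subset replaces one dense solve over the whole corpus, whose cost grows cubically with the number of training pairs, with a small local solve, which is the scalability motivation the abstract gives.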
Cite this article
Wang, Z., Shawe-Taylor, J. A kernel regression framework for SMT. Machine Translation 24, 87–102 (2010). https://doi.org/10.1007/s10590-010-9079-0