Abstract
Although it has always been thought that Word Sense Disambiguation (WSD) can be useful for Machine Translation, only recently have efforts been made to integrate the two tasks and test whether this assumption holds, particularly for Statistical Machine Translation (SMT). While different approaches have been proposed and results have started to converge positively, it is not yet clear how these applications should be integrated so that the strengths of both can be exploited. This paper contributes to the recent investigation of the usefulness of WSD for SMT by using n-best reranking to integrate WSD with SMT efficiently. This allows the use of rich contextual WSD features, which current SMT systems otherwise do not exploit. Experiments on English-Portuguese translation with a syntactically motivated phrase-based SMT system and both symbolic and probabilistic WSD models showed significant improvements in BLEU scores.
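As a rough illustration of the general idea of n-best reranking with a WSD feature, the following minimal Python sketch rescores a list of candidate translations by combining the base SMT score with the log-probability a WSD model assigns to the translation chosen for each ambiguous source word. It is not the authors' implementation: the `Hypothesis` class, the `wsd_feature` and `rerank` functions, the weights, the scores, and the example sentences are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str           # candidate translation from the n-best list
    smt_score: float    # log-score assigned by the base SMT system (assumed)
    word_choices: dict  # ambiguous source word -> translation used here


def wsd_feature(hyp, wsd_probs, floor=1e-6):
    """Sum of log WSD probabilities of the translations this candidate chose."""
    total = 0.0
    for src_word, chosen in hyp.word_choices.items():
        probs = wsd_probs.get(src_word)
        if probs is None:
            continue  # source word not covered by the WSD model
        # Unseen translations get a small floor probability (an assumption).
        total += math.log(probs.get(chosen, floor))
    return total


def rerank(nbest, wsd_probs, w_smt=1.0, w_wsd=1.0):
    """Sort candidates by a weighted sum of the SMT score and the WSD feature."""
    def combined(h):
        return w_smt * h.smt_score + w_wsd * wsd_feature(h, wsd_probs)
    return sorted(nbest, key=combined, reverse=True)


# Hypothetical 2-best list for the English source "She asked for help."
nbest = [
    Hypothesis("Ela perguntou por ajuda.", -4.0, {"ask": "perguntar"}),
    Hypothesis("Ela pediu ajuda.", -4.3, {"ask": "pedir"}),
]
# Hypothetical WSD output for "ask" in this context.
wsd_probs = {"ask": {"pedir": 0.9, "perguntar": 0.1}}

best = rerank(nbest, wsd_probs)[0]
print(best.text)
```

With these made-up numbers, the WSD feature promotes the second candidate over the one the base system preferred. In a real system the feature weights would be tuned on held-out data rather than fixed by hand.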
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Specia, L., Sankaran, B., das Graças Volpe Nunes, M. (2008). n-Best Reranking for the Efficient Integration of Word Sense Disambiguation and Statistical Machine Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_34
DOI: https://doi.org/10.1007/978-3-540-78135-6_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)