n-Best Reranking for the Efficient Integration of Word Sense Disambiguation and Statistical Machine Translation

Specia, Lucia; Sankaran, Baskaran; das Graças Volpe Nunes, Maria

doi:10.1007/978-3-540-78135-6_34

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Although Word Sense Disambiguation (WSD) has long been thought useful for Machine Translation, only recently have efforts been made to integrate the two tasks and test this assumption, particularly for Statistical Machine Translation (SMT). While different approaches have been proposed and results have started to converge positively, it is not yet clear how these applications should be integrated so that the strengths of both can be exploited. This paper contributes to the recent investigation of the usefulness of WSD for SMT by using n-best reranking to integrate WSD with SMT efficiently. This allows the use of rich contextual WSD features, which current SMT systems do not otherwise exploit. Experiments with English-Portuguese translation, using a syntactically motivated phrase-based SMT system and both symbolic and probabilistic WSD models, showed significant improvements in BLEU scores.
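The general idea of n-best reranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` class, the `rerank` function, the single `wsd_weight` parameter, and all scores and sentences below are hypothetical and chosen only for illustration. The sketch assumes each hypothesis in the decoder's n-best list carries a baseline model score and one additional WSD-derived score, and that the two are combined linearly.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str          # candidate translation from the n-best list
    base_score: float  # overall score from the SMT decoder (toy value)
    wsd_score: float   # log-probability from a WSD model (toy value)


def rerank(nbest, wsd_weight):
    """Sort hypotheses by base score plus weighted WSD score, best first."""
    return sorted(
        nbest,
        key=lambda h: h.base_score + wsd_weight * h.wsd_score,
        reverse=True,
    )


# Toy n-best list for "he went to the bank" (invented scores).
nbest = [
    Hypothesis("ele foi ao banco", base_score=-4.0, wsd_score=-0.2),
    Hypothesis("ele foi à margem", base_score=-3.8, wsd_score=-2.5),
]

# With the WSD feature switched off, the decoder's own ranking is kept.
baseline_best = rerank(nbest, wsd_weight=0.0)[0]

# With the WSD feature active, the hypothesis preferred by the WSD model wins.
reranked_best = rerank(nbest, wsd_weight=1.0)[0]
```

Because the WSD score is computed only for the hypotheses in the n-best list, the decoder itself does not need to be modified, which is one plausible reading of why the abstract calls the integration efficient. In a real system the feature weight would normally be tuned on held-out data rather than set by hand.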

Author information

Authors and Affiliations

NILC/ICMC, Universidade de São Paulo, Trabalhador São-Carlense,400, São Carlos, 13560-970, Brazil
Lucia Specia & Maria das Graças Volpe Nunes
Microsoft Research India, “Scientia”, 196/36, 2nd Main, Sadashivanagar, Bangalore, 560080, India
Lucia Specia & Baskaran Sankaran

Editor information

Alexander Gelbukh

About this paper

Cite this paper

Specia, L., Sankaran, B., das Graças Volpe Nunes, M. (2008). n-Best Reranking for the Efficient Integration of Word Sense Disambiguation and Statistical Machine Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_34

DOI: https://doi.org/10.1007/978-3-540-78135-6_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)
