A Probabilistic Model Based on n-Grams for Bilingual Word Sense Disambiguation

Vilariño, Darnes; Pinto, David; Tovar, Mireya; Balderas, Carlos; Beltrán, Beatriz

doi:10.1007/978-3-642-16761-4_8


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6437)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence


Abstract

Word Sense Disambiguation (WSD) is considered one of the most important problems in Natural Language Processing. The problem is already difficult in its monolingual form, and its bilingual version is considerably more complex. In the bilingual case it is necessary not only to find a correct translation: that translation must also reflect the contextual sense of the source word in the original sentence (in the source language), so that the selected target-language word corresponds to the correct sense. In this paper we propose a model based on n-grams (3-grams and 5-grams) that significantly outperforms the results we previously presented at the cross-lingual word sense disambiguation task of the SemEval-2 forum. We use a naïve Bayes classifier to determine the probability of a target sense (in the target language) given a sentence that contains the ambiguous word (in the source language). For this purpose, we use a bilingual statistical dictionary, computed with Giza++ on the EUROPARL parallel corpus, to estimate the probability that a source word is translated into a given target word (which is assumed to be the correct sense of the source word, expressed in a different language). The results were compared with those obtained in that international competition, and the proposed model achieves good performance.
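The following is a minimal sketch of the kind of classifier the abstract describes: a naïve Bayes ranker over candidate target-language translations of the ambiguous word, using only the source words inside an n-token window (n = 3 or 5) centred on it. It is not the authors' implementation. The prior `prior` stands in for translation probabilities that would come from a Giza++ lexical table trained on EUROPARL, while the context counts `cooc`, the vocabulary size, the add-alpha smoothing, and the English-Spanish toy data for "bank" are all hypothetical assumptions made for illustration.

```python
import math
from collections import Counter


def context_words(tokens, index, n):
    """Source words in the n-token window centred on tokens[index] (n odd),
    clipped at sentence boundaries, excluding the ambiguous word itself."""
    half = n // 2
    start = max(0, index - half)
    end = min(len(tokens), index + half + 1)
    return tokens[start:index] + tokens[index + 1:end]


def rank_translations(tokens, index, n, prior, cooc, vocab_size, alpha=1.0):
    """Rank candidate translations t of tokens[index] by the naive Bayes score
    log P(t) + sum over window words c of log P(c | t).

    prior: t -> P(t | ambiguous source word); assumed to come from a lexical
           translation table (hypothetical values here).
    cooc:  t -> Counter of source context words observed with translation t
           (hypothetical training counts).
    Returns (translation, log score) pairs, best first.
    """
    context = context_words(tokens, index, n)
    scores = []
    for t, p_t in prior.items():
        if p_t <= 0:
            continue  # a zero prior can never win, and log(0) is undefined
        counts = cooc.get(t, Counter())
        total = sum(counts.values())
        score = math.log(p_t)
        for c in context:
            # Add-alpha smoothing keeps an unseen context word from zeroing the product.
            score += math.log((counts[c] + alpha) / (total + alpha * vocab_size))
        scores.append((t, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: English "bank" with two Spanish translations.
TOY_PRIOR = {"banco": 0.8, "orilla": 0.2}
TOY_COOC = {
    "banco": Counter({"money": 5, "the": 4, "account": 3}),
    "orilla": Counter({"river": 4, "the": 3, "water": 2}),
}
TOY_VOCAB_SIZE = 10

if __name__ == "__main__":
    river = "we walked along the river bank at dawn".split()
    money = "she opened an account at the bank".split()
    print(rank_translations(river, 5, 3, TOY_PRIOR, TOY_COOC, TOY_VOCAB_SIZE)[0][0])  # orilla
    print(rank_translations(river, 5, 5, TOY_PRIOR, TOY_COOC, TOY_VOCAB_SIZE)[0][0])  # orilla
    print(rank_translations(money, 6, 5, TOY_PRIOR, TOY_COOC, TOY_VOCAB_SIZE)[0][0])  # banco
```

Scoring in log space avoids numeric underflow as the window grows. The window size is the main design choice: in the second toy sentence even the 5-gram window leaves out "account", which is three tokens away from "bank", so a wider window trades focus for coverage.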

This work has been partially supported by the CONACYT project #106625, as well as by the PROMEP/103.5/09/4213 grant.



Author information

Authors and Affiliations

Faculty of Computer Science, Benemérita Universidad Autónoma de Puebla, Mexico
Darnes Vilariño, David Pinto, Mireya Tovar, Carlos Balderas & Beatriz Beltrán


Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz, s/n, Zacatenco, 07738, Mexico City, México
Grigori Sidorov
Área de Computación, Centro de Investigación en Matemáticas (CIMAT), Callejón de Jalisco s/n, Mineral de Valenciana, 36240, Guanajuato, México
Arturo Hernández Aguirre
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Ciencias Computacionales, Luis Enrique Erro No. 1, 72840, Santa María Tonantzintla, Puebla, México
Carlos Alberto Reyes García


About this paper

Cite this paper

Vilariño, D., Pinto, D., Tovar, M., Balderas, C., Beltrán, B. (2010). A Probabilistic Model Based on n-Grams for Bilingual Word Sense Disambiguation. In: Sidorov, G., Hernández Aguirre, A., Reyes García, C.A. (eds) Advances in Artificial Intelligence. MICAI 2010. Lecture Notes in Computer Science, vol 6437. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16761-4_8


DOI: https://doi.org/10.1007/978-3-642-16761-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16760-7
Online ISBN: 978-3-642-16761-4
eBook Packages: Computer Science (R0)
