Cross-Lingual Word Sense Clustering for Sense Disambiguation

  • Conference paper
Progress in Artificial Intelligence (EPIA 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9273)

Abstract

Translation is one of the areas where word sense ambiguity must be resolved in order to find adequate translations for ambiguous words in the contexts where they occur. In this paper, a Word Sense Disambiguation (WSD) approach using Word Sense Clustering within a cross-lingual strategy is proposed. Available sentence-aligned parallel corpora are used as a reliable knowledge source. English is taken as the source language, and Portuguese, French, or Spanish as the target languages. Clusters are built based on the correlation between senses, which is measured by a language-independent algorithm that uses as features the words near the ambiguous word and its translation in the parallel sentences, together with their relative positions. Clustering quality reached 81% (V-measure) and 92% (F-measure) on average for the three language pairs. The learned clusters are then used to train a support vector machine, whose classification results are used for sense disambiguation. Classification tests showed an average F-measure of 81% across the three languages.
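
The abstract does not give the exact feature encoding, so the following is only a minimal sketch of one way such position-aware context features could be built. It assumes a symmetric window of two tokens, a `word@offset` string encoding, and hand-supplied positions for the ambiguous word and its translation; the function names `positional_features` and `pair_features`, the `src:`/`tgt:` prefixes, and the example sentence pair are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def positional_features(tokens, target_index, window=2, prefix=""):
    """Count 'word@offset' features for the neighbours of tokens[target_index].

    The window size and the string encoding are illustrative assumptions.
    """
    features = Counter()
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = target_index + offset
        if 0 <= i < len(tokens):
            # The signed offset keeps the relative position of each neighbour.
            features[f"{prefix}{tokens[i].lower()}@{offset:+d}"] += 1
    return features


def pair_features(src_tokens, src_index, tgt_tokens, tgt_index, window=2):
    """Combine context features from both sides of an aligned sentence pair."""
    feats = positional_features(src_tokens, src_index, window, prefix="src:")
    feats.update(positional_features(tgt_tokens, tgt_index, window, prefix="tgt:"))
    return feats


# Hypothetical English-Portuguese pair; the positions of "bank" and "margem"
# are supplied by hand here rather than produced by a word aligner.
english = "He sat on the bank of the river".split()
portuguese = "Ele sentou-se na margem do rio".split()
features = pair_features(english, 4, portuguese, 3)

for name in sorted(features):
    print(name, features[name])
```

Feature counters of this kind could then be turned into vectors and compared across occurrences, so that occurrences with similar bilingual contexts fall into the same sense cluster; how the paper actually measures that correlation is not stated in the abstract.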

Author information

Corresponding author

Correspondence to Joaquim Ferreira da Silva.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Casteleiro, J., da Silva, J.F., Lopes, G.P. (2015). Cross-Lingual Word Sense Clustering for Sense Disambiguation. In: Pereira, F., Machado, P., Costa, E., Cardoso, A. (eds) Progress in Artificial Intelligence. EPIA 2015. Lecture Notes in Computer Science, vol 9273. Springer, Cham. https://doi.org/10.1007/978-3-319-23485-4_75

  • DOI: https://doi.org/10.1007/978-3-319-23485-4_75

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23484-7

  • Online ISBN: 978-3-319-23485-4

  • eBook Packages: Computer Science, Computer Science (R0)
