LAR-WordNet: A Machine-Translated, Pan-Hispanic and Regional WordNet for Spanish

  • Conference paper
Advances in Artificial Intelligence - IBERAMIA 2018 (IBERAMIA 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11238)

Abstract

WordNet is one of the most widely used resources in Natural Language Processing (NLP). However, the only WordNet available for Spanish mainly represents the Spanish spoken in Spain, and it is only about half the size of Princeton's WordNet for English. To address these issues, we automatically translate the Princeton WordNet using lemmas and sentences from all the available corpora annotated with WordNet senses, producing LAS-WordNet. In addition, we enrich the translated version with lexicons of Pan-Hispanic regionalisms extracted from Twitter, producing LAR-WordNet. The proposed resources were evaluated on Semantic Textual Similarity (STS), both monolingually in Spanish and cross-lingually between Spanish and English. The results showed that LAS-WordNet significantly outperformed the current Spanish WordNet and that the regionalisms added to LAR-WordNet do not hinder its performance. Although the proposed resources are noisier than the current Spanish WordNet, their size and representativeness make them suitable for many NLP applications.
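The abstract does not say how the STS systems use a wordnet, so the following is only a minimal sketch of one plausible setup, not the authors' method. It assumes a Monge-Elkan aggregation of word similarities (the reference list cites Monge and Elkan and a generalized Monge-Elkan method, but this preview does not confirm how they are used). The wordnet is modeled as a hypothetical dictionary from lemmas to synset IDs. Because a wordnet translated from Princeton's keeps Princeton's synsets, Spanish and English lemmas that share a synset count as matches, which is what allows cross-lingual comparison. All entries, including the regional variant "carro" alongside "coche", are illustrative. System scores are compared with gold scores using Pearson correlation, the measure used in the SemEval STS tasks.

```python
from math import sqrt

# Hypothetical toy wordnet: lemma -> IDs of the synsets it belongs to.
# The IDs imitate Princeton-style synset names; a wordnet translated from
# Princeton's shares these IDs across languages, so Spanish and English
# lemmas can be matched directly.
TOY_WORDNET: dict[str, set[str]] = {
    "car": {"car.n.01"},
    "coche": {"car.n.01"},
    "carro": {"car.n.01"},  # illustrative regional variant (common in Latin America)
    "red": {"red.s.01"},
    "rojo": {"red.s.01"},
    "dog": {"dog.n.01"},
    "perro": {"dog.n.01"},
}


def word_sim(a: str, b: str, wordnet: dict[str, set[str]]) -> float:
    """1.0 if the words are identical or share a synset, else 0.0."""
    if a == b:
        return 1.0
    return 1.0 if wordnet.get(a, set()) & wordnet.get(b, set()) else 0.0


def monge_elkan(s1: list[str], s2: list[str], wordnet: dict[str, set[str]]) -> float:
    """Mean over words in s1 of their best match in s2 (asymmetric)."""
    if not s1 or not s2:
        return 0.0
    return sum(max(word_sim(a, b, wordnet) for b in s2) for a in s1) / len(s1)


def sentence_sim(s1: str, s2: str, wordnet: dict[str, set[str]]) -> float:
    """Symmetric score: average of Monge-Elkan in both directions."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    return (monge_elkan(t1, t2, wordnet) + monge_elkan(t2, t1, wordnet)) / 2


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Cross-lingual English-Spanish pairs with hypothetical gold scores (0-5 scale).
pairs = [("red car", "carro rojo"), ("red dog", "coche rojo"), ("dog", "car")]
gold = [5.0, 2.0, 0.0]
scores = [sentence_sim(a, b, TOY_WORDNET) for a, b in pairs]
print(scores)                           # [1.0, 0.5, 0.0]
print(round(pearson(scores, gold), 3))  # 0.993
```

Removing "carro" from the toy dictionary drops the first pair's score from 1.0 to 0.5, a small illustration of why coverage of regional vocabulary matters for a Pan-Hispanic resource.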

Supported by the Asociación de Amigos del Instituto Caro y Cuervo.

Notes

  1. https://github.com/sgjimenezv/wordnet_3_0_glosstag.

  2. Available at https://www.datos.gov.co/browse?q=regionalismos%20ejemplos.

  3. https://www.google.com/drive/.

  4. LAS-WordNet and LAR-WordNet are available at https://www.datos.gov.co/browse?q=wordnet.

  5. https://en.wikipedia.org/wiki/SemEval.

  6. STS Benchmark: http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark.

  7. http://adimen.si.ehu.es/web/mcr/.

References

  1. Agirre, E., et al.: Semeval-2015 task 2: Semantic textual similarity, English, Spanish and pilot on interpretability. In: Proceedings of SemEval 2015, pp. 252–263. ACL (2015)

  2. Agirre, E., et al.: Semeval-2014 task 10: Multilingual semantic textual similarity. In: Proceedings of SemEval 2014, pp. 81–91. ACL and Dublin City University (2014)

  3. Agirre, E., et al.: Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of SemEval-2016, pp. 497–511. ACL (2016)

  4. Bird, S., Loper, E.: NLTK: the natural language toolkit. In: Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, p. 31. ACL (2004)

  5. Bond, F., et al.: Open multilingual wordnet. Web page of the resource and project (2013). http://compling.hss.ntu.edu.sg/omw/

  6. Calvo, H.: Simple TF·IDF is not the best you can get for regionalism classification. In: Gelbukh, A. (ed.) CICLing 2014. LNCS, vol. 8403, pp. 92–101. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54906-9_8

  7. Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., Specia, L.: Semeval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In: Proceedings of SemEval-2017, pp. 1–14. ACL (2017)

  8. Edmonds, P., Cotton, S.: Senseval-2: overview. In: The Proceedings of the Second International Workshop on Evaluating Word Sense Disambiguation Systems, pp. 1–5. ACL (2001)

  9. Fellbaum, C.: WordNet. Wiley, Hoboken (1998)

  10. Fernández-Montraveta, A., Vázquez, G., Fellbaum, C.: The Spanish version of WordNet 3.0. Mouton de Gruyter, Berlin

  11. Gonzalez-Agirre, A., Laparra, E., Rigau, G.: Multilingual central repository version 3.0. In: LREC, pp. 2525–2529 (2012)

  12. Jimenez, S., Becerra, C., Gelbukh, A., Gonzalez, F.: Generalized Mongue-Elkan method for approximate text string comparison. In: Gelbukh, A. (ed.) CICLing 2009. LNCS, vol. 5449, pp. 559–570. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00382-0_45

  13. Jimenez, S., Dueñas, G., Gelbukh, A., Rodriguez-Diaz, C.A., Mancera, S.: Automatic detection of regional words from twitter for the Pan-Hispanic Spanish (2018, to appear)

  14. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)

  15. Miller, G.A.: Wordnet: a lexical database for english. Commun. ACM 38(11), 39–41 (1995)

  16. Miller, G.A., Charles, W.G.: Contextual correlates of semantic similarity. Lang. Cogn. Processes 6(1), 1–28 (1991)

  17. Miller, G.A., Chodorow, M., Landes, S., Leacock, C., Thomas, R.G.: Using a semantic concordance for sense identification. In: Proceedings of the workshop on Human Language Technology, pp. 240–243. ACL (1994)

  18. Monge, A.E., Elkan, C., et al.: The field matching problem: algorithms and applications. In: KDD, pp. 267–270 (1996)

  19. Moro, A., Navigli, R.: Semeval-2015 task 13: multilingual all-words sense disambiguation and entity linking. In: Proceedings of SemEval 2015, pp. 288–297. ACL (2015)

  20. Navigli, R., Jurgens, D., Vannella, D.: Semeval-2013 task 12: multilingual word sense disambiguation. In: Proceedings of SemEval 2013, vol. 2, pp. 222–231. ACL (2013)

  21. Navigli, R., Ponzetto, S.P.: Babelnet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif. Intell. 193, 217–250 (2012)

  22. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment models. Comput. Linguist. 29(1), 19–51 (2003)

  23. Oliver, A., Climent, S.: Parallel corpora for wordnet construction: machine translation vs. automatic sense tagging. In: Gelbukh, A. (ed.) CICLing 2012. LNCS, vol. 7182, pp. 110–121. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28601-8_10

  24. Pedersen, T., Pakhomov, S.V., Patwardhan, S., Chute, C.G.: Measures of semantic similarity and relatedness in the biomedical domain. J. Biomed. Inform. 40(3), 288–299 (2007)

  25. Pianta, E., Bentivogli, L., Girardi, C.: Multiwordnet: developing an aligned multilingual database. 1st GWC. In: Proceedings of the First International Conference on Global WordNet, Mysore, India, pp. 293–302 (2002)

  26. Pradhan, S.S., Loper, E., Dligach, D., Palmer, M.: Semeval-2007 task 17: English lexical sample, SRL and all words. In: Proceedings of the 4th International Workshop on Semantic Evaluations, pp. 87–92. ACL (2007)

  27. Raganato, A., Camacho-Collados, J., Navigli, R.: Word sense disambiguation: a unified evaluation framework and empirical comparison. In: Proceedings of the 15th Conference of the European Chapter of the ACL, vol. 1, pp. 99–110 (2017)

  28. Snyder, B., Palmer, M.: The English all-words task. In: Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (2004)

  29. Taghipour, K., Ng, H.T.: One million sense-tagged instances for word sense disambiguation and induction. In: Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pp. 338–344 (2015)

  30. Vossen, P.: Eurowordnet: a multilingual database of autonomous and language-specific wordnets connected via an inter-lingual index. Int. J. Lexicography 17(2), 161–173 (2004)

  31. Wu, Y., et al.: Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)

Author information

Correspondence to Sergio Jimenez.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Jimenez, S., Dueñas, G. (2018). LAR-WordNet: A Machine-Translated, Pan-Hispanic and Regional WordNet for Spanish. In: Simari, G., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2018. IBERAMIA 2018. Lecture Notes in Computer Science, vol 11238. Springer, Cham. https://doi.org/10.1007/978-3-030-03928-8_32

  • DOI: https://doi.org/10.1007/978-3-030-03928-8_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03927-1

  • Online ISBN: 978-3-030-03928-8

  • eBook Packages: Computer Science, Computer Science (R0)
