
Unsupervised Approaches for Computing Word Similarity in Portuguese

  • Conference paper
Progress in Artificial Intelligence (EPIA 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10423)


Abstract

This paper presents several approaches for computing word similarity in Portuguese. It is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) that have been available for this language for longer. Both kinds of resource were exploited to answer word similarity tests, which have also only recently become available for Portuguese. We conclude that there are several valid approaches for this task, but none outperforms all the others on every test. For instance, distributional models seem to capture relatedness better, whereas LKBs are better suited for computing genuine similarity.
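The abstract contrasts two families of resources. The sketch below shows one common way each can yield a word similarity score: cosine similarity between distributional word vectors, and Jaccard overlap between the sets of words an LKB relates to each word. These are assumed, widely used choices for illustration, not necessarily the exact measures used in the paper, and the vectors and neighbourhoods are hypothetical toy values rather than data from any real model or LKB.

```python
from __future__ import annotations

import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two dense word vectors (distributional model)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector carries no direction, so treat it as unrelated
    return dot / (norm_u * norm_v)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two words' neighbourhoods (directly related words) in an LKB."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical toy data for illustration only.
vectors = {
    "carro": [0.9, 0.1, 0.0],
    "automóvel": [0.85, 0.15, 0.05],
    "estrada": [0.5, 0.8, 0.1],
}
lkb_neighbours = {
    "carro": {"automóvel", "viatura", "veículo"},
    "automóvel": {"carro", "viatura", "veículo"},
    "estrada": {"via", "caminho", "rodovia"},
}

for w1, w2 in [("carro", "automóvel"), ("carro", "estrada")]:
    print(w1, w2,
          round(cosine(vectors[w1], vectors[w2]), 3),
          round(jaccard(lkb_neighbours[w1], lkb_neighbours[w2]), 3))
```

In this toy setting both measures rank the synonym pair above the merely related pair; with real resources, the two families can disagree, which is the kind of difference the paper's conclusion points to.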


Notes

  1. https://www.aclweb.org/aclwiki/index.php?title=Similarity_(State_of_the_art).
  2. http://www.inf.pucrs.br/linatural/wikimodels/similarity.html.
  3. http://metashare.metanet4u.eu/.
  4. RG-65 also targets similarity but, as far as we know, the process of differentiating similarity and relatedness was much more thorough in SimLex-999.
  5. https://github.com/nlx-group/lx-dsemvectors/.
  6. https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md.
  7. http://paginas.fe.up.pt/~arocha/AED1/0607/trabalhos/thesaurus.txt.
  8. http://pt.wiktionary.org (2015 dump).
  9. http://ontopt.dei.uc.pt/.
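Word similarity tests such as RG-65 and SimLex-999 (note 4) consist of word pairs with human-assigned scores, and a system is commonly evaluated by the Spearman rank correlation between its scores and the gold ones. The dependency-free sketch below assumes that setup. The test pairs, gold scores and system scores are hypothetical, and pairs the system cannot score (for example, because a word is missing from the model or LKB) are simply skipped, which is only one of several possible coverage policies.

```python
from __future__ import annotations

import math
from typing import Callable, Optional


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for constant scores
    return cov / (sd_x * sd_y)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(
    test_set: list[tuple[str, str, float]],
    similarity: Callable[[str, str], Optional[float]],
) -> tuple[float, int]:
    """Return (Spearman rho, number of pairs the system could score)."""
    gold, predicted = [], []
    for w1, w2, gold_score in test_set:
        score = similarity(w1, w2)
        if score is None:  # e.g. out-of-vocabulary word: skip the pair
            continue
        gold.append(gold_score)
        predicted.append(score)
    return spearman(gold, predicted), len(gold)


# Hypothetical test pairs (word1, word2, human score) and system scores.
test_set = [
    ("carro", "automóvel", 3.9),
    ("carro", "estrada", 2.0),
    ("gato", "cão", 2.6),
    ("copo", "nuvem", 0.3),
]
toy_scores = {
    ("carro", "automóvel"): 0.99,
    ("carro", "estrada"): 0.62,
    ("gato", "cão"): 0.80,
}


def toy_similarity(w1: str, w2: str) -> Optional[float]:
    return toy_scores.get((w1, w2))


rho, covered = evaluate(test_set, toy_similarity)
print(f"Spearman rho = {rho:.3f} over {covered} of {len(test_set)} pairs")
```

Reporting coverage alongside the correlation matters here: a resource that skips hard pairs can look better than one that scores every pair.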


Author information


Correspondence to Hugo Gonçalo Oliveira.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Gonçalo Oliveira, H. (2017). Unsupervised Approaches for Computing Word Similarity in Portuguese. In: Oliveira, E., Gama, J., Vale, Z., Lopes Cardoso, H. (eds) Progress in Artificial Intelligence. EPIA 2017. Lecture Notes in Computer Science, vol 10423. Springer, Cham. https://doi.org/10.1007/978-3-319-65340-2_67


  • DOI: https://doi.org/10.1007/978-3-319-65340-2_67


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65339-6

  • Online ISBN: 978-3-319-65340-2

  • eBook Packages: Computer Science, Computer Science (R0)
