The Portuguese B$^2$SG: A Semantic Test for Distributional Thesaurus

Wilkens, Rodrigo; Zilio, Leonardo; Ferreira, Eduardo; Villavicencio, Aline

doi:10.1007/978-3-319-41552-9_33

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

The lack of gold standards for evaluating distributional thesauri is a stumbling block that prevents a direct, uniform comparison of alternative approaches. Here we present B$^2$SG, a TOEFL-like task for Portuguese that contains 2,875 tests with semantic relations (synonyms, antonyms and hypernyms) for nouns and verbs. The resource is validated by comparing it with lexical resources and by human judgment. The resource was used to evaluate two distributional thesauri: one built from lemmata and the other from surface forms. The evaluation showed that using lemmata is slightly more accurate than using surface forms for building distributional thesauri. B$^2$SG is readily available for download (http://www.inf.ufrgs.br/pln/resource/B2SG.zip).
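
To make the TOEFL-like setup concrete, the following is a minimal sketch of how a distributional thesaurus could be scored on such a test. It is an illustration under stated assumptions, not the authors' evaluation code: the item layout (target, related word, distractors), the thesaurus layout (word mapped to scored neighbours), the number of distractors, the toy words and scores, and the two scoring variants are all hypothetical and are not taken from the B$^2$SG distribution.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical item layout: (target, related word, distractors).
TestItem = Tuple[str, str, List[str]]
# Hypothetical thesaurus layout: word -> {neighbour: similarity score}.
Thesaurus = Dict[str, Dict[str, float]]


def answer_item(thesaurus: Thesaurus, item: TestItem) -> Optional[str]:
    """Pick the candidate most similar to the target.

    Returns None when the thesaurus lists none of the candidates
    as a neighbour of the target (the item is not covered).
    """
    target, related, distractors = item
    neighbours = thesaurus.get(target, {})
    candidates = [related] + distractors
    scored = [(neighbours[c], c) for c in candidates if c in neighbours]
    if not scored:
        return None
    # Highest score wins; exact ties fall back to string order of the word.
    return max(scored)[1]


def accuracy(thesaurus: Thesaurus, items: List[TestItem]) -> Tuple[float, float]:
    """Return (accuracy over all items, accuracy over answered items)."""
    answers = [answer_item(thesaurus, item) for item in items]
    correct = sum(a == item[1] for a, item in zip(answers, items))
    answered = sum(a is not None for a in answers)
    over_all = correct / len(items) if items else 0.0
    over_answered = correct / answered if answered else 0.0
    return over_all, over_answered


# Toy data, invented for illustration only.
toy_thesaurus: Thesaurus = {
    "carro": {"automóvel": 0.8, "casa": 0.1, "banana": 0.05},
    "grande": {"pequeno": 0.6, "mesa": 0.2},
}
toy_items: List[TestItem] = [
    ("carro", "automóvel", ["casa", "banana", "rio"]),
    ("grande", "pequeno", ["mesa", "livro", "sol"]),
    ("correr", "andar", ["mesa", "céu", "pão"]),  # target missing from thesaurus
]

acc_all, acc_answered = accuracy(toy_thesaurus, toy_items)
print(f"accuracy over all items: {acc_all:.2f}")
print(f"accuracy over answered items: {acc_answered:.2f}")
```

Reporting both figures separates errors caused by wrong rankings from errors caused by missing coverage; when coverage is high, the two numbers are close.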

Notes

1. A preliminary version of B$^2$SG, not yet validated, was presented in [16].
2. Available at http://www.linguateca.pt/ACDC.
3. This list of 10,000 words includes both target and related words.
4. The parsed corpus does not include the Corpus Brasileiro [1]: it is not parsed with PALAVRAS, so its lemma information differs from that of the other corpora.
5. The results for the two criteria were very similar because both thesauri had good coverage of the test items.
6. As Pennington, Socher and Manning [14] point out, larger corpora lead to better statistics, so we can assume that a larger corpus of lemmatized forms would yield even better results.

References

1. Berber Sardinha, T., Moreira Filho, J., Alambert, E.: O corpus brasileiro. Comunicação ao VII Encontro de Lingüística de Corpus (2008)
2. Bick, E.: The parsing system Palavras. Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework (2000)
3. Bond, F., Paik, K.: A survey of wordnets and their licenses. In: Proceedings of the 6th Global WordNet Conference, pp. 64–71 (2012)
4. Dias-da-Silva, B.C., Felippo, A.D., das Graças Volpe Nunes, M.: The automatic mapping of Princeton WordNet lexical-conceptual relations onto the Brazilian Portuguese WordNet database. In: Proceedings of LREC 2008, European Language Resources Association, Marrakech, Morocco (2008)
5. Fellbaum, C.: WordNet. Wiley Online Library, New York (1998)
6. Freitag, D., Blume, M., Byrnes, J., Chow, E., Kapadia, S., Rohwer, R., Wang, Z.: New experiments in distributional representations of synonymy. In: Proceedings of the Ninth Conference on Computational Natural Language Learning, pp. 25–32. Association for Computational Linguistics (2005)
7. Gonçalo Oliveira, H., Gomes, P.: Towards the automatic creation of a wordnet from a term-based lexical network. In: Proceedings of the ACL Workshop TextGraphs-5: Graph-based Methods for Natural Language Processing, pp. 10–18. ACL Press (July 2010). http://eden.dei.uc.pt/~hroliv/pubs/GoncaloOliveira_Gomes2010_TextGraphs5_postconf.pdf
8. Landauer, T.K., Dumais, S.T.: A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211 (1997)
9. Lin, D.: Automatic retrieval and clustering of similar words. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, vol. 2, pp. 768–774. ACL 1998, Association for Computational Linguistics (1998)
10. Marrafa, P.: WordNet do Português: uma base de dados de conhecimento linguístico. Instituto de Camões, Lisboa (2002)
11. Mikolov, T., Karafiát, M., Burget, L., Černocký, J., Khudanpur, S.: Recurrent neural network based language model. In: 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Chiba, Japan, pp. 1045–1048, 26–30 September 2010
12. Navigli, R., Ponzetto, S.P.: BabelNet: building a very large multilingual semantic network. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 216–225. Association for Computational Linguistics (2010)
13. de Paiva, V., Rademaker, A., de Melo, G.: OpenWordNet-PT: an open Brazilian WordNet for reasoning. In: Proceedings of the 24th International Conference on Computational Linguistics (2012). http://www.coling2012-iitb.org (Demonstration Paper). Published also as Techreport http://hdl.handle.net/10438/10274
14. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. EMNLP 14, 1532–1543 (2014)
15. Vossen, P. (ed.): EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, Norwell (1998)
16. Wilkens, R., Zilio, L., Gonçalves, G., Ferreira, E., Villavicencio, A.: Tesauros distribucionais para o português: avaliação de metodologias. In: Proceedings of STIL 2015. Sociedade Brasileira de Computação (2015)

Acknowledgments

This research was partially developed in the context of the project Text Simplification of Complex Expressions, sponsored by Samsung Eletrônica da Amazônia Ltda., under the terms of Brazilian law no. 8.248/91. This work was also partly supported by CNPq (482520/2012-4, 312114/2015-0) and FAPERGS AiMWEst.

Author information

Authors and Affiliations

Institute of Informatics, UFRGS, Porto Alegre, Brazil
Rodrigo Wilkens, Leonardo Zilio, Eduardo Ferreira & Aline Villavicencio

Corresponding author

Correspondence to Rodrigo Wilkens.

Editor information

Editors and Affiliations

Universidade de Lisboa, Portugal
João Silva
ISCTE-IUL, Lisbon, Portugal
Ricardo Ribeiro
Universidade de Évora, Évora, Portugal
Paulo Quaresma
Universidade de Caxias do Sul, Caxias do Sul, Brazil
André Adami
Universidade de Lisboa, Lisboa, Portugal
António Branco

About this paper

Cite this paper

Wilkens, R., Zilio, L., Ferreira, E., Villavicencio, A. (2016). The Portuguese B$^2$SG: A Semantic Test for Distributional Thesaurus. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_33

DOI: https://doi.org/10.1007/978-3-319-41552-9_33
Published: 21 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41551-2
Online ISBN: 978-3-319-41552-9
eBook Packages: Computer Science, Computer Science (R0)
