Abstract
In this article we explore the possibility of bootstrapping a European Portuguese WordNet from the English, Spanish and Galician wordnets, using Probabilistic Translation Dictionaries automatically created from parallel corpora.
The process generated a total of 56 770 synsets and 97 058 variants. An evaluation of the results, using the Brazilian OpenWordNet-PT as a gold standard, yielded a precision ranging from 53% to 75%, depending on the cut-off point. The results are satisfactory and comparable to those of similar experiments using the WN-Toolkit.
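The dependence of precision on the cut-off can be illustrated with a minimal sketch. It is not the procedure used in this work: the scoring rule (for each candidate, the best translation probability per pivot language, averaged over the pivot languages), the function names, and the toy dictionaries and gold set below are all assumptions made for illustration only.

```python
from collections import defaultdict


def score_candidates(synset_variants, ptds):
    """Score Portuguese candidates for one synset (hypothetical scoring rule).

    synset_variants: {lang: [words of the synset in that pivot language]}
    ptds: {lang: {source_word: {pt_word: translation_probability}}}
    Returns {pt_word: score}, where score is the best probability found in
    each pivot language, averaged over all pivot languages of the synset.
    """
    totals = defaultdict(float)
    for lang, words in synset_variants.items():
        best = defaultdict(float)  # best probability per candidate in this language
        for word in words:
            for pt_word, prob in ptds.get(lang, {}).get(word, {}).items():
                best[pt_word] = max(best[pt_word], prob)
        for pt_word, prob in best.items():
            totals[pt_word] += prob
    n_langs = len(synset_variants)
    return {pt_word: total / n_langs for pt_word, total in totals.items()}


def precision_at_cutoff(scores, gold, cutoff):
    """Precision of the candidates whose score reaches the cut-off.

    Returns None when no candidate is accepted.
    """
    accepted = {w for w, s in scores.items() if s >= cutoff}
    if not accepted:
        return None
    return len(accepted & gold) / len(accepted)


# Toy data, invented for illustration (not taken from the paper).
SYNSET = {"en": ["dog"], "es": ["perro"], "gl": ["can"]}
PTDS = {
    "en": {"dog": {"cão": 0.8, "cachorro": 0.1}},
    "es": {"perro": {"cão": 0.7, "perro": 0.1}},
    "gl": {"can": {"cão": 0.6, "pode": 0.2}},
}
GOLD = {"cão", "cachorro"}

if __name__ == "__main__":
    scores = score_candidates(SYNSET, PTDS)
    for cutoff in (0.05, 0.5):
        print(cutoff, precision_at_cutoff(scores, GOLD, cutoff))
```

In this toy setting a higher cut-off keeps only the candidate supported by all three pivot languages, so precision rises while fewer variants are accepted. This is the kind of trade-off a cut-off point controls.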
This research has been carried out thanks to the Project SKATeR (TIN2012-38584-C06-01 and TIN2012-38584-C06-04) supported by the Ministry of Economy and Competitiveness of the Spanish Government.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Simões, A., Guinovart, X.G. (2014). Bootstrapping a Portuguese WordNet from Galician, Spanish and English Wordnets. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_25
DOI: https://doi.org/10.1007/978-3-319-13623-3_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13622-6
Online ISBN: 978-3-319-13623-3
eBook Packages: Computer Science, Computer Science (R0)