Abstract
This paper presents simple methods for adding new words to a wordnet. We use the Finnish wordnet, FinnWordNet, as an example. We pay particular attention to high- and medium-frequency words thus far missing from FinnWordNet, and arrive at an estimate for the number of culture-specific words among them. We also find that the majority of the high- and medium-frequency words are compounds, which makes them relatively easy to add by using the head word of a compound to locate hypernym synset candidates. Another goal of ours is to add new synonyms to the existing synsets of FinnWordNet. We present a method that finds candidates for new synonyms from a bilingual lexical resource by exploiting the direct word sense translation correspondences between FinnWordNet and the Princeton WordNet. We apply the method to the interlanguage links between articles on the same topic in the Finnish and English Wikipedias on the one hand, and to the translations in the Finnish and English Wiktionaries on the other, and compare the results.
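The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, only one plausible reading under stated assumptions: the two wordnets share synset identifiers (standing in for the direct sense correspondences between FinnWordNet and the Princeton WordNet), the bilingual resource is reduced to (Finnish word, English word) pairs, and the head of a Finnish compound is approximated by the longest proper suffix already found in the wordnet. The function names, the synset identifiers and the toy data are all hypothetical.

```python
from collections import defaultdict


def build_index(synsets):
    """Map each lower-cased word to the ids of the synsets that contain it."""
    index = defaultdict(set)
    for synset_id, words in synsets.items():
        for word in words:
            index[word.lower()].add(synset_id)
    return index


def synonym_candidates(fi_synsets, en_synsets, bilingual_pairs):
    """Propose Finnish words as new synonyms for existing synsets.

    Assumption: a Finnish word translated by an English word in synset S
    is a candidate for the Finnish synset with the same id S, unless it
    is already there. Ambiguous English words yield candidates for every
    synset they belong to, so the output is meant for manual review.
    """
    en_index = build_index(en_synsets)
    candidates = defaultdict(set)
    for fi_word, en_word in bilingual_pairs:
        for synset_id in en_index.get(en_word.lower(), ()):
            existing = {w.lower() for w in fi_synsets.get(synset_id, ())}
            if fi_word.lower() not in existing:
                candidates[synset_id].add(fi_word)
    return dict(candidates)


def hypernym_candidates(compound, fi_index, min_head_len=3):
    """Guess the head of a compound and return its synsets.

    Assumption: the head is the longest proper suffix of the compound
    that is already in the wordnet. Real compound splitting would need
    a morphological analyser; this is a rough approximation.
    """
    word = compound.lower()
    for start in range(1, len(word) - min_head_len + 1):
        head = word[start:]
        if head in fi_index:
            return head, set(fi_index[head])
    return None, set()


# Hypothetical toy data; the ids and word lists are illustrative only.
fi_synsets = {"car.n.01": {"auto"}, "game.n.01": {"peli"}}
en_synsets = {"car.n.01": {"car", "automobile"}, "game.n.01": {"game"}}
pairs = [("automobiili", "automobile"), ("auto", "car"), ("peli", "game")]

fi_index = build_index(fi_synsets)
new_synonyms = synonym_candidates(fi_synsets, en_synsets, pairs)
head, hypernyms = hypernym_candidates("sähköauto", fi_index)
```

In this toy run the only synonym candidate is the pair's Finnish word that is not yet in the matching synset, and the compound is attached below the synsets of its guessed head. Both outputs are candidates for a lexicographer to check, not final additions.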
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
The list was compiled at CSC in 2004, and it is available online at http://www.csc.fi/tutkimus/alat/kielitiede/taajuussanasto-B9996/view.
- 6.
FTC has been compiled by the Research Institute for the Languages of Finland, the Department of General Linguistics of the University of Helsinki and the Foreign Languages Department of the University of Joensuu. More information is available online at http://www.csc.fi/english/research/software/ftc.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
The figures for the English Wikipedia are based on the dump of article contents on 4 August 2011.
- 14.
- 15.
Even with the titles with disambiguation tags included, the intersection of English Wikipedia titles and PWN nouns is still smaller than the 80,295 reported by Navigli and Ponzetto (2010) because we have omitted the titles of disambiguation and redirection pages.
- 16.
However, the wordnet words include proper nouns and abbreviations.
- 17.
- 18.
“tour noun”, Stevenson (2010). Helsinki University, 18 October 2011, http://www.oxfordreference.com/views/ENTRY.html?subview=Main&entry=t140.e0873630.
References
Alkhalifa, Musa, and Horacio Rodríguez. 2009. Automatically extending NE coverage of Arabic WordNet using Wikipedia. In Proceedings of 3rd international conference on Arabic language processing (CITALA’09), Rabat, Morocco, 23–30. http://www.emi.ac.ma/citala2009/docs/citala%20papers/%28N%B004-Paper%2036%29.pdf.
Erdmann, Maike, Kotaro Nakayama, Takahiro Hara, and Shojiro Nishio. 2008. An approach for extracting bilingual terminology from Wikipedia. In Database systems for advanced applications, eds. Jayant Haritsa, Ramamohanarao Kotagiri, and Vikram Pudi. Vol. 4947 of Lecture notes in computer science, 380–392. Heidelberg: Springer. doi:10.1007/978-3-540-78568-2_28.
Erdmann, Maike, Kotaro Nakayama, Takahiro Hara, and Shojiro Nishio. 2009. Improving the extraction of bilingual terminology from Wikipedia. ACM Transactions on Multimedia Computing Communications and Applications 5: 1–17. doi:10.1145/1596990.1596995.
Fellbaum, Christiane, ed. 1998. WordNet: An electronic lexical database. Cambridge: MIT Press.
Jäppinen, Harri, Aarno Lehtola, Eero Nelimarkka, and Matti Ylilammi. 1983. Knowledge engineering approach to morphological analysis. In First conference of the European chapter of ACL, Pisa, Italy, 49–51.
Jäppinen, Harri, and Matti Ylilammi. 1986. Associative model of morphological analysis: An empirical inquiry. Computational Linguistics 12: 257–272.
Krizhanovsky, Andrew A. 2010. Transformation of Wiktionary entry structure into tables and relations in a relational database schema. CoRR 1011.1368. http://dblp.uni-trier.de/db/journals/corr/corr1011.html#abs-1011-1368, preprint.
Lindén, Krister, and Lauri Carlson. 2010. FinnWordNet—WordNet på finska via översättning. LexicoNordica 17: 119–140.
Matuschek, Michael, and Iryna Gurevych. 2010. Beyond the synset: Synonyms in collaboratively constructed semantic resources. In Workshop on computational approaches to synonymy at the symposium on re-thinking synonymy, ed. Antti Arppe, 58–59. Helsinki: University of Helsinki.
Medelyan, Olena, David Milne, Catherine Legg, and Ian H. Witten. 2009. Mining meaning from Wikipedia. International Journal of Human-Computer Studies 67: 716–754. doi:10.1016/j.ijhcs.2009.05.004, http://dl.acm.org/citation.cfm?id=1618876.1619040.
Meyer, Christian M., and Iryna Gurevych. 2010. Worth its weight in gold or yet another resource—a comparative study of Wiktionary, OpenThesaurus and GermaNet. In Proceedings of the 11th international conference on intelligent text processing and computational linguistics (CICLing 2010), ed. Alexander Gelbukh. Vol. 6008 of Lecture notes in computer science, 38–49. Berlin: Springer. http://www.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2010/cicling2010-meyer-lsrcomparison.pdf.
Meyer, Christian M., and Iryna Gurevych. 2011. What psycholinguists know about chemistry: Aligning Wiktionary and WordNet for increased domain coverage. In Proceedings of the 5th international joint conference on natural language processing (IJCNLP), 883–892. http://www.christian-meyer.org/research/publications/ijcnlp2011/.
Navarro, Emmanuel, Franck Sajous, Bruno Gaume, Laurent Prévot, Hsieh ShuKai, Kuo Tzu-Yi, Pierre Magistry, and Huang Chu-Ren. 2009. Wiktionary and NLP: Improving synonymy networks. In Proceedings of the 2009 workshop on the people’s Web meets NLP: Collaboratively constructed semantic resources, 19–27. Stroudsburg: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1699765.1699768.
Navigli, Roberto, and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th annual meeting of the Association for Computational Linguistics, Uppsala, Sweden, 216–225.
Niemann, Elisabeth, and Iryna Gurevych. 2011. The people’s web meets linguistic knowledge: Automatic sense alignment of Wikipedia and WordNet. In Proceedings of the 9th international conference on computational semantics, Oxford, UK, 205–214. http://dl.acm.org/ft_gateway.cfm?id=2002691&type=pdf.
Niemi, Jyrki, Krister Lindén, and Mirka Hyvärinen. 2012. Using a bilingual resource to add synonyms to a wordnet: FinnWordNet and Wikipedia as an example. In Proceedings of the 6th international global Wordnet conference (GWC 2012), 227–231. Matsue: Global WordNet Association. ISBN 978-80-263-0244-5.
Pääkkö, Paula, and Krister Lindén. 2012. Finding a location for a new word in WordNet. In Proceedings of the 6th international global Wordnet conference (GWC 2012), 286–293. Matsue: Global WordNet Association. ISBN 978-80-263-0244-5.
Ruiz-Casado, Maria, Enrique Alfonseca, and Pablo Castells. 2005. Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets. In Advances in Web intelligence, eds. Piotr Szczepaniak, Janusz Kacprzyk, and Adam Niewiadomski. Vol. 3528 of Lecture notes in computer science, 947–950. Berlin: Springer. doi:10.1007/11495772_59.
Sjöbergh, Jonas, Olof Sjöbergh, and Kenji Araki. 2008. What types of translations hide in Wikipedia. In Proceedings of the 3rd international conference on large-scale knowledge resources: Construction and application (LKR’08), 59–66. Berlin: Springer. http://dl.acm.org/citation.cfm?id=1787800.1787808.
Stevenson, Angus, ed. 2010. Oxford dictionary of English. London: Oxford University Press. Oxford Reference Online.
Toral, Antonio, Óscar Ferrández, Eneko Agirre, and Rafael Muñoz. 2009. A study on linking Wikipedia categories to Wordnet synsets using text similarity. In Proceedings of the international conference RANLP-2009, 449–454. Borovets: Association for Computational Linguistics. http://www.aclweb.org/anthology/R09-1080.
Tyers, Francis M., and Jacques A. Pienaar. 2008. Extracting bilingual word pairs from Wikipedia. In Collaboration: interoperability between people in the creation of language resources for less-resourced languages (A SALTMIL workshop), 19–22. http://ixa2.si.ehu.es/saltmil/files/EBWPFW.pdf.
Valkonen, Kari, Harri Jäppinen, and Aarno Lehtola. 1987. Blackboard-based dependency parsing. In Proceedings of IJCAI’87, tenth international joint conference on artificial intelligence, 700–702.
Vossen, Piek, ed. 1998. EuroWordNet: A multilingual database with lexical semantic networks. Dordrecht: Kluwer Academic.
Zesch, Torsten, Christof Müller, and Iryna Gurevych. 2008. Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In Proceedings of the international conference on language resources and evaluation, LREC 2008, 1646–1652. Marrakech: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2008/summaries/420.html.
Acknowledgements
We are grateful to our two reviewers, Prof Lars Borin and Dr Antti Arppe, for their insightful comments, which helped us clarify many of our claims and statements. All remaining errors are of course our own.
We also wish to acknowledge the FIN-CLARIN and META-NORD funding for making this work possible. The META-NORD project has received funding from the European Union’s ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme under grant agreement no. 270899.
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lindén, K., Niemi, J., Hyvärinen, M. (2012). Extending and Updating the Finnish Wordnet. In: Santos, D., Lindén, K., Ng’ang’a, W. (eds) Shall We Play the Festschrift Game?. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30773-7_7
DOI: https://doi.org/10.1007/978-3-642-30773-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30772-0
Online ISBN: 978-3-642-30773-7
eBook Packages: Computer Science, Computer Science (R0)