Extending and Updating the Finnish Wordnet

Abstract

This paper presents simple methods for adding new words to a wordnet. We use the Finnish wordnet, FinnWordNet, as an example. We pay particular attention to high- and medium-frequency words thus far missing from FinnWordNet, and arrive at an estimate for the number of culture-specific words among them. We also find that the majority of the high- and medium-frequency words are compounds, which makes them relatively easy to add by using the head word of a compound to locate hypernym synset candidates. Another goal of ours is to add new synonyms to the existing synsets of FinnWordNet. We present a method that finds candidates for new synonyms from a bilingual lexical resource by exploiting the direct word sense translation correspondences between FinnWordNet and the Princeton WordNet. We apply the method to the interlanguage links between articles on the same topic in the Finnish and English Wikipedias on the one hand, and to the translations in the Finnish and English Wiktionaries on the other, and compare the results.
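The two methods summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synset identifiers, word lists, and function names are invented for illustration, shared synset identifiers stand in for the sense correspondence between FinnWordNet and the Princeton WordNet, and plain suffix matching stands in for a proper morphological analysis of compounds.

```python
from collections import defaultdict

# Toy data, invented for illustration only (not taken from FinnWordNet or PWN).
# A shared synset id stands for a direct word sense translation correspondence.
FI_SYNSETS = {"n01": {"koira"}, "n02": {"auto"}, "n03": {"kissa"}}
EN_SYNSETS = {"n01": {"dog", "domestic dog"}, "n02": {"car", "automobile"}, "n03": {"cat"}}

# Hypothetical (English, Finnish) pairs, e.g. from interlanguage links
# or from dictionary translations.
BILINGUAL_PAIRS = [
    ("dog", "koira"),
    ("car", "henkilöauto"),
    ("automobile", "auto"),
    ("cat", "kissa"),
]


def build_lemma_index(synsets):
    """Map each lemma to the set of synset ids that contain it."""
    index = defaultdict(set)
    for synset_id, lemmas in synsets.items():
        for lemma in lemmas:
            index[lemma].add(synset_id)
    return dict(index)


def synonym_candidates(pairs, fi_synsets, en_index):
    """Propose Finnish words for synsets whose English side matches a pair."""
    candidates = defaultdict(set)
    for en_word, fi_word in pairs:
        for synset_id in en_index.get(en_word.lower(), ()):
            # Only words not already in the Finnish synset are new candidates.
            if fi_word.lower() not in fi_synsets.get(synset_id, set()):
                candidates[synset_id].add(fi_word.lower())
    return dict(candidates)


def compound_head(word, lemma_index, min_len=3):
    """Return the longest proper suffix of `word` that is a known lemma.

    Assumption: suffix matching approximates compound splitting; a real
    system would rely on a morphological analyzer instead.
    """
    for start in range(1, len(word) - min_len + 1):
        suffix = word[start:]
        if suffix in lemma_index:
            return suffix
    return None


def hypernym_candidates(word, lemma_index):
    """Use the head of a compound to locate hypernym synset candidates."""
    head = compound_head(word, lemma_index)
    return sorted(lemma_index[head]) if head else []


EN_INDEX = build_lemma_index(EN_SYNSETS)
FI_INDEX = build_lemma_index(FI_SYNSETS)

if __name__ == "__main__":
    print(synonym_candidates(BILINGUAL_PAIRS, FI_SYNSETS, EN_INDEX))
    print(hypernym_candidates("paimenkoira", FI_INDEX))
```

On this toy data, `synonym_candidates` proposes `henkilöauto` for synset `n02`, and `hypernym_candidates("paimenkoira", FI_INDEX)` returns `["n01"]`, the synset of the head word `koira`. A polysemous English word would yield a candidate for each of its synsets, so candidates produced this way would still need filtering or manual checking.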


Notes

  1. http://www.ling.helsinki.fi/en/lt/research/finnwordnet/.

  2. http://www.wikipedia.org.

  3. http://www.wiktionary.org.

  4. http://en.wiktionary.org/wiki/Wiktionary:Main_Page.

  5. The list was compiled at CSC in 2004, and it is available online at http://www.csc.fi/tutkimus/alat/kielitiede/taajuussanasto-B9996/view.

  6. FTC has been compiled by the Research Institute for the Languages in Finland, the Department of General Linguistics of the University of Helsinki and the Foreign Languages Department of the University of Joensuu. More information is available online at http://www.csc.fi/english/research/software/ftc.

  7. http://www.connexor.eu/technology/machinese/glossary/fdg/index.html.

  8. http://dumps.wikimedia.org/.

  9. http://download.wikimedia.org/fiwiki/20110829/fiwiki-20110829-pages-articles.xml.bz2.

  10. http://download.wikimedia.org/fiwiktionary/20110921/fiwiktionary-20110921-pages-articles.xml.bz2.

  11. http://download.wikimedia.org/enwiktionary/20110920/enwiktionary-20110920-pages-articles.xml.bz2.

  12. https://gna.org/projects/omorfi.

  13. The figures for the English Wikipedia are based on the dump of article contents on 4 August 2011.

  14. The numbers of unique wordnet nouns \(|\mathit{W}_{\mathit{WN}}|\) in Table 8 are slightly smaller than the corresponding \(|\mathit{W}_{\mathit{N}}(\mathit{WN}_{\mathit{L}})|\) in Table 3 because of capitalizing the first word of all nouns (or noun phrases) and counting only the unique ones.

  15. Even with the titles with disambiguation tags included, the intersection of English Wikipedia titles and PWN nouns is still smaller than the 80,295 reported by Navigli and Ponzetto (2010) because we have omitted the titles of disambiguation and redirection pages.

  16. However, the wordnet words include proper nouns and abbreviations.

  17. http://www.ling.helsinki.fi/en/lt/research/finnwordnet/testdata/lcfs/.

  18. “tour noun”, Stevenson (2010). Helsinki University, 18 October 2011, http://www.oxfordreference.com/views/ENTRY.html?subview=Main&entry=t140.e0873630.

References

  • Alkhalifa, Musa, and Horacio Rodríguez. 2009. Automatically extending NE coverage of Arabic WordNet using Wikipedia. In Proceedings of 3rd international conference on Arabic language processing (CITALA’09), Rabat, Morocco, 23–30. http://www.emi.ac.ma/citala2009/docs/citala%20papers/%28N%B004-Paper%2036%29.pdf.

  • Erdmann, Maike, Kotaro Nakayama, Takahiro Hara, and Shojiro Nishio. 2008. An approach for extracting bilingual terminology from Wikipedia. In Database systems for advanced applications, eds. Jayant Haritsa, Ramamohanarao Kotagiri, and Vikram Pudi. Vol. 4947 of Lecture notes in computer science, 380–392. Heidelberg: Springer. doi:10.1007/978-3-540-78568-2_28.

  • Erdmann, Maike, Kotaro Nakayama, Takahiro Hara, and Shojiro Nishio. 2009. Improving the extraction of bilingual terminology from Wikipedia. ACM Transactions on Multimedia Computing Communications and Applications 5: 1–17. doi:10.1145/1596990.1596995.

  • Fellbaum, Christiane, ed. 1998. WordNet: An electronic lexical database. Cambridge: MIT Press.

  • Jäppinen, Harri, Aarno Lehtola, Eero Nelimarkka, and Matti Ylilammi. 1983. Knowledge engineering approach to morphological analysis. In First conference of the European chapter of ACL, Pisa, Italy, 49–51.

  • Jäppinen, Harri, and Matti Ylilammi. 1986. Associative model of morphological analysis: An empirical inquiry. Computational Linguistics 12: 257–272.

  • Krizhanovsky, Andrew A. 2010. Transformation of Wiktionary entry structure into tables and relations in a relational database schema. CoRR 1011.1368. http://dblp.uni-trier.de/db/journals/corr/corr1011.html#abs-1011-1368, preprint.

  • Lindén, Krister, and Lauri Carlson. 2010. FinnWordNet—WordNet på finska via översättning. LexicoNordica 17: 119–140.

  • Matuschek, Michael, and Iryna Gurevych. 2010. Beyond the synset: Synonyms in collaboratively constructed semantic resources. In Workshop on computational approaches to synonymy at the symposium on re-thinking synonymy, ed. Antti Arppe, 58–59. Helsinki: University of Helsinki.

  • Medelyan, Olena, David Milne, Catherine Legg, and Ian H. Witten. 2009. Mining meaning from Wikipedia. International Journal of Human-Computer Studies 67: 716–754. doi:10.1016/j.ijhcs.2009.05.004, http://dl.acm.org/citation.cfm?id=1618876.1619040.

  • Meyer, Christian M., and Iryna Gurevych. 2010. Worth its weight in gold or yet another resource—a comparative study of Wiktionary, OpenThesaurus and GermaNet. In Proceedings of the 11th international conference on intelligent text processing and computational linguistics (CICLing 2010), ed. Alexander Gelbukh. Vol. 6008 of Lecture notes in computer science, 38–49. Berlin: Springer. http://www.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2010/cicling2010-meyer-lsrcomparison.pdf.

  • Meyer, Christian M., and Iryna Gurevych. 2011. What psycholinguists know about chemistry: Aligning Wiktionary and WordNet for increased domain coverage. In Proceedings of the 5th international joint conference on natural language processing (IJCNLP), 883–892. http://www.christian-meyer.org/research/publications/ijcnlp2011/.

  • Navarro, Emmanuel, Franck Sajous, Bruno Gaume, Laurent Prévot, Shu-Kai Hsieh, Tzu-Yi Kuo, Pierre Magistry, and Chu-Ren Huang. 2009. Wiktionary and NLP: Improving synonymy networks. In Proceedings of the 2009 workshop on the people’s Web meets NLP: Collaboratively constructed semantic resources, 19–27. Stroudsburg: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1699765.1699768.

  • Navigli, Roberto, and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th annual meeting of the Association for Computational Linguistics, Uppsala, Sweden, 216–225.

  • Niemann, Elisabeth, and Iryna Gurevych. 2011. The people’s web meets linguistic knowledge: Automatic sense alignment of Wikipedia and WordNet. In Proceedings of the 9th international conference on computational semantics, Oxford, UK, 205–214. http://dl.acm.org/ft_gateway.cfm?id=2002691&type=pdf.

  • Niemi, Jyrki, Krister Lindén, and Mirka Hyvärinen. 2012. Using a bilingual resource to add synonyms to a wordnet: FinnWordNet and Wikipedia as an example. In Proceedings of the 6th international global Wordnet conference (GWC 2012), 227–231. Matsue: Global WordNet Association. ISBN 978-80-263-0244-5.

  • Pääkkö, Paula, and Krister Lindén. 2012. Finding a location for a new word in WordNet. In Proceedings of the 6th international global Wordnet conference (GWC 2012), 286–293. Matsue: Global WordNet Association. ISBN 978-80-263-0244-5.

  • Ruiz-Casado, Maria, Enrique Alfonseca, and Pablo Castells. 2005. Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets. In Advances in Web intelligence, eds. Piotr Szczepaniak, Janusz Kacprzyk, and Adam Niewiadomski. Vol. 3528 of Lecture notes in computer science, 947–950. Berlin: Springer. doi:10.1007/11495772_59.

  • Sjöbergh, Jonas, Olof Sjöbergh, and Kenji Araki. 2008. What types of translations hide in Wikipedia. In Proceedings of the 3rd international conference on large-scale knowledge resources: Construction and application (LKR’08), 59–66. Berlin: Springer. http://dl.acm.org/citation.cfm?id=1787800.1787808.

  • Stevenson, Angus, ed. 2010. Oxford dictionary of English. London: Oxford University Press. Oxford Reference Online.

  • Toral, Antonio, Óscar Ferrández, Eneko Agirre, and Rafael Muñoz. 2009. A study on linking Wikipedia categories to Wordnet synsets using text similarity. In Proceedings of the international conference RANLP-2009, 449–454. Borovets: Association for Computational Linguistics. http://www.aclweb.org/anthology/R09-1080.

  • Tyers, Francis M., and Jacques A. Pienaar. 2008. Extracting bilingual word pairs from Wikipedia. In Collaboration: interoperability between people in the creation of language resources for less-resourced languages (A SALTMIL workshop), 19–22. http://ixa2.si.ehu.es/saltmil/files/EBWPFW.pdf.

  • Valkonen, Kari, Harri Jäppinen, and Aarno Lehtola. 1987. Blackboard-based dependency parsing. In Proceedings of IJCAI’87, tenth international joint conference on artificial intelligence, 700–702.

  • Vossen, Piek, ed. 1998. EuroWordNet: A multilingual database with lexical semantic networks. Dordrecht: Kluwer Academic.

  • Zesch, Torsten, Christof Müller, and Iryna Gurevych. 2008. Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In Proceedings of the international conference on language resources and evaluation, LREC 2008, 1646–1652. Marrakech: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2008/summaries/420.html.


Acknowledgements

We are grateful to our two reviewers, Prof Lars Borin and Dr Antti Arppe, for their many insightful comments that helped us clarify many of our claims and statements. All remaining errors are of course our own.

We also wish to acknowledge the FIN-CLARIN and META-NORD funding for making this work possible. The META-NORD project has received funding from the European Union’s ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme under grant agreement no. 270899.

Author information

Corresponding author

Correspondence to Krister Lindén.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lindén, K., Niemi, J., Hyvärinen, M. (2012). Extending and Updating the Finnish Wordnet. In: Santos, D., Lindén, K., Ng’ang’a, W. (eds) Shall We Play the Festschrift Game?. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30773-7_7

  • DOI: https://doi.org/10.1007/978-3-642-30773-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30772-0

  • Online ISBN: 978-3-642-30773-7

  • eBook Packages: Computer Science, Computer Science (R0)
