Abstract
This paper contributes to the discussion on compiling computational lexical resources from conventional dictionaries. It describes the theoretical and practical problems encountered when a conventional dictionary is reused to compile a lexical-semantic resource in the form of a wordnet. More specifically, it describes the methodological issues of compiling a wordnet for Danish, DanNet, from a monolingual basis rather than, as is often done, by applying the translational expansion method with Princeton WordNet as the English source. Our basis is a large, corpus-based printed dictionary of modern Danish. Given this approach, we discuss four issues: readjusting inconsistent and/or underspecified hyponymy hierarchies taken from the conventional dictionary; handling the dictionary's sense distinctions as opposed to the synonym sets of wordnets; generating semantic wordnet relations on the basis of sense definitions; and, finally, supplementing missing or implicit information.

Notes
Actually, Ruus (1995: 130) argues that some of these hyponyms are characterised by the fact that a limited set of features can distinguish them from each other. She uses Grandy’s terminology and calls such hyponyms contrast sets.
It should be made clear that multiple inheritance is also rather frequent with 1st Order Entities. For instance, in the previously mentioned example of grøntsag (vegetable) several vegetables are encoded partly with plante (plant) or plantedel (part of plant) as hypernym, partly with grøntsag as hypernym.
‘Domain’ is an ontological type that has been added by DanNet; it is not part of the EuroWordNet (EWN) ontology. Such additions are made where large groups of synsets call for a more specific ontological type than the EuroWordNet ontology provides. Another example of a DanNet extension of the ontology is the ontological type BodyPart.
A more detailed account of this is given in Asmussen (2007).
References
Agirre, E., Ansa, O., Arregi, X., Artola, X., Díaz de Ilarraza, A., Lersundi, M., et al. (2000). Extraction of semantic relations from a Basque monolingual dictionary using constraint grammar. In Proceedings from the ninth Euralex international congress (pp. 639–640). Universität Stuttgart.
Asmussen, J. (2007). Korpuslinguistische Verfahren zur Optimierung lexikalisch-semantischer Beschreibungen. In W. Kallmeyer & G. Zifonun (Eds.), Jahrbuch des Instituts für Deutsche Sprache 2006 (pp. 123–151). Berlin and New York: Walter de Gruyter.
Boguraev, B., & Briscoe, T. (Eds.). (1989). Computational lexicography for natural language processing. London and New York: Longman.
Church, K., & Hanks, P. (1989). Word association norms, mutual information and lexicography. In ACL proceedings, 27th annual meeting, Vancouver.
Cruse, D. A. (1991). Lexical semantics. Cambridge: Cambridge University Press.
Cruse, D. A. (2002). Hyponymy and its varieties. In R. Green, C. A. Bean, & S. H. Myaeng (Eds.), The semantics of relationships: An interdisciplinary perspective, information science and knowledge management (pp. 2–21). Springer.
DDO = Hjorth, E., Kristensen, K., et al. (Eds.). (2003–2005). Den Danske Ordbog 1–6 (‘The Danish dictionary 1–6’). Copenhagen: Gyldendal and Society for Danish Language and Literature.
Derwojedowa, M., Piasecki, M., Szpakowicz, S., Zawislawska, M., & Broda, B. (2008). Words, concepts and relations in the construction of the Polish WordNet. In Global WordNet Conference 2008 (pp. 162–177). Szeged, Hungary.
Dirven, R., & Verspoor, M. (Eds.). (1998). Cognitive exploration of language and linguistics. Amsterdam/Philadelphia: John Benjamins.
Dunning, T. (1994). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74.
Fellbaum, C. (2002). Parallel hierarchies in the verb lexicon. In Proceedings of the OntoLex workshop, LREC (pp. 27–31). Las Palmas, Spain.
Fernández-Montraveta, A., Vázquez, G., & Fellbaum, C. (2008). The Spanish version of WordNet 3.0. Text resources and lexical knowledge. In Text, translation, computational processing (pp. 175–182). Berlin and New York: Mouton de Gruyter.
Fillmore, C. J., Johnson, C. R., & Petruck, M. R. L. (2003). Background to FrameNet. International Journal of Lexicography, 16(3), 235–250 (Oxford: Oxford University Press).
Fontenelle, T. (1997). Using a bilingual dictionary to create semantic networks. International Journal of Lexicography, 10(4), 275–303.
Guarino, N. (1998). Some ontological principles for designing upper level lexical resources. In Proceedings from the first international conference on language resources and evaluation (pp. 527–534). Granada.
Guarino, N., & Welty, C. (2002). Identity and subsumption. In R. Green, C. A. Bean & S. H. Myaeng (Eds.), The semantics of relationships: An interdisciplinary perspective, information science and knowledge management. Springer.
Huang, C., Hsiao, P., Su, I., & Ke, X. (2008). Paranymy: Enriching ontological knowledge in WordNets. In Proceedings of the fourth global WordNet conference (pp. 221–228). Szeged, Hungary.
Ide, N., & Véronis, J. (1995). Knowledge extraction from machine-readable dictionaries: An evaluation. In P. Steffens (Ed.), Machine translation and the lexicon, third international EAMT workshop, Heidelberg, April 26–28, 1993, proceedings. Lecture Notes in Computer Science 898, Springer.
Ide, N., & Wilks, Y. (2007). Making sense about senses. In E. Agirre & P. Edmonds (Eds.), Word sense disambiguation—Algorithms and applications. Springer.
Jackson, H. (2002). Lexicography: An introduction. London: Routledge.
Kilgarriff, A. (1997). I don’t believe in word senses. Computers and the Humanities, 31(2), 91–113.
Kokkinakis, D., Toporowska Gronostaj, M., & Warmenius, K. (2000). Annotating, disambiguating & automatically extending the coverage of the Swedish SIMPLE lexicon. In Proceedings from the second international conference on language resources and evaluation (pp. 1397–1405). Athens.
Lenci, A., Bel, N., Busa, F., Calzolari, N., Gola, E., Monachini, M., et al. (2000). SIMPLE—A general framework for the development of multilingual lexicons. International Journal of Lexicography, 13(4), 249–263.
Levin, B. (1993). English verb classes and alternations—A preliminary investigation. Chicago: The University of Chicago Press.
Lorentzen, H. (2004). The Danish dictionary at large: Presentation, problems and perspectives. In G. Williams & S. Vessier (Eds.), Proceedings of the eleventh EURALEX international congress (pp. 285–294). Lorient, France.
Lyons, J. (1977). Semantics. Cambridge: Press Syndicate of the University of Cambridge.
Márton, M., Hatvani, C., Kuti, J., Szarvas, G., Csirik, J., Prószéky, G., et al. (2008). Methods and results of the Hungarian WordNet project. In Proceedings of the fourth global WordNet conference (pp. 311–320). Szeged, Hungary.
Miller, G. A. (1998). Nouns in WordNet. In C. Fellbaum (Ed.), WordNet—An electronic lexical database (pp. 23–47). Cambridge, London: The MIT Press.
Norling-Christensen, O., & Asmussen, J. (1998). The corpus of the Danish dictionary. Lexikos. Afrilex Series, 8, 223–242.
Pedersen, B. S., & Nimb, S. (2000). Semantic encoding of Danish verbs in SIMPLE—Adapting a verb-framed model to a satellite-framed language. In Proceedings from the second international conference on language resources and evaluation (pp. 1405–1412), Language resources and evaluation—LREC 2000, Athens.
Pedersen, B. S., & Paggio, P. (2004). The Danish SIMPLE lexicon and its application in content-based querying. Nordic Journal of Linguistics, 27(1), 97–127.
Pedersen, B. S., & Sørensen, N. H. (2006). Towards sounder taxonomies in wordnets. In A. Oltramari, C.-R. Huang, A. Lenci, P. Buitelaar, & C. Fellbaum (Eds.), Ontolex 2006 at 5th international conference on language resources and evaluation (pp. 9–16). Genova, Italy.
Pedersen, B. S., Braasch, A., Henriksen, L., Olsen, S., & Povlsen, C. (2008). Merging a syntactic resource with a WordNet: A feasibility study of a merge between STO and DanNet. In Proceedings from the sixth international conference on language resources and evaluation, Marrakech, Morocco.
Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: The MIT Press.
Rigau, G., & Agirre, E. (2002). Semi-automatic methods for WordNet construction. In Tutorial at 2002 international WordNet conference, Mysore, India.
Rodríguez, H., Farwell, D., Farreres, J., Bertran, M., Alkhalifa, M., Martí, M. A., et al. (2008). Arabic WordNet: Current state and future extension. In Proceedings of the fourth global WordNet conference (pp. 387–405). Szeged, Hungary.
Ruus, H. (1995). Danske kerneord. Copenhagen: Museum Tusculanums Forlag.
Svensén, B. (1993). Practical lexicography. Principles and methods of dictionary-making. Oxford: Oxford University Press [translated from the Swedish Handbok i lexikografi (1987) by Sykes, J. & Schofield, K.].
Veale, T., & Hao, Y. (2008). Enriching WordNet with folk knowledge and stereotypes. In Proceedings of the fourth global WordNet conference, Szeged, Hungary.
Vossen, P. (Ed.). (1999). EuroWordNet, a multilingual database with lexical semantic networks. The Netherlands: Kluwer.
Vossen, P., Maks, I., Segers, R., & van der Vliet, H. (2008). Integrating lexical units, synsets and ontology in the Cornetto database. In Proceedings from the 6th international conference on language resources and evaluation, language resources and evaluation—LREC 2008, Marrakech, Morocco.
Zgusta, L. (1988). Pragmatics, lexicography and dictionaries of English. World Englishes, 7(3), 243–253.
Cite this article
Pedersen, B.S., Nimb, S., Asmussen, J. et al. DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary. Lang Resources & Evaluation 43, 269–299 (2009). https://doi.org/10.1007/s10579-009-9092-1