Abstract
Princeton WordNet is the most widely used lexical resource in natural language processing and continues to provide a gold-standard model of semantics. However, the resource still has significant quality issues, and these affect the performance of all NLP systems built on it. One major issue is that many nodes are insufficiently defined, and new links need to be added to improve NLP performance. We combine graph-based metrics with measures of ambiguity to predict which synsets are difficult for word sense disambiguation, a major NLP task that depends on good lexical information. We show that this method allows us to find poorly defined nodes with 89.9% precision, which would help manual annotators focus on improving the parts of the WordNet graph most in need.
Notes
- 1.
- 2.
As usual, this is calculated as the number of links (triples) divided by the number of nodes (entities) in the graph.
- 3.
- 4.
The types of the links, such as ‘hypernym’, are ignored in this work.
- 5.
We use the implementations provided by NetworkX (https://networkx.github.io) for our analysis.
- 6.
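The ratio described in note 2, with link types ignored as in note 4, can be sketched in a few lines. This is only an illustration under stated assumptions: the triples, the synset names, and the helper names `links_per_node` and `degree` are invented for the example and are not taken from the paper. It is written in plain Python so that it runs without dependencies.

```python
from collections import Counter

# Toy triples (subject, link type, object); names are invented for illustration.
triples = [
    ("dog", "hypernym", "canine"),
    ("canine", "hypernym", "carnivore"),
    ("cat", "hypernym", "feline"),
    ("feline", "hypernym", "carnivore"),
    ("dog", "hypernym", "pet"),
]

def links_per_node(triples):
    """Number of links (triples) divided by the number of nodes (entities)."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    return len(triples) / len(nodes)

def degree(triples):
    """Count the links touching each node, ignoring link type and direction."""
    counts = Counter()
    for s, _, o in triples:
        counts[s] += 1
        counts[o] += 1
    return counts

ratio = links_per_node(triples)  # 5 links over 6 nodes
deg = degree(triples)
print(f"links per node: {ratio:.3f}")
print(deg.most_common())
```

With a NetworkX graph `G`, the same ratio would be `G.number_of_edges() / G.number_of_nodes()`. Note that `networkx.density` uses a different normalisation (edges divided by the number of possible edges), so it is not interchangeable with the ratio in note 2.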
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
McCrae, J.P., Prangnawarat, N. (2017). Identifying Poorly-Defined Concepts in WordNet with Graph Metrics. In: van Erp, M., et al. Knowledge Graphs and Language Technology. ISWC 2016. Lecture Notes in Computer Science, vol 10579. Springer, Cham. https://doi.org/10.1007/978-3-319-68723-0_6
DOI: https://doi.org/10.1007/978-3-319-68723-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68722-3
Online ISBN: 978-3-319-68723-0
eBook Packages: Computer Science, Computer Science (R0)