Identifying Poorly-Defined Concepts in WordNet with Graph Metrics

McCrae, John P.; Prangnawarat, Narumol

doi:10.1007/978-3-319-68723-0_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10579)

Included in the following conference series:

International Semantic Web Conference

Abstract

Princeton WordNet is the most widely used lexical resource in natural language processing and continues to provide a gold-standard model of semantics. However, the resource still has significant quality issues, and these affect the performance of all NLP systems built on it. One major issue is that many nodes are insufficiently defined, so new links need to be added to increase performance in NLP. We combine graph-based metrics with measures of ambiguity in order to predict which synsets are difficult for word sense disambiguation, a major NLP task that depends on good lexical information. We show that this method allows us to find poorly defined nodes with 89.9% precision, which would help manual annotators focus on improving the parts of the WordNet graph most in need.

Notes

1. For example: https://lists.princeton.edu/cgi-bin/wa?A2=ind1509&L=wn-users&P=R2&1=wn-users&9=A&I=-3&J=on.
2. This is calculated in the usual way, as the number of links (triples) divided by the number of nodes (entities) in the graph.
3. http://web.eecs.umich.edu/~mihalcea/downloads.html#semcor.
4. The types of the links, such as ‘hypernym’, are ignored in this work.
5. We use the implementations provided by NetworkX (https://networkx.github.io) for our analysis.
6. http://www.cs.waikato.ac.nz/ml/weka/.
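
The ratio described in note 2 (links divided by nodes), with link types ignored as in note 4, can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `link_density` and the toy triples are assumptions made up for the example, and it uses the standard library only rather than NetworkX.

```python
def link_density(triples):
    """Return links per node for a graph given as (source, relation, target) triples.

    The relation label is ignored; only the endpoints are used to collect nodes.
    Every triple counts as one link, including repeated ones.
    """
    nodes = set()
    for source, _relation, target in triples:
        nodes.add(source)
        nodes.add(target)
    if not nodes:
        # An empty graph has no meaningful ratio; 0.0 is an arbitrary convention here.
        return 0.0
    return len(triples) / len(nodes)


# Hypothetical toy data, not taken from WordNet.
toy_triples = [
    ("dog", "hypernym", "canine"),
    ("canine", "hypernym", "carnivore"),
    ("cat", "hypernym", "feline"),
    ("feline", "hypernym", "carnivore"),
    ("dog", "similar", "cat"),
]

density = link_density(toy_triples)
print(f"links per node: {density}")
```

In this toy graph there are five links over five distinct nodes, so the ratio is 1.0; a sparsely linked resource would score lower on the same measure.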

Author information

Authors and Affiliations

Insight Centre for Data Analytics, National University of Ireland, Galway, Republic of Ireland
John P. McCrae & Narumol Prangnawarat

Corresponding author

Correspondence to John P. McCrae.

Editor information

Editors and Affiliations

Digital Humanities Group, KNAW Humanities Cluster, Amsterdam, The Netherlands
Marieke van Erp
University of Leipzig, Leipzig, Germany
Sebastian Hellmann
National University of Ireland, Galway, Ireland
John P. McCrae
Institut für Informatik, Goethe University Frankfurt, Frankfurt, Hessen, Germany
Christian Chiarcos
Division of Web Science and Technology, Department of Computer Science, KAIST, Daejeon, Korea (Republic of)
Key-Sun Choi
Universidad Politécnica de Madrid, Madrid, Spain
Jorge Gracia
Waseda University, Tokyo, Japan
Yoshihiko Hayashi
Ontolonomy LLC, Yokohama, Japan
Seiji Koide
Apple San Francisco, San Francisco, California, USA
Pablo Mendes
Inst für Info & Wirtschaftsinfo, Universität Mannheim, Mannheim, Baden-Württemberg, Germany
Heiko Paulheim
National Institute of Informatics, Tokyo, Japan
Hideaki Takeda

About this paper

Cite this paper

McCrae, J.P., Prangnawarat, N. (2017). Identifying Poorly-Defined Concepts in WordNet with Graph Metrics. In: van Erp, M., et al. Knowledge Graphs and Language Technology. ISWC 2016. Lecture Notes in Computer Science, vol 10579. Springer, Cham. https://doi.org/10.1007/978-3-319-68723-0_6

DOI: https://doi.org/10.1007/978-3-319-68723-0_6
Published: 29 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68722-3
Online ISBN: 978-3-319-68723-0
eBook Packages: Computer Science, Computer Science (R0)
