Identifying Poorly-Defined Concepts in WordNet with Graph Metrics

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10579)

Abstract

Princeton WordNet is the most widely used lexical resource in natural language processing and continues to provide a gold-standard model of semantics. However, the resource still has significant quality issues, and these affect the performance of all NLP systems built on it. One major issue is that many nodes are insufficiently defined, so new links need to be added to improve NLP performance. We combine graph-based metrics with measures of ambiguity to predict which synsets are difficult for word sense disambiguation, a major NLP task that depends on good lexical information. We show that this method allows us to find poorly defined nodes with 89.9% precision, which would help manual annotators focus on improving the parts of the WordNet graph most in need.

Notes

  1. For example: https://lists.princeton.edu/cgi-bin/wa?A2=ind1509&L=wn-users&P=R2&1=wn-users&9=A&I=-3&J=on.

  2. This is calculated in the usual way, as the number of links (triples) divided by the number of nodes (entities) in the graph.

  3. http://web.eecs.umich.edu/~mihalcea/downloads.html#semcor.

  4. The types of the links, such as ‘hypernym’, are ignored in this work.

  5. We use the implementations provided by NetworkX (https://networkx.github.io) for our analysis.

  6. http://www.cs.waikato.ac.nz/ml/weka/.
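
Notes 2 and 4 can be illustrated with a small sketch. It is not the authors' code: the triples are hypothetical toy data rather than WordNet content, the function names are invented for illustration, and treating links as undirected is an assumption. It uses only the Python standard library, whereas the paper's analysis relied on NetworkX (note 5).

```python
from collections import defaultdict

# Hypothetical toy triples (source, relation, target); not taken from WordNet.
TRIPLES = [
    ("dog", "hypernym", "canine"),
    ("canine", "hypernym", "carnivore"),
    ("dog", "hypernym", "domestic_animal"),
    ("puppy", "hypernym", "dog"),
    ("cat", "hypernym", "feline"),
]


def build_untyped_graph(triples):
    """Discard the relation label (note 4) and return an adjacency map.

    Assumption: links are treated as undirected.
    """
    adjacency = defaultdict(set)
    for source, _relation, target in triples:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def links_per_node(triples):
    """Number of links (triples) divided by number of nodes (note 2)."""
    nodes = {node for source, _relation, target in triples
             for node in (source, target)}
    return len(triples) / len(nodes)


def node_degrees(adjacency):
    """Count the distinct neighbours of every node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


if __name__ == "__main__":
    graph = build_untyped_graph(TRIPLES)
    print("links per node:", round(links_per_node(TRIPLES), 3))
    for node, degree in sorted(node_degrees(graph).items()):
        print(node, degree)
```

In this toy graph, nodes with a single neighbour are the sparsely linked ones; a graph library such as NetworkX would supply richer centrality measures over the same untyped structure.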

Author information

Corresponding author

Correspondence to John P. McCrae.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

McCrae, J.P., Prangnawarat, N. (2017). Identifying Poorly-Defined Concepts in WordNet with Graph Metrics. In: van Erp, M., et al. Knowledge Graphs and Language Technology. ISWC 2016. Lecture Notes in Computer Science, vol 10579. Springer, Cham. https://doi.org/10.1007/978-3-319-68723-0_6

  • DOI: https://doi.org/10.1007/978-3-319-68723-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68722-3

  • Online ISBN: 978-3-319-68723-0

  • eBook Packages: Computer Science (R0)
