Extraction of Hypernyms from Dictionaries with a Little Help from Word Embeddings

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11179)

Abstract

The paper investigates several techniques for hypernymy extraction from a large collection of dictionary definitions in Russian. First, definitions from different dictionaries are clustered, then single words and multiwords are extracted as hypernym candidates. A classification-based approach on pre-trained word embeddings is implemented as a complementary technique. In total, we extracted about 40K unique hypernym candidates for 22K word entries. Evaluation showed that the proposed methods applied to a large collection of dictionary data are a viable option for automatic extraction of hyponym/hypernym pairs. The obtained data is available for research purposes.
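The classification-based technique mentioned in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the two-dimensional toy vectors stand in for pre-trained embeddings, the vector-difference feature is an assumed design, and a simple perceptron is swapped in for the paper's classifier so that the example has no external dependencies. All function and variable names are hypothetical.

```python
# Toy 2-d vectors standing in for pre-trained embeddings (values are invented).
TOY_VECTORS = {
    "dog": [0.1, 0.9],
    "cat": [0.2, 0.8],
    "animal": [0.9, 0.85],
    "rose": [0.1, -0.9],
    "tulip": [0.15, -0.8],
    "flower": [0.85, -0.85],
}

# (candidate hyponym, candidate hypernym, label); reversed pairs are negatives.
TRAIN_PAIRS = [
    ("dog", "animal", 1), ("cat", "animal", 1), ("rose", "flower", 1),
    ("animal", "dog", 0), ("flower", "rose", 0), ("animal", "cat", 0),
]


def pair_features(hyponym, hypernym, vectors):
    """Assumed feature design: hypernym vector minus hyponym vector."""
    return [h - l for l, h in zip(vectors[hyponym], vectors[hypernym])]


def score(weights, bias, features):
    """Linear decision score for one feature vector."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_perceptron(samples, labels, epochs=10):
    """Plain perceptron, used here as a stand-in for a real classifier."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            predicted = 1 if score(weights, bias, features) > 0 else 0
            error = label - predicted
            if error:
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
    return weights, bias


def is_hypernym_pair(hyponym, hypernym, vectors, weights, bias):
    """Classify an ordered (hyponym, hypernym) candidate pair."""
    return score(weights, bias, pair_features(hyponym, hypernym, vectors)) > 0


def train_on_toy_data():
    """Build features for the toy training pairs and fit the perceptron."""
    samples = [pair_features(a, b, TOY_VECTORS) for a, b, _ in TRAIN_PAIRS]
    labels = [label for _, _, label in TRAIN_PAIRS]
    return train_perceptron(samples, labels)


if __name__ == "__main__":
    weights, bias = train_on_toy_data()
    # "tulip" never appears in training; only its vector is used.
    print(is_hypernym_pair("tulip", "flower", TOY_VECTORS, weights, bias))
    print(is_hypernym_pair("flower", "tulip", TOY_VECTORS, weights, bias))
```

In a realistic setting, the toy dictionary would be replaced by vectors loaded from a pre-trained model, and the training pairs would come from an existing lexical resource; both substitutions are outside what this sketch shows.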

Notes

  1. A different, though less common, approach is to define a word through its synonyms: car – a motorcar or automobile, or through cognate words: running – the action of the verb to run.

  2. http://ruscorpora.ru.

  3. https://ru.wiktionary.org/.

  4. See for example GermaNet, http://www.sfs.uni-tuebingen.de/GermaNet/.

  5. The list http://ruscorpora.ru/corpora-freq.html contains 6.8 million bigrams with frequency above 3. We also matched the extracted multiwords with Wikipedia titles, but there were fewer than 9% matches, and we did not use this as a selection criterion.

  6. The embeddings can be downloaded from http://rusvectores.org/models/, model ruwikiruscorpora_upos_skipgram_300_2_2018.

  7. https://radimrehurek.com/gensim/models/word2vec.html.

  8. http://russe.nlpub.org/downloads/.

  9. http://scikit-learn.org/stable/modules/svm.html.

  10. https://github.com/YARN-semantic-relations/hyponymic-relationship.

Acknowledgments

MK was supported by RFBR grant #15-37-50912, PB and YK were supported by RFH grant #16-04-12019.

Corresponding author

Correspondence to Maria Karyaeva.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Karyaeva, M., Braslavski, P., Kiselev, Y. (2018). Extraction of Hypernyms from Dictionaries with a Little Help from Word Embeddings. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2018. Lecture Notes in Computer Science, vol 11179. Springer, Cham. https://doi.org/10.1007/978-3-030-11027-7_8

  • DOI: https://doi.org/10.1007/978-3-030-11027-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11026-0

  • Online ISBN: 978-3-030-11027-7

  • eBook Packages: Computer Science, Computer Science (R0)
