Abstract
The paper investigates several techniques for hypernymy extraction from a large collection of dictionary definitions in Russian. First, definitions from different dictionaries are clustered; then single words and multiword expressions are extracted as hypernym candidates. A classification-based approach built on pre-trained word embeddings is implemented as a complementary technique. In total, we extracted about 40K unique hypernym candidates for 22K word entries. Evaluation showed that the proposed methods, applied to a large collection of dictionary data, are a viable option for automatic extraction of hyponym/hypernym pairs. The obtained data is available for research purposes.
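To make the classification-based idea concrete, the following is a minimal sketch of how a hyponym/hypernym pair could be scored from word embeddings. It is not the implementation used in the paper: the abstract does not state the features or the classifier, so the vector-difference features, the logistic-regression model without a bias term, the toy 3-dimensional vectors, and the names `EMB`, `pair_features`, `train`, and `score` are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional "embeddings"; the values are invented for illustration only.
EMB = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "animal": [0.5, 0.5, 0.0],
    "rose": [0.0, 0.1, 0.9],
    "tulip": [0.0, 0.2, 0.8],
    "flower": [0.0, 0.5, 0.5],
}


def pair_features(hypo, hyper):
    """Features of an ordered pair: candidate hypernym vector minus hyponym vector."""
    return [b - a for a, b in zip(EMB[hypo], EMB[hyper])]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, epochs=200, lr=1.0):
    """Fit logistic-regression weights (no bias term) by batch gradient ascent."""
    w = [0.0] * 3
    for _ in range(epochs):
        grad = [0.0] * 3
        for (hypo, hyper), y in zip(pairs, labels):
            x = pair_features(hypo, hyper)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for k in range(3):
                grad[k] += (y - p) * x[k]
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w


def score(w, hypo, hyper):
    """Estimated probability that `hyper` is a hypernym of `hypo`."""
    x = pair_features(hypo, hyper)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


# Positive examples are (hyponym, hypernym); negatives are the same pairs reversed.
positives = [("cat", "animal"), ("rose", "flower")]
pairs = positives + [(b, a) for a, b in positives]
labels = [1] * len(positives) + [0] * len(positives)
weights = train(pairs, labels)
```

Calling `score(weights, "dog", "animal")` on a pair not seen in training gives a value above 0.5, while the reversed pair `score(weights, "animal", "dog")` falls below 0.5. With real pre-trained vectors, the negative examples and the classifier would need to be chosen far more carefully than in this toy setting.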
Notes
- 1.
A different, though less common, approach is to define a word through its synonyms (car – a motorcar or automobile) or through cognate words (running – the action of the verb to run).
- 4.
See, for example, GermaNet, http://www.sfs.uni-tuebingen.de/GermaNet/.
- 5.
The list http://ruscorpora.ru/corpora-freq.html contains 6.8 million bigrams with frequency above 3. We also matched the extracted multiwords with Wikipedia titles, but there were fewer than 9% matches, and we did not use this as a selection criterion.
- 6.
The embeddings can be downloaded from http://rusvectores.org/models/, model ruwikiruscorpora_upos_skipgram_300_2_2018.
Acknowledgments
MK was supported by RFBR grant #15-37-50912; PB and YK were supported by RFH grant #16-04-12019.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Karyaeva, M., Braslavski, P., Kiselev, Y. (2018). Extraction of Hypernyms from Dictionaries with a Little Help from Word Embeddings. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2018. Lecture Notes in Computer Science, vol 11179. Springer, Cham. https://doi.org/10.1007/978-3-030-11027-7_8
DOI: https://doi.org/10.1007/978-3-030-11027-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11026-0
Online ISBN: 978-3-030-11027-7
eBook Packages: Computer Science, Computer Science (R0)