Semi-supervised Relation Extraction from Monolingual Dictionary for Russian WordNet

Alexeyevsky, Daniil

doi:10.1007/978-3-319-77113-7_38

Conference paper
First Online: 10 October 2018

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10761)

Abstract

Monolingual dictionaries are a voluminous, loosely structured source of lexical and ontological information. Numerous attempts have been made to extract WordNet or ontology relations from monolingual dictionaries, with varying success. Most such attempts are based on morphosyntactic rules. The difficulty of the information extraction task depends greatly on the discipline of the dictionary editors. Although that discipline is frequently excellent for the human reader, it is rarely strict enough to allow effortless data mining of dictionaries.

Here an improvement to the rule-based approach to relation extraction is put forward. The improved approach automatically clusters similar definitions, after which one or two relation extraction rules are created manually per cluster. This reduces the amount of annotator work, increases the quality of rule application, and directs more attention to some of the rare cases. To group definitions with similar structure, mixed n-gram features were employed; their usefulness is discussed.
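To make the idea of mixed n-gram features concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not define the features or name the clustering algorithm, so the sketch assumes that a mixed n-gram is a window of n tokens in which each position is represented either by the word or by its part-of-speech tag. The function names `mixed_ngrams` and `jaccard`, the English stand-in definitions, and the tag set are all hypothetical.

```python
from itertools import product

def mixed_ngrams(tagged, n=2):
    """Return the set of n-grams in which every position is either the
    word itself or its part-of-speech tag (an assumed feature definition)."""
    features = set()
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        # Each position offers two alternatives: the word or its tag.
        alternatives = [(word.lower(), "<" + pos + ">") for word, pos in window]
        for choice in product(*alternatives):
            features.add(" ".join(choice))
    return features

def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy, pre-tagged definitions; English stand-ins for dictionary entries.
definitions = {
    "spaniel": [("a", "DET"), ("breed", "NOUN"), ("of", "ADP"), ("dog", "NOUN")],
    "rye": [("a", "DET"), ("kind", "NOUN"), ("of", "ADP"), ("cereal", "NOUN")],
    "run": [("to", "PART"), ("move", "VERB"), ("quickly", "ADV")],
}
features = {word: mixed_ngrams(tagged) for word, tagged in definitions.items()}

# Definitions with the same structure share tag-level n-grams even when
# their words differ, so they score higher than unrelated definitions.
sim_same = jaccard(features["spaniel"], features["rye"])
sim_diff = jaccard(features["spaniel"], features["run"])
```

Under these assumptions, the two "a X of Y" definitions share features such as `<DET> <NOUN>` and `<NOUN> of`, while the verb definition shares none. Similarities or feature vectors of this kind could be passed to any standard clustering algorithm.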

The work is performed on the Big Explanatory Dictionary of the Russian Language. Definitions are grouped into 100 clusters, the clusters are annotated, and the correctness of the result is assessed. The average accuracy is 86% for hypernym extraction, which is high for works of the same scope.
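The evaluation described above can be illustrated with a small sketch under stated assumptions. The abstract does not say how the average is computed, so the code assumes an unweighted mean of per-cluster accuracies. The two rules, the cluster contents, and the gold hypernyms are hypothetical toy data, not taken from the paper.

```python
def first_noun_rule(tagged):
    """Hypothetical rule: the first noun of the definition is the hypernym."""
    for word, pos in tagged:
        if pos == "NOUN":
            return word
    return None

def noun_after_of_rule(tagged):
    """Hypothetical rule for 'a kind of X' definitions: the first noun
    that follows the word 'of' is the hypernym."""
    seen_of = False
    for word, pos in tagged:
        if seen_of and pos == "NOUN":
            return word
        if word == "of":
            seen_of = True
    return None

# Each cluster pairs one manually chosen rule with its definitions and
# gold hypernyms (toy data with English stand-ins).
clusters = {
    0: (noun_after_of_rule, [
        ([("a", "DET"), ("breed", "NOUN"), ("of", "ADP"), ("dog", "NOUN")], "dog"),
        ([("a", "DET"), ("kind", "NOUN"), ("of", "ADP"), ("cereal", "NOUN")], "cereal"),
    ]),
    1: (first_noun_rule, [
        ([("a", "DET"), ("tree", "NOUN"), ("with", "ADP"), ("acorns", "NOUN")], "tree"),
        # The rule fails here: it returns "young" instead of "cat".
        ([("the", "DET"), ("young", "NOUN"), ("of", "ADP"), ("a", "DET"), ("cat", "NOUN")], "cat"),
    ]),
}

def cluster_accuracy(rule, examples):
    """Share of definitions for which the rule returns the gold hypernym."""
    correct = sum(1 for tagged, gold in examples if rule(tagged) == gold)
    return correct / len(examples)

per_cluster = {cid: cluster_accuracy(rule, ex) for cid, (rule, ex) in clusters.items()}
# Assumed definition of "average accuracy": unweighted mean over clusters.
average_accuracy = sum(per_cluster.values()) / len(per_cluster)
```

Reporting accuracy per cluster as well as on average shows which clusters are served poorly by their rule and may need a second rule or a different one.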

Notes

1. Available at https://bitbucket.org/dendik/russian-wordnet-rules and http://www.cicling.org/2016/data/311.

References

Alexeyevsky, D., Temchenko, A.V.: WSD in monolingual dictionaries for Russian WordNet. In: Fellbaum, C., Forăscu, C., Mititelu, V., Vossen, P. (eds.) Proceedings of the Eighth Global WordNet Conference, pp. 10–15. Bucharest, Romania, January 2016
Barnbrook, G., Sinclair, J.: Specialised corpus, local and functional grammars. Small Corpus Stud. ELT: Theor. Pract. 5, 237 (2001)
Benitez, L., Cervell, S., Escudero, G., Lopez, M., Rigau, G., Taulé, M.: Methods and Tools for Building the Catalan WordNet. CoRR cmp-lg/9806009 (1998). http://arxiv.org/abs/cmp-lg/9806009
Bordea, G., Buitelaar, P., Faralli, S., Navigli, R.: Semeval-2015 task 17: Taxonomy Extraction Evaluation (TExEval). In: Proceedings of the 9th International Workshop on Semantic Evaluation. Association for Computational Linguistics (2015)
Bramsen, P., Escobar-Molano, M., Patel, A., Alonso, R.: Extracting social power relationships from natural language. pp. 773–782. Association for Computational Linguistics (2011)
Fellbaum, C.: WordNet. Wiley Online Library (1998)
Hearst, M.A.: Automatic acquisition of hyponyms from large text corpora, pp. 539–545. Association for Computational Linguistics (1992)
Kummerfeld, J.K., Hall, D., Curran, J.R., Klein, D.: Parser showdown at the Wall Street corral: an empirical investigation of error types in parser output, pp. 1048–1059. Association for Computational Linguistics (2012)
Kuznetsov, S.A.: The Newest Big Explanatory Dictionary of Russian Language. RIPOL-Norint, St. Petersburg (2008)
Lindén, K., Niemi, J.: Is it possible to create a very large wordnet in 100 days? An evaluation. Lang. Resour. Eval. 48(2), 191–201 (2014)
Navigli, R., Velardi, P.: Learning word-class lattices for definition and hypernym extraction, pp. 1318–1327. Association for Computational Linguistics (2010)
Oliveira, H.G., Gomes, P.: Automatic Discovery of Fuzzy Synsets from Dictionary Definitions, pp. 1801–1806 (2011)
Oliveira, H.G., Santos, D., Gomes, P.: Relations extracted from a Portuguese dictionary: results and first evaluation, pp. 541–552 (2009)
Pedersen, B.S., Nimb, S., Asmussen, J., Sørensen, N.H., Trap-Jensen, L., Lorentzen, H.: DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary. Lang. Resour. Eval. 43(3), 269–299 (2009). https://doi.org/10.1007/s10579-009-9092-1
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Sabirova, K., Lukanin, A.: Automatic extraction of hypernyms and hyponyms from Russian texts. In: Ignatov, D.I., Khachay, M.Y., Panchenko, A., Konstantinova, N., Yavorsky, R., Ustalov, D. (eds.) Supplementary Proceedings of the 3rd International Conference on Analysis of Images, Social Networks and Texts (AIST 2014), vol. 1197, pp. 35–40. Citeseer (2014)
Segalovich, I.: A Fast Morphological Algorithm with Unknown Word Guessing Induced by a Dictionary for a Web Search Engine, pp. 273–280. Citeseer (2003). https://tech.yandex.ru/mystem/
Van Rossum, G.: Python Programming Language, vol. 41 (2007)
Vossen, P.: A Multilingual Database with Lexical Semantic Networks. Springer, Dordrecht (1998). https://doi.org/10.1007/978-94-017-1491-4
Wang, T., Hirst, G.: Extracting Synonyms from Dictionary Definitions, pp. 471–477 (2009)
Weeds, J., Clarke, D., Reffin, J., Weir, D., Keller, B.: Learning to distinguish hypernyms and co-hyponyms, pp. 2249–2259. Dublin City University and Association for Computational Linguistics (2014)
Yamane, J., Takatani, T., Yamada, H., Miwa, M., Sasaki, Y.: Distributional Hypernym Generation by Jointly Learning Clusters and Projections (2016)

Author information

Authors and Affiliations

School of Linguistics, Faculty of Humanities, National Research University Higher School of Economics, Moscow, Russia
Daniil Alexeyevsky

Corresponding author

Correspondence to Daniil Alexeyevsky.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Alexeyevsky, D. (2018). Semi-supervised Relation Extraction from Monolingual Dictionary for Russian WordNet. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_38

DOI: https://doi.org/10.1007/978-3-319-77113-7_38
Published: 10 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77112-0
Online ISBN: 978-3-319-77113-7
eBook Packages: Computer Science, Computer Science (R0)
