Abstract
Monolingual dictionaries are a voluminous, loosely structured source of lexical and ontological information. Numerous attempts have been made to extract WordNet or ontology relations from monolingual dictionaries, with varying success. Most such attempts are based on morphosyntactic rules. The difficulty of the information extraction task depends greatly on the discipline of the dictionary editors. Although this discipline is frequently excellent for the human reader, it is rarely strict enough to allow effortless data mining on dictionaries.
Here an improvement to the rule-based approach to relation extraction is put forward. The improved approach is to automatically cluster similar definitions and then manually create one or two relation extraction rules per cluster. This helps to reduce the amount of annotator work, to increase the quality of rule application, and to pay more attention to some rare cases. To group definitions with similar structure, mixed n-gram features were employed; their usefulness is discussed.
The work is performed on the Big Explanatory Dictionary of the Russian Language. Definitions are grouped into 100 clusters and annotated, and the correctness of the result is assessed. The average accuracy is 86% for hypernym extraction, which is high for work of comparable scope.
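The cluster-then-annotate idea can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a mixed n-gram is read here as an n-gram in which each position holds either the word or its part-of-speech tag; the toy English definitions and their tags are invented; and a greedy Jaccard-threshold grouping stands in for whatever clustering algorithm the author actually used. The helper names (`mixed_ngrams`, `greedy_cluster`, `last_noun`) are hypothetical.

```python
from itertools import product


def mixed_ngrams(tokens, n=2):
    """Return the set of n-grams in which every position is either
    the word or its tag. `tokens` is a list of (word, tag) pairs.
    This reading of "mixed n-gram" is an assumption."""
    feats = set()
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # product(*window) picks the word or the tag at each position.
        feats.update(product(*window))
    return feats


def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(feature_sets, threshold=0.2):
    """Assign each item to the first cluster whose representative
    (its first member) is similar enough; otherwise open a new cluster.
    A stand-in for a real clustering algorithm, not the paper's method."""
    clusters = []
    for idx, feats in enumerate(feature_sets):
        for cluster in clusters:
            if jaccard(feats, feature_sets[cluster[0]]) >= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def last_noun(tokens):
    """Hypothetical hand-written rule for one cluster:
    take the last noun of the definition as the hypernym."""
    nouns = [word for word, tag in tokens if tag == "NOUN"]
    return nouns[-1] if nouns else None


# Invented toy definitions, already tokenized and tagged.
definitions = [
    [("a", "DET"), ("small", "ADJ"), ("domesticated", "ADJ"), ("animal", "NOUN")],
    [("a", "DET"), ("large", "ADJ"), ("wild", "ADJ"), ("animal", "NOUN")],
    [("to", "PART"), ("move", "VERB"), ("quickly", "ADV")],
]

features = [mixed_ngrams(d) for d in definitions]
clusters = greedy_cluster(features)

# One manually chosen rule per cluster; only cluster 0 has a rule here.
rules = {0: last_noun}
hypernyms = [rules[0](definitions[i]) for i in clusters[0]]
```

In this toy run the two noun-phrase definitions share tag-level and mixed bigrams even though their adjectives differ, so they fall into one cluster and a single rule covers both, while the verbal definition is left in a cluster of its own.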
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Alexeyevsky, D. (2018). Semi-supervised Relation Extraction from Monolingual Dictionary for Russian WordNet. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_38
DOI: https://doi.org/10.1007/978-3-319-77113-7_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77112-0
Online ISBN: 978-3-319-77113-7
eBook Packages: Computer Science, Computer Science (R0)