Abstract
This paper presents an automatic, language-independent approach to wordnet extension that reuses existing, freely available bilingual resources such as machine-readable dictionaries and online encyclopaedias. The approach is applied to Slovene and French. Each word from the bilingual resources is assigned one or more synset ids by a classifier that relies on a set of features, the most important of which is distributional similarity. Automatic, manual and task-based evaluations show good results in terms of both coverage and quality.
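As a rough illustration of how distributional similarity can drive synset assignment, the sketch below scores a word's context vector against the context vectors of its candidate synsets and keeps every candidate above a cut-off. It is not the paper's classifier: the abstract does not specify the feature set or the model, so a single cosine-similarity feature with a fixed threshold stands in for it here. The function names, the synset ids, the co-occurrence counts and the 0.5 threshold are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse co-occurrence count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_synsets(word_vec, candidates, threshold=0.5):
    """Return the ids of all candidate synsets that are similar enough.

    A word may receive one, several, or no synset ids. The fixed
    threshold is an assumption standing in for a trained classifier.
    """
    scored = {sid: cosine(word_vec, vec) for sid, vec in candidates.items()}
    return sorted(sid for sid, score in scored.items() if score >= threshold)


# Toy data (hypothetical): context counts for a target-language word
# and for two candidate synsets proposed by a bilingual dictionary.
word_vec = Counter({"river": 3, "water": 2, "shore": 1})
candidates = {
    "bank-river-side": Counter({"river": 2, "water": 2, "shore": 2}),
    "bank-institution": Counter({"money": 3, "loan": 2}),
}

result = assign_synsets(word_vec, candidates)
print(result)
```

In this toy setting only the riverside sense shares context words with the target word, so it is the only candidate kept. A realistic system would combine this score with other features and learn the decision boundary from labelled data rather than fix a threshold by hand.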
Notes
1. http://presis.amebis.si/prevajanje/ [3.6.2011].
2. http://translate.google.com/ [3.6.2011].
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sagot, B., Fišer, D. (2014). Classification-Based Extension of Wordnets from Heterogeneous Resources. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_32
DOI: https://doi.org/10.1007/978-3-319-08958-4_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science, Computer Science (R0)