
Classification-Based Extension of Wordnets from Heterogeneous Resources

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

This paper presents an automatic, language-independent approach to wordnet extension that reuses existing, freely available bilingual resources such as machine-readable dictionaries and online encyclopaedias. The approach is applied to Slovene and French. Words from the bilingual resources are assigned one or several synset ids by a classifier that relies on a set of features, the most important of which is distributional similarity. Automatic, manual and task-based evaluations show good results in terms of both coverage and quality.
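The abstract describes assigning synset ids to dictionary words with a feature-based classifier whose strongest feature is distributional similarity. The following is only a minimal sketch of that idea under clearly labeled assumptions, not the authors' model or feature set: cosine similarity between a word's context vector and a synset's context vector stands in for distributional similarity, a hypothetical `n_translations` feature stands in for the other features, and the classifier is a hand-weighted logistic scorer whose weights, bias and threshold are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    synset_id: str       # hypothetical synset identifier
    dist_sim: float      # distributional similarity between the word and the synset
    n_translations: int  # hypothetical extra feature: translations pointing to this synset


def cosine(u, v):
    """Cosine similarity between two context vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def score(c, weights, bias):
    """Logistic score of one (word, synset) candidate from its features."""
    z = bias + weights[0] * c.dist_sim + weights[1] * c.n_translations
    return 1.0 / (1.0 + math.exp(-z))


def assign_synsets(candidates, weights=(4.0, 0.5), bias=-3.0, threshold=0.5):
    """Keep every candidate synset whose score reaches the threshold.

    The default weights, bias and threshold are illustrative assumptions;
    a real system would learn them from training data.
    """
    return [c.synset_id for c in candidates if score(c, weights, bias) >= threshold]


if __name__ == "__main__":
    word_vec = [1.0, 0.0, 1.0]
    a = Candidate("synset-A", cosine(word_vec, [1.0, 0.0, 1.0]), 1)
    b = Candidate("synset-B", cosine(word_vec, [0.0, 1.0, 0.0]), 1)
    print(assign_synsets([a, b]))
```

Keeping every candidate whose score clears the threshold, rather than only the single best one, reflects the abstract's point that a word may receive one or several synset ids; in practice the weights and threshold would be trained and tuned rather than fixed by hand as they are here.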

Author information

Corresponding author

Correspondence to Benoît Sagot.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sagot, B., Fišer, D. (2014). Classification-Based Extension of Wordnets from Heterogeneous Resources. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_32

  • DOI: https://doi.org/10.1007/978-3-319-08958-4_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08957-7

  • Online ISBN: 978-3-319-08958-4

  • eBook Packages: Computer Science, Computer Science (R0)