Improving Bilingual Lexicon Extraction from Comparable Corpora Using Window-Based and Syntax-Based Models

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

This paper proposes two strategies for combining a window-based and a syntax-based context representation for the task of bilingual lexicon extraction from comparable corpora. The first strategy involves combining the scores assigned to translations by both models and using them for ranking and selection; the second strategy involves a combination of the context features provided by the two models prior to applying the lexicon extraction method. The reported results show that the combination of the two context representations significantly improves the performance of bilingual lexicon extraction compared to using each of the representations individually.
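
The two strategies can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the normalisation, or the weighting, so the sketch assumes cosine similarity over sparse context vectors, min-max normalisation, and a weighted sum of scores. All function names, feature names, and numbers below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(source_vec, candidates):
    """Score every candidate translation against the source context vector."""
    return {word: cosine(source_vec, vec) for word, vec in candidates.items()}

def min_max(scores):
    """Rescale scores to [0, 1] so the two models are comparable (assumed step)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {w: 0.0 for w in scores}
    return {w: (s - lo) / (hi - lo) for w, s in scores.items()}

def combine_scores(win_scores, syn_scores, weight=0.5):
    """Strategy 1 (assumed form): weighted sum of normalised model scores."""
    win, syn = min_max(win_scores), min_max(syn_scores)
    return {w: weight * win[w] + (1 - weight) * syn[w] for w in win}

def combine_features(win_vec, syn_vec):
    """Strategy 2 (assumed form): merge both feature sets into one vector.

    Prefixing keeps a window feature and a syntactic feature with the
    same surface form from colliding.
    """
    merged = {("win", f): w for f, w in win_vec.items()}
    merged.update({("syn", f): w for f, w in syn_vec.items()})
    return merged

if __name__ == "__main__":
    # Hypothetical scores assigned to three candidate translations.
    win_scores = {"a": 0.2, "b": 0.6, "c": 0.4}
    syn_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
    combined = combine_scores(win_scores, syn_scores)
    for word in sorted(combined, key=combined.get, reverse=True):
        print(word, round(combined[word], 3))
```

In the first strategy each model ranks candidates on its own and only the scores are merged; in the second, `combine_features` would be applied to both the source and the candidate vectors before a single call to `rank`.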

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hazem, A., Morin, E. (2014). Improving Bilingual Lexicon Extraction from Comparable Corpora Using Window-Based and Syntax-Based Models. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_26

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science, Computer Science (R0)
