Abstract
This paper proposes two strategies for combining a window-based and a syntax-based context representation for bilingual lexicon extraction from comparable corpora. The first strategy combines the scores that the two models assign to candidate translations and uses the combined scores for ranking and selection; the second combines the context features provided by the two models before the lexicon extraction method is applied. The reported results show that combining the two context representations significantly improves bilingual lexicon extraction performance compared with using either representation on its own.
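The two combination strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how scores are merged or which similarity measure is used, so the weighted sum, the cosine similarity, the dictionary-based sparse vectors, the function names, and the toy data below are all assumptions made for illustration. The source-word vectors are assumed to have already been mapped into the target language with a seed dictionary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(source_vec, candidates):
    """Score every target candidate against the source word's context vector."""
    return {word: cosine(source_vec, vec) for word, vec in candidates.items()}


def combine_scores(window_scores, syntax_scores, weight=0.5):
    """Score-level combination (assumed here to be a weighted sum)."""
    words = set(window_scores) | set(syntax_scores)
    return {
        w: weight * window_scores.get(w, 0.0)
        + (1.0 - weight) * syntax_scores.get(w, 0.0)
        for w in words
    }


def combine_features(window_vec, syntax_vec):
    """Feature-level combination: merge both feature sets into one vector.

    Features are namespaced so that a window feature and a syntactic
    feature with the same label never collide.
    """
    merged = {("win", f): w for f, w in window_vec.items()}
    merged.update({("syn", f): w for f, w in syntax_vec.items()})
    return merged


# Hypothetical toy data: context vectors of one source word and two
# target candidates, in a window-based and a syntax-based representation.
src_window = {"tumor": 2.0, "breast": 1.0}
src_syntax = {"treat:obj": 1.0}
cand_window = {
    "cancer": {"tumor": 2.0, "breast": 1.0},
    "disease": {"tumor": 1.0, "patient": 1.0},
}
cand_syntax = {
    "cancer": {"treat:obj": 1.0},
    "disease": {"treat:obj": 1.0, "cure:obj": 1.0},
}

# Strategy 1: rank with each model separately, then combine the scores.
score_level = combine_scores(
    rank(src_window, cand_window), rank(src_syntax, cand_syntax)
)

# Strategy 2: combine the features first, then rank once.
src_merged = combine_features(src_window, src_syntax)
cand_merged = {
    w: combine_features(cand_window[w], cand_syntax[w]) for w in cand_window
}
feature_level = rank(src_merged, cand_merged)

best_by_scores = max(score_level, key=score_level.get)
best_by_features = max(feature_level, key=feature_level.get)
print(best_by_scores, best_by_features)
```

In this sketch the score-level variant keeps the two models independent until the final ranking, whereas the feature-level variant lets a single similarity computation see both kinds of context at once. In practice the raw weights would typically be replaced by association scores, and the interpolation weight would need tuning.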
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hazem, A., Morin, E. (2014). Improving Bilingual Lexicon Extraction from Comparable Corpora Using Window-Based and Syntax-Based Models. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_26
DOI: https://doi.org/10.1007/978-3-642-54903-8_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science (R0)