Improving Bilingual Lexicon Extraction from Comparable Corpora Using Window-Based and Syntax-Based Models

Hazem, Amir; Morin, Emmanuel

doi:10.1007/978-3-642-54903-8_26

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper proposes two strategies for combining a window-based and a syntax-based context representation for the task of bilingual lexicon extraction from comparable corpora. The first strategy involves combining the scores assigned to translations by both models and using them for ranking and selection; the second strategy involves a combination of the context features provided by the two models prior to applying the lexicon extraction method. The reported results show that the combination of the two context representations significantly improves the performance of bilingual lexicon extraction compared to using each of the representations individually.
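The two strategies can be illustrated with a minimal sketch. The abstract does not give the similarity measure, the score-combination formula, or the way features are merged, so everything below is an assumption made for illustration: cosine similarity, an equal-weight linear interpolation of scores, feature merging by tagging each feature with its model of origin, and invented toy vectors (the source word's context is assumed to be already translated into the target language).

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def score_candidates(source_vec, candidate_vecs):
    """Score every candidate translation against the source context vector."""
    return {c: cosine(source_vec, vec) for c, vec in candidate_vecs.items()}

def combine_scores(window_scores, syntax_scores, weight=0.5):
    """Strategy 1: combine the scores produced by the two models.
    The linear interpolation and the default weight are assumptions."""
    candidates = set(window_scores) | set(syntax_scores)
    return {
        c: weight * window_scores.get(c, 0.0)
           + (1 - weight) * syntax_scores.get(c, 0.0)
        for c in candidates
    }

def combine_features(window_vec, syntax_vec):
    """Strategy 2: merge the context features of both models into one vector
    before scoring. Tagging features by origin is an assumption."""
    merged = {("win", f): w for f, w in window_vec.items()}
    merged.update({("syn", f): w for f, w in syntax_vec.items()})
    return merged

def rank(scores):
    """Candidates sorted from best to worst score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data, not taken from the paper.
src_window = {"sein": 2.0, "traitement": 1.0}
src_syntax = {"dobj:traiter": 1.0}
cand_window = {
    "cancer": {"sein": 2.0, "traitement": 1.0},
    "maladie": {"traitement": 1.0},
}
cand_syntax = {
    "cancer": {"dobj:traiter": 1.0},
    "maladie": {"dobj:traiter": 1.0, "subj:toucher": 1.0},
}

# Strategy 1: score with each model separately, then combine the scores.
late = combine_scores(
    score_candidates(src_window, cand_window),
    score_candidates(src_syntax, cand_syntax),
)

# Strategy 2: combine the features first, then score once.
early = score_candidates(
    combine_features(src_window, src_syntax),
    {c: combine_features(cand_window[c], cand_syntax[c]) for c in cand_window},
)

print(rank(late), rank(early))
```

In this sketch the first strategy keeps the two models independent until ranking, while the second lets a single similarity computation see both kinds of context at once; which of the two the paper found more effective is not stated in the abstract.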

Author information

Authors and Affiliations

Laboratoire d’Informatique de Nantes-Atlantique (LINA), Université de Nantes, 44322, Nantes Cedex 3, France
Amir Hazem & Emmanuel Morin

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Av. Juan Dios Bátiz, Col. Nueva Industrial Vallejo, 07738, Mexico D.F, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Hazem, A., Morin, E. (2014). Improving Bilingual Lexicon Extraction from Comparable Corpora Using Window-Based and Syntax-Based Models. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_26

DOI: https://doi.org/10.1007/978-3-642-54903-8_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)
