French-English Terminology Extraction from Comparable Corpora

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)


This article presents a method of extracting bilingual lexica composed of single-word terms (SWTs) and multi-word terms (MWTs) from comparable corpora of a technical domain. First, this method extracts MWTs in each language, and then uses statistical methods to align single words and MWTs by exploiting the term contexts. After explaining the difficulties involved in aligning MWTs and specifying our approach, we show the adopted process for bilingual terminology extraction and the resources used in our experiments. Finally, we evaluate our approach and demonstrate its significance, particularly in relation to non-compositional MWT alignment.
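
The abstract does not spell out the alignment step, so the following is only a minimal sketch of the general context-based idea it describes: build a context vector for a source term, transfer it into the target language through a seed dictionary, and rank target candidates by vector similarity. The function names, the window size, the cosine measure, the toy sentences, and the seed dictionary are all illustrative assumptions, not the authors' implementation. Multi-word terms are assumed to have been extracted beforehand and joined into single tokens.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2):
    """Count the words that co-occur with `target` within +/- `window` tokens."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vec[tokens[j]] += 1
    return vec


def translate_vector(vec, seed_dict):
    """Map context words into the target language; words missing from the seed dictionary are dropped."""
    out = Counter()
    for word, count in vec.items():
        for translation in seed_dict.get(word, []):
            out[translation] += count
    return out


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(src_vec, tgt_vectors, seed_dict):
    """Rank target-language candidates by similarity to the translated source vector."""
    translated = translate_vector(src_vec, seed_dict)
    scores = {cand: cosine(translated, vec) for cand, vec in tgt_vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: the French MWT is pre-joined into one token.
fr_tokens = ["la", "forêt", "produit", "bois", "pour", "pâte_à_papier", "et", "papier"]
en_tokens = ["the", "forest", "yields", "wood", "for", "pulp", "and", "paper",
             "while", "the", "mill", "uses", "water"]
seed_dict = {"bois": ["wood"], "papier": ["paper"], "forêt": ["forest"]}

src_vec = context_vector(fr_tokens, "pâte_à_papier")
tgt_vectors = {cand: context_vector(en_tokens, cand) for cand in ("pulp", "mill")}
ranked = rank_candidates(src_vec, tgt_vectors, seed_dict)
print(ranked[0][0])
```

On this toy input the candidate "pulp" ranks above "mill" because it shares the translated context words "wood" and "paper". Since the term is compared through its contexts rather than word by word, the same ranking can in principle pair a multi-word source term with a single-word target term, which is the non-compositional case the abstract highlights.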

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Daille, B., Morin, E. (2005). French-English Terminology Extraction from Comparable Corpora. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
