Abstract
This article presents a method for extracting bilingual lexicons of single-word terms (SWTs) and multi-word terms (MWTs) from comparable corpora in a technical domain. The method first extracts MWTs in each language and then uses statistical methods that exploit term contexts to align both single words and MWTs. After explaining the difficulties of aligning MWTs and specifying our approach, we describe the bilingual terminology extraction process and the resources used in our experiments. Finally, we evaluate the approach and show its value, particularly for aligning non-compositional MWTs.
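The abstract only says that terms are aligned "by exploiting the term contexts"; it does not give the window size, association measure, or similarity measure used. The sketch below illustrates the general context-vector idea common in comparable-corpus lexicon extraction, under loudly stated assumptions: MWTs are assumed to be already extracted and joined into single tokens (e.g. the hypothetical `cancer_du_sein`), context vectors use raw co-occurrence counts in a fixed window, source contexts are translated with a small hypothetical seed dictionary, and candidates are ranked by cosine similarity. All function names, parameters, and the toy data are illustrative, not the authors' implementation.

```python
import math
from collections import Counter


def context_vector(tokens, term, window=3):
    """Count words co-occurring with `term` within +/- `window` tokens.

    Assumption: an MWT has already been merged into one token,
    so it is treated exactly like a single-word term here.
    """
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vec


def translate_vector(vec, seed_dict):
    """Map source context words into the target language via a seed
    dictionary (hypothetical); words missing from the dictionary are dropped."""
    out = Counter()
    for word, count in vec.items():
        for tgt in seed_dict.get(word, []):
            out[tgt] += count
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_translations(src_tokens, src_term, tgt_tokens, candidates,
                      seed_dict, window=3):
    """Rank target candidates by similarity of their contexts to the
    translated context of the source term (highest score first)."""
    src_vec = translate_vector(
        context_vector(src_tokens, src_term, window), seed_dict)
    scores = [(c, cosine(src_vec, context_vector(tgt_tokens, c, window)))
              for c in candidates]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpora (not from the paper); MWTs pre-joined with "_".
fr = ("le cancer_du_sein touche la femme . le traitement du cancer_du_sein "
      "utilise la chimiothérapie").split()
en = ("breast_cancer affects women . treatment of breast_cancer uses "
      "chemotherapy . the heart pumps blood").split()
seed = {"femme": ["women"], "traitement": ["treatment"],
        "chimiothérapie": ["chemotherapy"], "touche": ["affects"],
        "utilise": ["uses"]}

ranking = rank_translations(fr, "cancer_du_sein", en,
                            ["breast_cancer", "heart"], seed)
print(ranking)  # breast_cancer ranks above heart
```

Because the MWT is compared as a whole unit through its context, this kind of sketch does not need to translate the MWT's component words one by one. Real systems typically replace raw counts with an association measure and use much larger seed dictionaries; those choices are not specified in the abstract.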
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Daille, B., Morin, E. (2005). French-English Terminology Extraction from Comparable Corpora. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_62
DOI: https://doi.org/10.1007/11562214_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science (R0)