Abstract
In this paper, we describe the approach we used at the Cross-Language Evaluation Forum (CLEF) 2002, specifically in the GIRT task. The approach is based on (1) the extraction of two bilingual lexicons, one from parallel corpora and the other from comparable corpora; (2) the optimal combination of these bilingual lexicons for Cross-Language Information Retrieval (CLIR); and (3) the combination with monolingual IR on parallel corpora. While our original submission to CLEF 2002 was restricted to short queries (using only the title field), we present here results extended to complete queries.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Renders, JM., Déjean, H., Gaussier, É. (2003). Assessing Automatically Extracted Bilingual Lexicons for CLIR in Vertical Domains: XRCE Participation in the GIRT Track of CLEF 2002. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Advances in Cross-Language Information Retrieval. CLEF 2002. Lecture Notes in Computer Science, vol 2785. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45237-9_32
DOI: https://doi.org/10.1007/978-3-540-45237-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40830-7
Online ISBN: 978-3-540-45237-9
eBook Packages: Springer Book Archive