Assessing Automatically Extracted Bilingual Lexicons for CLIR in Vertical Domains: XRCE Participation in the GIRT Track of CLEF 2002

  • Conference paper
Advances in Cross-Language Information Retrieval (CLEF 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2785)

Abstract

In this paper, we describe our approach to the GIRT task of the Cross-Language Evaluation Forum (CLEF) 2002. The approach is based on (1) the extraction of two bilingual lexicons, one from parallel corpora and the other from comparable corpora, (2) the optimal combination of these bilingual lexicons for Cross-Language Information Retrieval (CLIR), and (3) the combination with monolingual IR on parallel corpora. While our original submission to CLEF 2002 was restricted to short queries (using only the title field), we present here results extended to complete queries.
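To make step (2) concrete, here is a minimal sketch of one way two probabilistic bilingual lexicons could be merged and then used to translate a query. It assumes a simple linear interpolation with a single weight; the abstract does not say how the combination is actually done, so this is an illustration and not the authors' method. The function names (`combine_lexicons`, `translate_query`), the `weight` and `top_k` parameters, and the English-German toy entries are all hypothetical.

```python
from collections import defaultdict


def combine_lexicons(parallel_lex, comparable_lex, weight=0.5):
    """Linearly interpolate two probabilistic bilingual lexicons.

    Each lexicon maps a source term to {target term: probability}.
    `weight` is the share given to the parallel-corpus lexicon
    (an assumed combination scheme, not taken from the paper).
    """
    combined = {}
    for source in set(parallel_lex) | set(comparable_lex):
        scores = defaultdict(float)
        for target, p in parallel_lex.get(source, {}).items():
            scores[target] += weight * p
        for target, p in comparable_lex.get(source, {}).items():
            scores[target] += (1.0 - weight) * p
        combined[source] = dict(scores)
    return combined


def translate_query(query_terms, lexicon, top_k=2):
    """Build a weighted target-language query from source-language terms.

    For each source term, keep its `top_k` highest-scoring translations
    and accumulate their scores per target term.
    """
    weighted = defaultdict(float)
    for term in query_terms:
        candidates = lexicon.get(term, {})
        best = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        for target, score in best[:top_k]:
            weighted[target] += score
    return dict(weighted)


# Toy, invented entries for illustration only.
parallel_lex = {"work": {"Arbeit": 0.8, "Werk": 0.2}}
comparable_lex = {
    "work": {"Arbeit": 0.6, "Beruf": 0.4},
    "youth": {"Jugend": 1.0},
}

combined = combine_lexicons(parallel_lex, comparable_lex, weight=0.5)
translated = translate_query(["work", "youth"], combined, top_k=2)
```

In this sketch a term found in only one lexicon (here "youth") keeps only that lexicon's share of the weight. A real system would tune `weight` on held-out queries or renormalise the scores per source term.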




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Renders, JM., Déjean, H., Gaussier, É. (2003). Assessing Automatically Extracted Bilingual Lexicons for CLIR in Vertical Domains: XRCE Participation in the GIRT Track of CLEF 2002. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Advances in Cross-Language Information Retrieval. CLEF 2002. Lecture Notes in Computer Science, vol 2785. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45237-9_32

  • DOI: https://doi.org/10.1007/978-3-540-45237-9_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40830-7

  • Online ISBN: 978-3-540-45237-9

  • eBook Packages: Springer Book Archive
