Report on CLEF-2003 Experiments: Two Ways of Extracting Multilingual Resources from Corpora

Cancedda, Nicola; Déjean, Hervé; Gaussier, Éric; Renders, Jean-Michel; Vinokourov, Alexei

doi:10.1007/978-3-540-30222-3_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3237)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

We present two main approaches to cross-language information retrieval, both of which exploit multilingual corpora to derive cross-lingual term-term correspondences. The two approaches are evaluated within the framework of the CLEF 2003 multilingual-4 (ML4) task.
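The abstract does not say how the term-term correspondences are computed. As a generic illustration only, and not the authors' method, the sketch below scores candidate translation pairs by the Dice coefficient of their co-occurrence across sentence-aligned text, then uses those scores to turn a source-language query into weighted target-language terms. The helper names `dice_correspondences`, `best_translation`, and `translate_query`, the whitespace tokenization, and the toy English-French pairs are all hypothetical assumptions.

```python
from collections import Counter
from itertools import product


def dice_correspondences(aligned_pairs, min_count=1):
    """Score (source term, target term) pairs by the Dice coefficient of
    their co-occurrence over sentence-aligned text.

    Hypothetical helper: lowercasing and whitespace tokenization stand in
    for real linguistic preprocessing.
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in aligned_pairs:
        # Count each term at most once per aligned sentence pair.
        src_terms = set(src_sent.lower().split())
        tgt_terms = set(tgt_sent.lower().split())
        src_freq.update(src_terms)
        tgt_freq.update(tgt_terms)
        # Every source term co-occurs with every target term in the pair.
        joint.update(product(src_terms, tgt_terms))
    return {
        (s, t): 2 * c / (src_freq[s] + tgt_freq[t])
        for (s, t), c in joint.items()
        if c >= min_count
    }


def best_translation(scores, src_term):
    """Return the highest-scoring target term for src_term, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == src_term}
    return max(candidates, key=candidates.get) if candidates else None


def translate_query(scores, query, k=1):
    """Hypothetical query translation: map each query term to its k best
    target terms and sum their scores into target-language term weights."""
    weights = Counter()
    for term in query.lower().split():
        ranked = sorted(
            ((v, t) for (s, t), v in scores.items() if s == term), reverse=True
        )
        for v, t in ranked[:k]:
            weights[t] += v
    return dict(weights)


if __name__ == "__main__":
    # Toy English-French sentence pairs (hypothetical data).
    pairs = [
        ("the house", "la maison"),
        ("the car", "la voiture"),
        ("a house", "une maison"),
    ]
    scores = dice_correspondences(pairs)
    print(best_translation(scores, "house"))  # maison
    print(translate_query(scores, "the house"))  # {'la': 1.0, 'maison': 1.0}
```

A co-occurrence score like this is only a baseline: it ignores word order and morphology, and frequent terms can pair with many targets, which is why corpus-based systems usually rely on stronger alignment or association models.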

Author information

Authors and Affiliations

Xerox Research Centre Europe
Nicola Cancedda, Hervé Déjean, Éric Gaussier & Jean-Michel Renders
Royal Holloway, University of London
Alexei Vinokourov

Editor information

Editors and Affiliations

ISTI-CNR, Area di Ricerca, Pisa, Italy
Carol Peters
No affiliation listed
Julio Gonzalo & Martin Braschler
German Institute for International and Security Affairs, Stiftung Wissenschaft und Politik (SWP), Ludwigkirchplatz 3-4, 10719, Berlin, Germany
Michael Kluck

About this paper

Cite this paper

Cancedda, N., Déjean, H., Gaussier, É., Renders, JM., Vinokourov, A. (2004). Report on CLEF-2003 Experiments: Two Ways of Extracting Multilingual Resources from Corpora. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds) Comparative Evaluation of Multilingual Information Access Systems. CLEF 2003. Lecture Notes in Computer Science, vol 3237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30222-3_9

DOI: https://doi.org/10.1007/978-3-540-30222-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24017-4
Online ISBN: 978-3-540-30222-3
eBook Packages: Springer Book Archive
