Abstract
This paper describes and evaluates several retrieval strategies for searching document collections written in four European languages: French, Italian, Spanish and German. We also suggest and evaluate several query translation schemes based on freely available translation resources. To cross language barriers, we propose a combined query translation approach that yields interesting retrieval effectiveness. Finally, we suggest a collection merging strategy based on logistic regression that tends to perform better than other merging approaches.
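As a rough illustration of the merging idea mentioned in the abstract, the sketch below maps each document's rank and score within its own collection to an estimated probability of relevance through a logistic function, then sorts the pooled results by that probability. The choice of ln(rank) and score as explanatory variables, the function names, the collection labels and the coefficient values are all assumptions made for illustration. In practice the coefficients would be fitted per collection on training queries, and this is not necessarily the exact model evaluated in the paper.

```python
import math

def relevance_probability(rank, score, alpha, beta_rank, beta_score):
    """Logistic estimate of relevance from a document's rank and score."""
    z = alpha + beta_rank * math.log(rank) + beta_score * score
    return 1.0 / (1.0 + math.exp(-z))

def merge_result_lists(result_lists, coefficients):
    """Merge per-collection ranked lists into one list sorted by probability.

    result_lists: {collection: [(doc_id, score), ...]} in ranked order.
    coefficients: {collection: (alpha, beta_rank, beta_score)}, hypothetical.
    """
    merged = []
    for collection, ranked in result_lists.items():
        alpha, beta_rank, beta_score = coefficients[collection]
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            prob = relevance_probability(rank, score, alpha, beta_rank, beta_score)
            merged.append((doc_id, collection, prob))
    # Highest estimated probability of relevance first.
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged

# Hypothetical coefficients and scores, for illustration only.
coefficients = {"fr": (-1.0, -0.8, 0.05), "de": (-1.5, -0.6, 0.10)}
result_lists = {
    "fr": [("fr-1", 40.0), ("fr-2", 25.0)],
    "de": [("de-1", 30.0), ("de-2", 12.0)],
}
merged = merge_result_lists(result_lists, coefficients)
```

Because raw scores from different collections are not directly comparable, converting them to probabilities gives the merged list a common scale. The per-collection coefficients are what absorb the differences between the underlying search models.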
Cite this article
Savoy, J. Combining Multiple Strategies for Effective Monolingual and Cross-Language Retrieval. Information Retrieval 7, 121–148 (2004). https://doi.org/10.1023/B:INRT.0000009443.51912.e7