Abstract
We present a new neural-network approach to the merging problem in Cross-Lingual Information Retrieval (CLIR). Beyond the language barrier itself, a CLIR system must combine ranked lists of documents in different languages, retrieved from several text collections, into one ranking. We propose a merging strategy based on competitive learning that produces a single ranking of documents from the individual lists retrieved for each collection. The main contribution of the paper is to show the effectiveness of the Learning Vector Quantization (LVQ) algorithm in solving the merging problem. To investigate the effect of the number of codebook vectors, we carried out several experiments with different values of this parameter. The results show that the LVQ algorithm is a good alternative merging strategy.
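The abstract does not give the document features or the exact ranking rule, so the following is only a minimal sketch of how an LVQ-based merge could look, not the authors' implementation. It assumes the basic LVQ1 update rule, a hypothetical two-dimensional feature vector per document (for example, a normalized score and a normalized rank within its own list), and a hypothetical ranking rule that orders documents by how much closer they lie to a "relevant" codebook vector than to a "non-relevant" one. The toy data and all function names are illustrative. The `per_class` parameter stands in for the number of codebook vectors that the experiments vary.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train_lvq1(samples, labels, per_class=2, alpha=0.1, epochs=30, seed=0):
    """Train LVQ1 codebook vectors; returns a list of (vector, label) pairs."""
    rng = random.Random(seed)
    codebook = []
    # Initialise each class's codebook vectors from its own training samples.
    for c in sorted(set(labels)):
        pool = [x for x, y in zip(samples, labels) if y == c]
        for x in rng.sample(pool, min(per_class, len(pool))):
            codebook.append((list(x), c))
    total, step = epochs * len(samples), 0
    for _ in range(epochs):
        order = list(range(len(samples)))
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            rate = alpha * (1 - step / total)  # linearly decaying learning rate
            w, c = min(codebook, key=lambda cb: sq_dist(cb[0], x))
            # LVQ1: attract the winner if its class matches, repel it otherwise.
            sign = 1.0 if c == y else -1.0
            for d in range(len(w)):
                w[d] += sign * rate * (x[d] - w[d])
            step += 1
    return codebook


def merge(codebook, candidates):
    """Rank (doc_id, features) pairs from all lists into a single list.

    Assumed scoring rule: distance to the nearest non-relevant codebook
    vector minus distance to the nearest relevant one (higher ranks first).
    """
    def score(features):
        rel = min(sq_dist(w, features) for w, c in codebook if c == 1)
        non = min(sq_dist(w, features) for w, c in codebook if c == 0)
        return non - rel
    ranked = sorted(candidates, key=lambda dc: score(dc[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked]


# Toy training data (illustrative only): label 1 = relevant, 0 = non-relevant.
train_x = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.95, 0.05),
           (0.2, 0.8), (0.1, 0.9), (0.3, 0.7), (0.15, 0.85)]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
codebook = train_lvq1(train_x, train_y, per_class=2)

# Documents retrieved from separate monolingual collections.
candidates = [("es-1", (0.25, 0.75)), ("en-1", (0.9, 0.1)),
              ("fr-1", (0.1, 0.9)), ("de-1", (0.7, 0.3))]
ranking = merge(codebook, candidates)
print(ranking)
```

Raising `per_class` gives each class more codebook vectors and therefore a finer partition of the feature space, at the cost of needing more training data per vector.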
About this article
Cite this article
Martín-Valdivia, M.T., Martínez-Santiago, F. & Ureña-López, L.A. Merging Strategy for Cross-Lingual Information Retrieval Systems based on Learning Vector Quantization. Neural Process Lett 22, 149–161 (2005). https://doi.org/10.1007/s11063-005-2659-y
DOI: https://doi.org/10.1007/s11063-005-2659-y