Merging Strategy for Cross-Lingual Information Retrieval Systems based on Learning Vector Quantization

Neural Processing Letters

Abstract

We present a new neural-network approach to the merging problem in Cross-Lingual Information Retrieval (CLIR). Beyond the language barrier itself, a CLIR system must also merge ranked lists of documents in different languages retrieved from several text collections. We propose a merging strategy based on competitive learning that combines the individual ranked lists into a single ranking of documents. The main contribution of the paper is to show the effectiveness of the Learning Vector Quantization (LVQ) algorithm in solving the merging problem. To investigate the effect of the number of codebook vectors, we carried out several experiments with different values of this parameter. The results demonstrate that the LVQ algorithm is a good alternative merging strategy.
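To make the idea concrete, the following is a minimal sketch of how an LVQ1-style learner could be used to merge ranked lists. It is not the authors' implementation: the two-dimensional document features (normalised rank position and normalised retrieval score), the relevance labels (1 = relevant, 0 = non-relevant), the scoring rule in `relevance_score`, and all function names are assumptions chosen for illustration. Only the LVQ1 update itself (move the nearest codebook vector toward a sample of the same class, away from a sample of a different class) is the standard algorithm.

```python
import math
import random


def nearest(codebook, x):
    """Index of the codebook vector closest to x (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i][0], x))


def train_lvq1(samples, codebook, alpha0=0.1, epochs=20, seed=0):
    """Standard LVQ1 with a linearly decaying learning rate."""
    rng = random.Random(seed)
    codebook = [(list(vec), label) for vec, label in codebook]  # work on a copy
    steps = epochs * len(samples)
    t = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x, y in order:
            alpha = alpha0 * (1 - t / steps)
            vec, label = codebook[nearest(codebook, x)]
            # Attract the winner if its class matches the sample, repel otherwise.
            sign = 1.0 if label == y else -1.0
            for j in range(len(vec)):
                vec[j] += sign * alpha * (x[j] - vec[j])
            t += 1
    return codebook


def relevance_score(codebook, x):
    """Assumed scoring rule: higher when x lies nearer a 'relevant' prototype."""
    d_rel = min(math.dist(v, x) for v, lab in codebook if lab == 1)
    d_non = min(math.dist(v, x) for v, lab in codebook if lab == 0)
    return d_non - d_rel


def merge(ranked_lists, codebook):
    """Pool the per-collection lists and re-rank them by the learned score."""
    pooled = [doc for lst in ranked_lists for doc in lst]
    return sorted(pooled, key=lambda d: relevance_score(codebook, d[1]), reverse=True)


# Hypothetical toy data: features are [normalised rank, normalised score].
training = [([0.1, 0.9], 1), ([0.2, 0.8], 1), ([0.8, 0.2], 0), ([0.9, 0.1], 0)]
initial_codebook = [([0.3, 0.7], 1), ([0.7, 0.3], 0)]
trained = train_lvq1(training, initial_codebook)

english = [("en-1", [0.1, 0.9]), ("en-2", [0.7, 0.3])]
spanish = [("es-1", [0.2, 0.85]), ("es-2", [0.9, 0.1])]
merged = merge([english, spanish], trained)
```

The number of entries in `initial_codebook` plays the role of the codebook-size parameter mentioned in the abstract; in this sketch it would be varied simply by supplying more labelled prototype vectors.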

Author information

Corresponding author

Correspondence to M. T. Martín-Valdivia.

About this article

Cite this article

Martín-Valdivia, M.T., Martínez-Santiago, F. & Ureña-López, L.A. Merging Strategy for Cross-Lingual Information Retrieval Systems based on Learning Vector Quantization. Neural Process Lett 22, 149–161 (2005). https://doi.org/10.1007/s11063-005-2659-y

  • DOI: https://doi.org/10.1007/s11063-005-2659-y
