A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval

Applied Intelligence

Abstract

Learning to Rank (LTR) techniques use machine learning to rank documents. In this paper, we propose a new LTR-based framework for cross-language information retrieval (CLIR). The core idea is to use the knowledge in training queries in the target language, in addition to the training queries in the source language, both to extract features and to construct the ranking model, rather than relying on source-language training queries alone. The framework has two main components. The first component extracts monolingual and cross-lingual features from the queries and the documents. To extract the cross-lingual features, we introduce a general approach based on translation probabilities: translation knowledge, built by combining a probabilistic dictionary extracted from translation resources with the translation knowledge available in the target-language queries, is used to bridge the gap between the documents and the queries. The second component trains a ranking model that optimizes the proposed loss function, given an input LTR algorithm and the extracted features. The new loss function can be used with any listwise LTR algorithm to construct a ranking model for CLIR. To this end, the loss function of the LTR algorithm is calculated on both the training data in the target language and the training data in the source language. The new loss function is a linear interpolation of the harmonic mean of these two loss functions (monolingual and cross-lingual) and their ratio. The output of the framework is a cross-lingual ranking model trained to minimize the proposed loss function. Experimental results show that the proposed framework outperforms the baseline information retrieval methods and other LTR ranking models in terms of Mean Average Precision (MAP). The findings also indicate that the use of cross-lingual features considerably improves the effectiveness of the framework in terms of MAP and Normalized Discounted Cumulative Gain (NDCG).
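
The combined loss described in the abstract can be illustrated with a minimal Python sketch. The abstract states only that the loss is a linear interpolation of the harmonic mean of the monolingual and cross-lingual losses and their ratio. The names `combined_loss`, `mono_loss`, `cross_loss` and `alpha`, the direction of the ratio (cross-lingual over monolingual), the interpolation weight and the small `eps` stabilizer are therefore assumptions made for illustration, not details taken from the paper.

```python
def harmonic_mean(a: float, b: float, eps: float = 1e-12) -> float:
    """Harmonic mean of two non-negative values; eps avoids division by zero."""
    return 2.0 * a * b / (a + b + eps)


def combined_loss(mono_loss: float, cross_loss: float,
                  alpha: float = 0.5, eps: float = 1e-12) -> float:
    """Interpolate the harmonic mean of two listwise losses with their ratio.

    mono_loss  : loss of the listwise LTR algorithm on monolingual training data
    cross_loss : loss of the same algorithm on cross-lingual training data
    alpha      : interpolation weight in [0, 1] (hypothetical parameter)

    ASSUMPTION: the ratio is taken as cross_loss / mono_loss; the abstract
    does not say which loss is the numerator.
    """
    if mono_loss < 0 or cross_loss < 0:
        raise ValueError("losses are expected to be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    hm = harmonic_mean(mono_loss, cross_loss, eps)
    ratio = cross_loss / (mono_loss + eps)
    return alpha * hm + (1.0 - alpha) * ratio


if __name__ == "__main__":
    # Example values are made up; they are not results from the paper.
    print(combined_loss(mono_loss=0.8, cross_loss=1.2, alpha=0.7))
```

Under these assumptions, setting `alpha` to 1 reduces the objective to the harmonic mean alone, and setting it to 0 leaves only the ratio term.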

Notes

  1. The HAMSHAHRI corpus is a standard collection that was used in the Ad Hoc Track of CLEF 2008-2009.

  2. CLEF Adhoc Multilingual Task: The evaluation packages are available via the ELRA catalogue (http://catalog.elra.info).

  3. The CLEF Test Suite for the CLEF 2000-2003 Campaigns, catalogue reference: ELRA-E0008.

  4. www.clef-initiative.eu

  5. http://snowball.tartarus.org/

Acknowledgements

This research was supported in part by a grant from the Institute for Research in Fundamental Sciences (no. CS 1398-4-223).

Author information

Corresponding author

Correspondence to Azadeh Shakery.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghanbari, E., Shakery, A. A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval. Appl Intell 52, 3156–3174 (2022). https://doi.org/10.1007/s10489-021-02592-z

  • DOI: https://doi.org/10.1007/s10489-021-02592-z
