Abstract
Learning to Rank (LTR) techniques use machine learning to rank documents. In this paper, we propose a new LTR-based framework for cross-language information retrieval (CLIR). The core idea is to exploit the training queries in the target language in addition to the training queries in the source language, both for extracting features and for constructing the ranking model, instead of relying on the source-language training queries alone.

The proposed framework is composed of two main components. The first component extracts monolingual and cross-lingual features from the queries and the documents. To extract the cross-lingual features, we introduce a general approach based on translation probabilities: translation knowledge, built by combining a probabilistic dictionary extracted from translation resources with the translation knowledge available in the target-language queries, is used to bridge the gap between the documents and the queries. The second component trains a ranking model that optimizes a newly proposed loss function, given an input listwise LTR algorithm and the extracted features. To this end, the loss function of the LTR algorithm is computed on both the target-language and the source-language training data, and the new loss is defined as a linear interpolation of the harmonic mean of these two losses (monolingual and cross-lingual) and the ratio of the two losses. The output of the framework is a cross-lingual ranking model trained to minimize the proposed loss.

Experimental results show that the proposed framework outperforms the baseline information retrieval methods and other LTR ranking models in terms of Mean Average Precision (MAP). The findings also indicate that the cross-lingual features considerably improve the effectiveness of the framework in terms of MAP and Normalized Discounted Cumulative Gain (NDCG).
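The combined loss described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's exact formulation: we assume an interpolation parameter `alpha` and take the ratio term as cross-lingual over monolingual loss; the precise weighting and orientation of the ratio are defined in the paper itself.

```python
def combined_loss(l_mono: float, l_cross: float, alpha: float = 0.5) -> float:
    """Sketch of a loss interpolating the harmonic mean and the ratio of
    the monolingual loss (l_mono) and the cross-lingual loss (l_cross).
    `alpha` is a hypothetical interpolation parameter."""
    # harmonic mean of the two listwise losses
    harmonic = 2.0 * l_mono * l_cross / (l_mono + l_cross)
    # ratio of the two losses (orientation assumed here for illustration)
    ratio = l_cross / l_mono
    # linear interpolation of the two terms
    return alpha * harmonic + (1.0 - alpha) * ratio
```

When the two losses agree (for example, both equal 2.0), the harmonic mean equals that common value and the ratio equals 1, so the combined loss lies between them; as the cross-lingual loss grows relative to the monolingual one, both terms increase, pushing the optimizer toward models that fit both training sets.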
Notes
The HAMSHAHRI corpus is a standard collection that was used in the Ad Hoc Track of CLEF 2008–2009.
CLEF Adhoc Multilingual Task: The evaluation packages are available via the ELRA catalogue (http://catalog.elra.info).
The CLEF Test Suite for the CLEF 2000-2003 Campaigns, catalogue reference: ELRA-E0008.
Acknowledgements
This research was supported in part by a grant from the Institute for Research in Fundamental Sciences (no. CS 1398-4-223).
Cite this article
Ghanbari, E., Shakery, A. A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval. Appl Intell 52, 3156–3174 (2022). https://doi.org/10.1007/s10489-021-02592-z