Abstract
Learning to Rank (LTR) techniques use machine learning to rank documents. In this paper, we propose a new LTR-based framework for cross-language information retrieval (CLIR). The core idea is to exploit the training queries in the target language in addition to the training queries in the source language, both for extracting features and for constructing the ranking model, instead of relying on the source-language training queries alone.

The proposed framework is composed of two main components. The first component extracts monolingual and cross-lingual features from the queries and the documents. To extract the cross-lingual features, we introduce a general approach based on translation probabilities: translation knowledge, built by combining a probabilistic dictionary extracted from translation resources with the translation knowledge available in the target-language queries, is used to bridge the gap between the documents and the queries. The second component trains a ranking model that optimizes a newly proposed loss function, given an input listwise LTR algorithm and the extracted features. To this end, the loss function of the LTR algorithm is computed on both the target-language and the source-language training data, and the new loss is defined as a linear interpolation of the harmonic mean of these two losses (monolingual and cross-lingual) and the ratio of the two losses. The output of the framework is a cross-lingual ranking model trained to minimize the proposed loss.

Experimental results show that the proposed framework outperforms the baseline information retrieval methods and other LTR ranking models in terms of Mean Average Precision (MAP). The findings also indicate that the cross-lingual features considerably improve the effectiveness of the framework in terms of MAP and Normalized Discounted Cumulative Gain (NDCG).
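The combined loss described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's exact formulation: we assume an interpolation parameter `alpha` and take the ratio term as cross-lingual over monolingual loss; the precise weighting and orientation of the ratio are defined in the paper itself.

```python
def combined_loss(l_mono: float, l_cross: float, alpha: float = 0.5) -> float:
    """Sketch of a loss interpolating the harmonic mean and the ratio of
    the monolingual loss (l_mono) and the cross-lingual loss (l_cross).
    `alpha` is a hypothetical interpolation parameter."""
    # harmonic mean of the two listwise losses
    harmonic = 2.0 * l_mono * l_cross / (l_mono + l_cross)
    # ratio of the two losses (orientation assumed here for illustration)
    ratio = l_cross / l_mono
    # linear interpolation of the two terms
    return alpha * harmonic + (1.0 - alpha) * ratio
```

When the two losses agree (for example, both equal 2.0), the harmonic mean equals that common value and the ratio equals 1, so the combined loss lies between them; as the cross-lingual loss grows relative to the monolingual one, both terms increase, pushing the optimizer toward models that fit both training sets.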
Notes
The HAMSHAHRI corpus is a standard collection that was used in the Ad Hoc Track of CLEF 2008–2009.
CLEF Adhoc Multilingual Task: The evaluation packages are available via the ELRA catalogue (http://catalog.elra.info).
The CLEF Test Suite for the CLEF 2000-2003 Campaigns, catalogue reference: ELRA-E0008.
Acknowledgements
This research was supported in part by a grant from the Institute for Research in Fundamental Sciences (no. CS 1398-4-223).
Cite this article
Ghanbari, E., Shakery, A. A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval. Appl Intell 52, 3156–3174 (2022). https://doi.org/10.1007/s10489-021-02592-z