Query-dependent learning to rank for cross-lingual information retrieval

Ghanbari, Elham; Shakery, Azadeh

doi:10.1007/s10115-018-1232-8


Regular Paper
Published: 04 July 2018

Volume 59, pages 711–743 (2019)
Knowledge and Information Systems

Elham Ghanbari¹ & Azadeh Shakery¹,²

Abstract

Learning to rank (LTR), a machine learning technique for ranking tasks, has become one of the most popular research topics in information retrieval (IR). Cross-lingual information retrieval (CLIR), in which the query and the documents are in different languages, is an important IR task that can potentially benefit from LTR. This paper focuses on the use of LTR for CLIR. To rank target-language documents in response to a source-language query, we propose a local query-dependent LTR approach for CLIR, called LQ-DLTR for CLIR. Its core idea is to construct the LTR model from the local characteristics of similar queries, instead of using a single global ranking model for all queries. Since the query and the documents are in different languages, the traditional LTR features cannot be used directly for CLIR, so defining appropriate features is a major step in applying LTR to CLIR. We define three categories of cross-lingual features: query–document features, document features, and query features, using translation resources to bridge the language gap between the queries and the documents. For a given query, LQ-DLTR for CLIR then uses the cross-lingual query features to find a neighborhood of similar queries, from which the LTR algorithm builds a local ranking function. The LTR algorithm learns the model from two cross-lingual feature sets, the document features and the query–document features; the query features, which are used only to identify the neighbors, are not involved in the learning phase.
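The abstract describes LQ-DLTR for CLIR only at a high level. The minimal Python sketch below illustrates the general query-dependent pattern it builds on: select the k training queries nearest to the test query in query-feature space, train a ranker only on those neighbors, and score the test query's documents with that local model. It is not the authors' implementation. Assumptions: Euclidean distance as the query similarity, a simple pairwise perceptron as a stand-in for the actual LTR algorithm, and hypothetical function names and toy data. As in the abstract, query features select neighbors but never enter the learned model.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
# One labeled training query: (query features, [(document features, graded relevance), ...])
TrainQuery = Tuple[Vector, List[Tuple[Vector, int]]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def nearest_queries(q_feats: Vector, train: List[TrainQuery], k: int) -> List[TrainQuery]:
    """Return the k training queries closest to q_feats (Euclidean distance, an assumed metric)."""
    return sorted(train, key=lambda tq: math.dist(q_feats, tq[0]))[:k]


def train_pairwise_perceptron(queries: List[TrainQuery], epochs: int = 50, lr: float = 0.1) -> Vector:
    """Learn a linear scoring vector from document pairs within each neighbor query.

    Stand-in learner (assumption): a pairwise perceptron, not the LTR algorithm used in the paper.
    Only document / query-document features enter the model; query features are ignored here.
    """
    dim = len(queries[0][1][0][0])
    w = [0.0] * dim
    for _epoch in range(epochs):
        for _query_feats, docs in queries:
            for xi, yi in docs:
                for xj, yj in docs:
                    if yi > yj:
                        diff = [a - b for a, b in zip(xi, xj)]
                        if dot(w, diff) <= 0:  # pair mis-ordered or tied: move w toward diff
                            w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def rank_locally(q_feats: Vector, docs: List[Vector], train: List[TrainQuery], k: int = 3) -> List[int]:
    """Train a local model on the k nearest training queries; return doc indices, best first."""
    local_model = train_pairwise_perceptron(nearest_queries(q_feats, train, k))
    return sorted(range(len(docs)), key=lambda i: dot(local_model, docs[i]), reverse=True)


# Toy data (hypothetical): queries near 0.0 favor document feature 0, queries near 1.0 favor feature 1.
train = [
    ([0.0], [([1.0, 0.0], 1), ([0.0, 1.0], 0)]),
    ([0.1], [([0.9, 0.1], 1), ([0.2, 0.8], 0)]),
    ([1.0], [([1.0, 0.0], 0), ([0.0, 1.0], 1)]),
    ([0.9], [([0.8, 0.2], 0), ([0.1, 0.9], 1)]),
]
candidates = [[0.0, 1.0], [1.0, 0.0]]
print(rank_locally([0.05], candidates, train, k=2))  # → [1, 0]
print(rank_locally([0.95], candidates, train, k=2))  # → [0, 1]
```

The same two candidate documents are ranked in opposite orders for the two test queries, because each query gets a model trained on a different neighborhood; a single global model would be forced to pick one ordering for both.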
Experimental results indicate that CLIR performance improves when the cross-lingual features are computed from several translations and their probabilities, compared with the monolingual features of traditional LTR, which translate a query using only its best translation and ignore the probabilities. Moreover, LQ-DLTR for CLIR outperforms the baseline information retrieval methods and other LTR ranking models in terms of MAP and NDCG.
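The abstract does not give the feature formulas. As a hedged illustration of the contrast it reports, the sketch below assumes a translation-weighted term frequency, where each source term's count in a target-language document is the sum of p(t|s) · tf(t, d) over its candidate translations t, and compares it with a best-translation feature that keeps only the most probable candidate and drops its probability. The translation table, its probabilities, and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical translation table: source term -> {target term: p(target | source)}
TranslationTable = Dict[str, Dict[str, float]]


def weighted_tf(source_term: str, doc_tokens: List[str], table: TranslationTable) -> float:
    """Expected count of source_term in a target-language document: sum_t p(t|s) * tf(t, d)."""
    tf = Counter(doc_tokens)
    return sum(p * tf[t] for t, p in table.get(source_term, {}).items())


def best_translation_tf(source_term: str, doc_tokens: List[str], table: TranslationTable) -> float:
    """Baseline: keep only the most probable translation and ignore its probability."""
    candidates = table.get(source_term)
    if not candidates:
        return 0.0
    best = max(candidates, key=candidates.get)
    return float(Counter(doc_tokens)[best])


def query_doc_feature(query: List[str], doc_tokens: List[str], table: TranslationTable,
                      term_fn: Callable[[str, List[str], TranslationTable], float]) -> float:
    """Aggregate a per-term score over the source-language query into one query-document feature."""
    return sum(term_fn(s, doc_tokens, table) for s in query)


# Toy English -> French example (hypothetical probabilities).
table = {"bank": {"banque": 0.6, "rive": 0.4}, "river": {"fleuve": 1.0}}
doc = ["rive", "rive", "fleuve"]
query = ["river", "bank"]
print(query_doc_feature(query, doc, table, weighted_tf))
print(query_doc_feature(query, doc, table, best_translation_tf))
```

In this toy case the weighted feature is about 1.8, while the best-translation feature is 1.0: translating "bank" only as "banque" misses the two occurrences of "rive" entirely, whereas the weighted version still credits them in proportion to their translation probability.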

A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval

Article 01 July 2021

An axiomatic approach to corpus-based cross-language information retrieval

Article 09 April 2020

Multilingual information retrieval in the language modeling framework

Article 06 May 2015

Notes

The HAMSHAHRI corpus is a standard collection that was used in the Ad Hoc Track of CLEF 2008 and 2009.
CLEF Ad Hoc Multilingual Task: the evaluation packages are available via the ELRA catalogue (http://catalog.elra.info).
The CLEF Test Suite for the CLEF 2000–2003 Campaigns, catalogue reference: ELRA-E0008.
www.clef-initiative.eu.
http://snowball.tartarus.org/.
http://lemurproject.org/ranklib.php.

Acknowledgements

We are grateful to the anonymous reviewers for their constructive comments. This research was supported in part by a grant from the School of Computer Science, Institute for Research in Fundamental Sciences (No. CS 1397-4-55).

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
Elham Ghanbari & Azadeh Shakery
School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
Azadeh Shakery

Corresponding author

Correspondence to Azadeh Shakery.

About this article

Cite this article

Ghanbari, E., Shakery, A. Query-dependent learning to rank for cross-lingual information retrieval. Knowl Inf Syst 59, 711–743 (2019). https://doi.org/10.1007/s10115-018-1232-8

Received: 19 July 2017
Revised: 25 April 2018
Accepted: 10 May 2018
Published: 04 July 2018
Issue Date: 04 June 2019
DOI: https://doi.org/10.1007/s10115-018-1232-8
