Abstract
We present a framework based on a Learning to Rank setting for a text-image retrieval task. In Information Retrieval, the goal is to compute the similarity between a document and a user query. In the context of text-image retrieval, where several similarities exist, human intervention is often needed to decide how to combine them. With the Learning to Rank approach, by contrast, the similarities are combined automatically. Learning to Rank is a paradigm in which a learnt score function produces a ranked list of images for a given user query. Such score functions are generally a combination of similarities between a document and a query. In the past, Learning to Rank algorithms were successfully applied to text retrieval, where they outperformed baselines such as BM25 or TF-IDF. This inspired us to apply our state-of-the-art algorithm, called OWPC (Usunier et al. 2009), to the text-image retrieval task. Since no benchmarks are currently available for this task, we present a framework for building one. The algorithm is validated empirically on the constructed dataset through comparison with typical text-image retrieval similarities. In both settings, visual only and combined text and visual, our algorithm performs better than a simple baseline.
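
The idea of a score function that combines several similarities can be illustrated with a minimal sketch. It assumes a linear combination of two hypothetical similarity features per image (one textual, one visual); the names `score` and `rank_images`, the image identifiers, and the weight values are illustrative only and are not learned by OWPC.

```python
from typing import Dict, List, Sequence


def score(weights: Sequence[float], similarities: Sequence[float]) -> float:
    """Linear score: dot product of a weight vector and a similarity vector."""
    if len(weights) != len(similarities):
        raise ValueError("weights and similarities must have the same length")
    return sum(w * s for w, s in zip(weights, similarities))


def rank_images(
    weights: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[str]:
    """Return image ids sorted by decreasing score for one query."""
    return sorted(
        candidates,
        key=lambda image_id: score(weights, candidates[image_id]),
        reverse=True,
    )


# Hypothetical similarities for one query: [text similarity, visual similarity].
candidates = {
    "img_a": [0.9, 0.1],
    "img_b": [0.4, 0.8],
    "img_c": [0.2, 0.3],
}
# Illustrative weights; in a Learning to Rank setting these would be learned.
weights = [0.5, 1.0]

ranking = rank_images(weights, candidates)
print(ranking)
```

In a Learning to Rank setting the weights would be fitted from relevance judgements instead of being set by hand, which is the manual step the approach is meant to remove.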




Notes
Constituting pairs is the most expensive part of Learning to Rank: in the worst case we need to create \(O([\mathbf{y}] \cdot [\bar{\mathbf{y}}])\) pairs, with \([\mathbf{y}]\) the number of relevant elements and \([\bar{\mathbf{y}}]\) the number of irrelevant ones.
For clarity, we restricted ourselves to the case where the relevance judgements are binary. We can also notice that this count ignores the relative positions of the relevant documents.
We preferred to use the 2006 collection, for which more information and more assessed queries are available than for the 2007 or 2008 collections.
The score function is a scalar product between a weight vector and the feature vector, so the result of this product depends on the scale of the feature values.
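
Two of the notes above can be made concrete with a short sketch: the number of (relevant, irrelevant) pairs built for one query under binary relevance, and the rescaling of a feature so that the scalar product is not dominated by features with large raw values. The helper names `make_pairs` and `min_max_scale` are hypothetical, and min-max scaling is only one possible normalization, assumed here for illustration; the notes do not say which one was used.

```python
from itertools import product
from typing import List, Sequence, Tuple


def make_pairs(relevance: Sequence[int]) -> List[Tuple[int, int]]:
    """Build every (relevant index, irrelevant index) pair for one query.

    With binary judgements this yields [y] * [y_bar] pairs, the worst-case
    count mentioned in the notes.
    """
    relevant = [i for i, r in enumerate(relevance) if r == 1]
    irrelevant = [i for i, r in enumerate(relevance) if r == 0]
    return list(product(relevant, irrelevant))


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale one feature column to [0, 1] (assumed normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no ranking information.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Binary relevance judgements for five documents of one query.
relevance = [1, 0, 0, 1, 0]
pairs = make_pairs(relevance)
print(len(pairs))  # 2 relevant * 3 irrelevant documents

# One raw feature whose values are on a much larger scale than [0, 1].
raw_feature = [10.0, 250.0, 130.0]
scaled_feature = min_max_scale(raw_feature)
print(scaled_feature)
```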
References
Aslam JA, Kanoulas E, Pavlu V, Savev S, Yilmaz E (2009) Document selection methodologies for efficient and effective learning-to-rank. In: SIGIR ’09: proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, USA, pp 468–475
Burges CJC, Ragno R, Le QV (2006) Learning to rank with nonsmooth cost functions. In: NIPS, pp 193–200
Burges CJC, Shaked T, Renshaw E, Lazier A, Deeds M, Hamilton N, Hullender GN (2005) Learning to rank using gradient descent. In: ICML, pp 89–96
Cao Y, Xu J, Liu T-Y, Li H, Huang Y, Hon H-W (2006) Adapting ranking svm to document retrieval. In: SIGIR, pp 186–193
Cao Z, Qin T, Liu T-Y, Tsai M-F, Li H (2007) Learning to rank: from pairwise approach to listwise approach. In: ICML, pp 129–136
La Cascia M, Sethi S, Sclaroff S (1998) Combining textual and visual cues for content-based image retrieval on the world wide web. In: IEEE workshop on content-based access of image and video libraries, pp 24–28
Clough P, Grubinger M, Deselaers T, Hanbury A, Müller H (2006) Overview of the imageclef 2006 photographic retrieval and object annotation tasks. In: CLEF, pp 579–594
Cohen WW, Schapire RE, Singer Y (1997) Learning to order things. In: NIPS
Cossock D, Zhang T (2006) Subset ranking using regression. In: COLT, pp 605–619
Datta R, Joshi D, Li J, Wang JZ (2008) Image retrieval: ideas, influences, and trends of the new age. ACM Comput Surv 40(2):1–60
Faria FF, Veloso A, Almeida HM, Valle E, da Silva Torres R, Gonçalves MA, Meira Jr W (2010) Learning to rank for content-based image retrieval. In: MIR ’10: proceedings of the international conference on multimedia information retrieval. ACM, New York, NY, USA, pp 285–294
Freund Y, Iyer R, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. JMLR 4:933–969
Grubinger M, Clough PD, Müller H, Deselaers T (2006) The iapr benchmark: a new evaluation resource for visual information systems. In: International conference on language resources and evaluation
Har-Peled S, Roth D, Zimak D (2002) Constraint classification for multiclass classification and ranking. In: NIPS, pp 785–792
Hu Y, Li MJ, Yu N (2008) Multiple-instance ranking: learning to rank images for image retrieval. In: CVPR08, pp 1–8
Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: SIGIR. ACM, New York, NY, USA, pp 41–48
Joachims T (2002) Optimizing search engines using clickthrough data. In: KDD, pp 133–142
Li M, Zheng Y-T, Lin S-X, Zhang Y-D, Chua T-S (2008) Multimedia evidence fusion for video concept detection via owa operator. In: MMM ’09: proceedings of the 15th international multimedia modeling conference on advances in multimedia modeling. Springer, Berlin, Heidelberg, pp 208–216
Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137
Robertson SE, Walker S, Hancock-Beaulieu M, Gull A, Lau M (1992) Okapi at trec. In: TREC, pp 21–30
Rui Y, Huang T (2000) Optimizing learning in image retrieval. In: CVPR, pp 236–243
Taylor M, Guiver J, Robertson S, Minka T (2008) Softrank: optimizing non-smooth rank metrics. In: WSDM ’08. ACM, pp 77–86
Tollari S, Detyniecki M, Fakeri-Tabrizi A, Marsala C, Amini M-R, Gallinari P (2008) Using visual concepts and fast visual diversity to improve image retrieval. In: Peters C, Deselaers T, Ferro N, Gonzalo J, Jones GJF, Kurimo M, Mandl T, Peñas A, Petras V (eds) CLEF. Lecture notes in computer science, vol 5706. Springer, pp 577–584
Tollari S, Glotin H (2007) Web image retrieval on imageval: evidences on visualness and textualness concept dependency in fusion model. In: ACM international conference on image and video retrieval (ACM CIVR)
Tollari S, Glotin H (2008) Learning optimal visual features from web sampling in online image retrieval. In: IEEE international conference on acoustics, speech and signal processing (ICASSP)
Tong S, Chang E (2001) Support vector machine active learning for image retrieval. In: MULTIMEDIA ’01: proceedings of the ninth ACM international conference on multimedia. ACM, New York, NY, USA, pp 107–118
Tsochantaridis I, Joachims T, Hofmann T, Altun Y (2005) Large margin methods for structured and interdependent output variables. J Mach Learn Res 6:1453–1484
Usunier N, Buffoni D, Gallinari P (2009) Ranking with ordered weighted pairwise classification. In: Danyluk AP, Bottou L, Littman ML (eds) ICML. ACM international conference proceeding series, vol 382. ACM, p 133
Xu J, Li H (2007) Adarank: a boosting algorithm for information retrieval. In: SIGIR, pp 391–398
Xu J, Liu T-Y, Lu M, Li H, Ma W-Y (2008) Directly optimizing evaluation measures in learning to rank. In: SIGIR, pp 107–114
Yager RR (1988) On ordered weighted averaging aggregation operators in multi-criteria decision making. IEEE Trans Syst Man Cybern 18:183–190
Yates RB, Ribeiro-Neto B (1999) Modern information retrieval. Addison Wesley
Yue Y, Finley T, Radlinski F, Joachims T (2007) A support vector method for optimizing average precision. In: SIGIR, pp 271–278
Zhai C, Lafferty J (2004) A study of smoothing methods for language models applied to information retrieval. ACM Trans Inf Syst 22(2):179–214
Zhou XS, Huang TS (2002) Unifying keywords and visual contents in image retrieval. IEEE Multimed 9(2):23–33
Acknowledgement
This work was partially supported by the French National Agency of Research (ANR-06-MDCA-002 AVEIR project).
Cite this article
Buffoni, D., Tollari, S. & Gallinari, P. A Learning to Rank framework applied to text-image retrieval. Multimed Tools Appl 60, 161–180 (2012). https://doi.org/10.1007/s11042-011-0806-1
DOI: https://doi.org/10.1007/s11042-011-0806-1