Building effective SVM concept detectors from clickthrough data for large-scale image retrieval

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

Clickthrough data is a source of information that can be used to build concept detectors for image retrieval automatically. Previous studies, however, have shown that in many cases the resulting training sets suffer from severe label noise, which has a significant impact on SVM concept detector performance. This paper proposes and evaluates a set of strategies for automatically building effective concept detectors from clickthrough data. These strategies focus on: (1) automatic training set generation; (2) assignment of label confidence weights to the training samples; and (3) use of these weights at the classifier level to improve concept detector effectiveness. For training set selection and for assigning weights to individual training samples, three Information Retrieval (IR) models are examined: vector space models, BM25 and language models. Three SVM variants that take sample importance into account at the classifier level are evaluated and compared to the standard SVM: the Fuzzy SVM, the Power SVM, and the Bilateral-weighted Fuzzy SVM. Experiments conducted on the MM Grand Challenge dataset (consisting of 1M images and 82.3M unique clicks) for 40 concepts demonstrate that: (1) on average, all weighted SVM variants are more effective than the standard SVM; (2) the vector space model produces the best training sets and the best weights; (3) the Bilateral-weighted Fuzzy SVM produces the best results but is very sensitive to weight assignment; and (4) the Fuzzy SVM is the most robust training approach for varying levels of label noise.
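The weighting idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give their exact formulations. The sketch assumes that (a) a label confidence weight is the cosine similarity between the concept name and an image's clicked-query text, using raw term counts as a simple stand-in for a vector space model, and (b) the weights enter a Fuzzy-SVM-style objective in which each sample's hinge loss is scaled by its weight s_i, i.e. 0.5‖w‖² + C Σ s_i max(0, 1 − y_i(w·x_i + b)). The function names, the toy data, the unit weights on negatives, and the linear subgradient trainer (used here in place of a kernel SVM solver) are all hypothetical.

```python
import math
from collections import Counter


def cosine_weight(concept, clicked_queries):
    """Cosine similarity between a concept name and an image's clicked queries.

    clicked_queries is a list of (query_text, click_count) pairs; each query
    term is counted once per click. Raw term counts are an assumption here.
    """
    c = Counter(concept.lower().split())
    q = Counter()
    for query, clicks in clicked_queries:
        for term in query.lower().split():
            q[term] += clicks
    dot = sum(c[t] * q[t] for t in c)
    norm = math.sqrt(sum(v * v for v in c.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def train_weighted_linear_svm(X, y, s, C=1.0, epochs=200, lr=0.01):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i s_i * hinge_i (linear, primal)."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        gw = list(w)  # gradient of the regularizer 0.5*||w||^2
        gb = 0.0
        for xi, yi, si in zip(X, y, s):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin: add weighted hinge subgradient
                for j in range(dim):
                    gw[j] -= C * si * yi * xi[j]
                gb -= C * si * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy usage: three candidate positives for the concept "cat" with their
# clicked queries, followed by two negatives given weight 1.0 (an assumption).
clicked = [[("cat", 5)], [("cute cat", 2)], [("cat food brands", 1)]]
weights = [cosine_weight("cat", q) for q in clicked] + [1.0, 1.0]
X = [[2.0], [3.0], [-0.5], [-2.0], [-3.0]]
y = [1, 1, 1, -1, -1]
w, b = train_weighted_linear_svm(X, y, weights)
```

Setting every weight to 1.0 reduces the objective to that of the standard soft-margin SVM, which makes the sketch a convenient baseline for comparing weighted and unweighted training on the same data.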

Author information

Corresponding author

Correspondence to Ioannis Sarafis.

About this article

Cite this article

Sarafis, I., Diou, C. & Delopoulos, A. Building effective SVM concept detectors from clickthrough data for large-scale image retrieval. Int J Multimed Info Retr 4, 129–142 (2015). https://doi.org/10.1007/s13735-015-0080-5

  • DOI: https://doi.org/10.1007/s13735-015-0080-5
