Neural ranking for automatic image annotation

Multimedia Tools and Applications

Abstract

Automatic image annotation aims to predict labels for images according to their semantic content. It has become a research focus in computer vision because it helps people edit, retrieve and understand large image collections. Over the last decades, researchers have proposed many approaches to this task and achieved remarkable performance on several standard image datasets. In this paper, we propose a novel learning-to-rank approach to the image auto-annotation problem. Unlike typical learning-to-rank algorithms for image auto-annotation, which directly rank annotations for an image, our approach consists of two phases. In the first phase, neural ranking models are trained to rank an image’s semantic neighbors. In the second phase, nearest-neighbor based models propagate annotations from these semantic neighbors to the image. Our approach thus integrates learning-to-rank algorithms with nearest-neighbor based models, including TagProp and 2PKNN, and inherits the advantages of both. Experimental results show that our method achieves better or comparable performance relative to state-of-the-art methods on four challenging benchmarks: Corel5K, ESP Game, IAPR TC-12 and NUS-WIDE.
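The two-phase idea can be illustrated with a minimal sketch. It is not the method of the paper and it is neither TagProp nor 2PKNN: the trained neural ranker is replaced by a plain dot-product score, and label propagation is reduced to rank-weighted voting with a geometric decay. The function names, the `decay` parameter and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def dot(u, v):
    """Placeholder relevance score; a trained neural ranker would go here."""
    return sum(x * y for x, y in zip(u, v))


def rank_neighbors(query, candidates, score_fn):
    """Phase one (stand-in): order candidate ids by relevance, best first."""
    return sorted(candidates, key=lambda nid: -score_fn(query, candidates[nid]))


def propagate_labels(ranked_neighbors, neighbor_labels, top_k=3, decay=0.5):
    """Phase two (simplified): rank-weighted voting over the ranked neighbors.

    The neighbor at 0-based rank r votes with weight decay ** r, which is an
    assumed weighting scheme, not the one used by TagProp or 2PKNN.
    """
    scores = defaultdict(float)
    for rank, nid in enumerate(ranked_neighbors):
        weight = decay ** rank
        for label in neighbor_labels[nid]:
            scores[label] += weight
    total = sum(decay ** r for r in range(len(ranked_neighbors)))
    normalized = {label: s / total for label, s in scores.items()}
    # Sort by descending score, then by label so that ties are deterministic.
    ordered = sorted(normalized.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:top_k]


# Toy data (hypothetical): three annotated training images and one query image.
features = {"a": [1.0, 0.0], "b": [0.8, 0.2], "c": [0.0, 1.0]}
labels = {"a": {"sky", "sea"}, "b": {"sky", "boat"}, "c": {"tree"}}
query = [1.0, 0.1]

ranked = rank_neighbors(query, features, dot)
predicted = propagate_labels(ranked, labels)
print(ranked)
print([label for label, _ in predicted])
```

Separating the two functions mirrors the split described above: the ranking step can be swapped for any learned scorer without touching the propagation step, which only needs the ordered list of neighbors.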

Notes

  1. We also tried ReLU, which performed slightly worse than SER (F1 values decreased by 4% to 7% in our experiments on the four datasets).

  2. The source code of TagProp is available at: http://lear.inrialpes.fr/people/guillaumin/code.php#tagprop

  3. The source code of 2PKNN is available at: http://researchweb.iiit.ac.in/yashaswi.verma/eccv12/2pknn.zip

  4. These features are available at: http://lear.inrialpes.fr/people/guillaumin/data.php

  5. These features can be found at: http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm

Acknowledgements

This work is supported by the Natural Science Foundation of China (No. 61572162) and the Zhejiang Provincial Key Science and Technology Project Foundation (No. 2018C01012).

Author information

Corresponding authors

Correspondence to Hua Hu or Haiyang Hu.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Zhang, W., Hu, H. & Hu, H. Neural ranking for automatic image annotation. Multimed Tools Appl 77, 22385–22406 (2018). https://doi.org/10.1007/s11042-018-5973-x

  • DOI: https://doi.org/10.1007/s11042-018-5973-x
