Neural ranking for automatic image annotation

Zhang, Weifeng; Hu, Hua; Hu, Haiyang

doi:10.1007/s11042-018-5973-x

Published: 25 April 2018

Volume 77, pages 22385–22406, (2018)

Multimedia Tools and Applications

Weifeng Zhang¹,²,³, Hua Hu¹ & Haiyang Hu¹

Abstract

Automatic image annotation aims to predict labels for images according to their semantic content. It has become a research focus in computer vision because it helps people edit, retrieve and understand large image collections. In recent decades, researchers have proposed many approaches to this task and achieved remarkable performance on several standard image datasets. In this paper, we propose a novel learning-to-rank approach to the image auto-annotation problem. Unlike typical learning-to-rank algorithms for image auto-annotation, which directly rank annotations for an image, our approach consists of two phases. In the first phase, neural ranking models are trained to rank an image's semantic neighbors. In the second phase, nearest-neighbor based models propagate annotations from these semantic neighbors to the image. Our approach thus integrates learning-to-rank algorithms with nearest-neighbor based models, including TagProp and 2PKNN, and inherits the advantages of both. Experimental results show that our method achieves better or comparable performance compared with state-of-the-art methods on four challenging benchmarks: Corel5K, ESP Game, IAPR TC-12 and NUS-WIDE.
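
The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: cosine similarity stands in for the trained neural ranker, a simple rank-decayed vote stands in for TagProp or 2PKNN, and the function names, the `decay` parameter and the toy data are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; a placeholder for a learned neural ranking score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_neighbors(query, train_feats, score_fn=cosine, k=2):
    """Phase 1 (stand-in): order training images by score and keep the top k."""
    order = sorted(
        range(len(train_feats)),
        key=lambda i: score_fn(query, train_feats[i]),
        reverse=True,
    )
    return order[:k]


def propagate_labels(neighbor_ids, train_labels, decay=0.5):
    """Phase 2 (stand-in): rank-weighted voting over the neighbors' labels.

    The neighbor at rank r gets weight decay**r; weights are normalized
    so that a label carried by every neighbor scores 1.0.
    """
    weights = [decay ** r for r in range(len(neighbor_ids))]
    total = sum(weights)
    scores = {}
    for w, i in zip(weights, neighbor_ids):
        for label in train_labels[i]:
            scores[label] = scores.get(label, 0.0) + w / total
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (hypothetical): 2-D features and label sets for three training images.
train_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
train_labels = [{"sky", "sea"}, {"sky", "boat"}, {"forest"}]
query = [1.0, 0.05]

neighbors = rank_neighbors(query, train_feats, k=2)
predictions = propagate_labels(neighbors, train_labels)
print(neighbors)
print(predictions)
```

In this sketch the ranking step and the propagation step are decoupled, so a different scoring function can be passed as `score_fn` without changing how labels are propagated.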

Notes

We also tried ReLU, which performed slightly worse than SER (F1 values decreased by 4% to 7% in our experiments on the four datasets).
The source code of TagProp is available at: http://lear.inrialpes.fr/people/guillaumin/code.php#tagprop
The source code of 2PKNN is available at: http://researchweb.iiit.ac.in/yashaswi.verma/eccv12/2pknn.zip
These features are available at: http://lear.inrialpes.fr/people/guillaumin/data.php
These features can be found at: http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm

Acknowledgements

This work is supported by the Natural Science Foundation of China (No. 61572162) and the Zhejiang Provincial Key Science and Technology Project Foundation (No. 2018C01012).

Author information

Authors and Affiliations

School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
Weifeng Zhang, Hua Hu & Haiyang Hu
Zhejiang Future Technology Institute, Jiaxing, China
Weifeng Zhang
Science and Technology on Communication Information Security Control Laboratory, Jiangnan Electronic Communication Institute, Jiaxing, China
Weifeng Zhang

Corresponding authors

Correspondence to Hua Hu or Haiyang Hu.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Zhang, W., Hu, H. & Hu, H. Neural ranking for automatic image annotation. Multimed Tools Appl 77, 22385–22406 (2018). https://doi.org/10.1007/s11042-018-5973-x

Received: 22 August 2017
Revised: 28 March 2018
Accepted: 04 April 2018
Published: 25 April 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11042-018-5973-x
