Abstract
The number of visual neighbours can be critical to the performance of many previously proposed visual-neighbour-based (VNB) image auto-annotation methods. Moreover, in those methods, each candidate tag of a to-be-annotated image would ideally have its own trustworthy subset of visual neighbours for score prediction. In this paper we therefore propose to use a constrained range, rather than an identical and fixed number, of visual neighbours for VNB methods, allowing more flexible choices of neighbours, and we put forward a novel tag-dependent random search process to estimate the tag-dependent trust degrees of visual neighbours for each candidate tag. We further propose an effective image auto-annotation method, termed TagSearcher, based on a widely used conditional probability model for auto-annotation that considers image-dependent weights of visual neighbours, tag-dependent trust degrees of visual neighbours, and votes for a candidate tag from visual neighbours. Extensive experiments on both a benchmark dataset and real-world web images show that the proposed TagSearcher yields encouraging annotation performance and also reduces the sensitivity of performance to the number of visual neighbours used.
Notes
The 30 non-abstract concepts are: aircraft, ball, beach, bike, bird, book, bridge, car, chair, child, clock, countryside, dog, door, fire, fish, flower, house, kite, lamp, mountain, mushroom, pen, rabbit, river, sky, sun, tower, train, tree.
The features include: Color Correlogram, Color Layout, CEDD, Edge Histogram, FCTH, HSV Color Histogram, JCD, Jpeg Coefficient Histogram, RGB Color Histogram, Scalable Color, SURF with bag-of-words model.
Acknowledgments
This research was supported by the National Basic Research Project of China (Grant No. 2011CB707000) and the National Natural Science Foundation of China (Grant Nos. 61271394 and 61005045).
Appendix
A1. Convergence proof
Note that in formula (15), each column of the matrix \(\mathbf {S}^{\left (n-h\right )^{T}}\), which encodes the successive relations between vertices, is L1-normalised to sum to 1, and the entries of any \(\mathbf {S}^{\left (n-h\right )^{T}}\) or \(\mathbf {F}^{\left (n-h\right )}\) all lie in [0, 1]. For any δ in (0, 1), there always exists γ < 1 such that 1 − δ < γ, and thus we can derive that:

Therefore, the sum of each column of \(\prod _{h=1}^{n-1}\left (\left (1-\delta \right )\mathbf {S}^{\left (n-h\right )^{T}}\mathbf {F}^{\left (n-h\right )}\right )\) tends to zero as n goes to positive infinity. Since (1 − δ), \(\mathbf {S}^{\left (n-h\right )^{T}}\) and \(\mathbf {F}^{\left (n-h\right )}\) are all non-negative, \(\prod _{h=1}^{n-1}\left (\left (1-\delta \right )\mathbf {S}^{\left (n-h\right )^{T}}\mathbf {F}^{\left (n-h\right )}\right )\) is also non-negative, and thus \(\left (\prod _{h=1}^{n-1}\left (\left (1-\delta \right ) \mathbf {S}^{\left (n-h\right )^{T}}\mathbf {F}^{\left (n-h\right )}\right )\right )\mathbf {p}^{\left (1\right )}\) converges to a zero vector. Then formula (15) can be simplified as follows.

Since we only care about the ratios between the entries of \(\mathbf {p}^{\pi }\) rather than their absolute values, the formula above can be further simplified as follows.

It can be seen that the entries of \(\mathbf {p}^{\pi }\) keep increasing as n increases. Since the entries of any \(\mathbf {S}^{\left (n-h\right )^{T}}\), \(\mathbf {F}^{\left (n-h\right )}\) and \(\mathbf {p}^{\left (0\right )}\) are all non-negative and lie in [0, 1], we can further derive that:

where \(\mathbb {I}\) is a column vector whose entries are all 1, and "≼" denotes entry-wise "≤". Since (1 − δ) lies in (0, 1), \(\sum _{k=1}^{n-1}\left (1-\delta \right )^{k}\) converges as n goes to positive infinity. Hence \(\mathbf {p}^{\pi }\) is monotonically increasing with respect to the step n and has an upper bound, so it converges as n goes to positive infinity, meaning that the convergence of the proposed tag-dependent random search process is guaranteed.
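The convergence argument above can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation: since formula (15) is not reproduced here, it assumes an update of the hypothetical form \(\mathbf{p}^{(n)} = (1-\delta)\,\mathbf{S}^{T}\mathbf{F}\,\mathbf{p}^{(n-1)} + \mathbf{p}^{(0)}\), which matches the properties used in the proof (columns of \(\mathbf{S}^{T}\) L1-normalised, entries of \(\mathbf{F}\) in [0, 1], entries monotonically increasing and bounded by a geometric series in (1 − δ)).

```python
import numpy as np

# Hedged sketch: formula (15) is not shown in this excerpt, so we assume the
# update p^(n) = (1 - delta) * S^T @ F @ p^(n-1) + p^(0), consistent with the
# non-negativity, monotonicity and geometric-series bound used in the proof.
rng = np.random.default_rng(0)
m = 5            # number of vertices (visual neighbours); illustrative size
delta = 0.3      # parameter in (0, 1), as required by the proof

S_T = rng.random((m, m))
S_T /= S_T.sum(axis=0, keepdims=True)   # L1-normalise each column to sum to 1
F = np.diag(rng.random(m))              # trust degrees, entries in [0, 1]
p0 = np.full(m, 1.0 / m)                # non-negative initial score vector

p = p0.copy()
for _ in range(200):
    p_next = (1 - delta) * (S_T @ (F @ p)) + p0
    if np.abs(p_next - p).max() < 1e-12:  # fixed point reached
        p = p_next
        break
    p = p_next

# Each step contracts differences by at most (1 - delta) * max(F) < 1, so the
# entries increase monotonically toward a finite fixed point, as the proof claims.
```

Because the column sums of \((1-\delta)\,\mathbf{S}^{T}\mathbf{F}\) are strictly below 1, the iteration is a contraction and converges from any non-negative start, mirroring the geometric-series bound \(\sum_{k}(1-\delta)^{k}\) in the proof.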
Cite this article
Lin, Z., Ding, G. & Hu, M. Image auto-annotation via tag-dependent random search over range-constrained visual neighbours. Multimed Tools Appl 74, 4091–4116 (2015). https://doi.org/10.1007/s11042-013-1811-3