
Image auto-annotation via tag-dependent random search over range-constrained visual neighbours


Abstract

The quantity setting of visual neighbours can be critical for the performance of many previously proposed visual-neighbour-based (VNB) image auto-annotation methods. Moreover, in those methods each candidate tag of a to-be-annotated image would ideally draw on its own trustworthy subset of visual neighbours for score prediction. In this paper we therefore propose that VNB methods use a constrained range, rather than an identical and fixed number, of visual neighbours, allowing more flexible choices of neighbours, and we put forward a novel tag-dependent random search process to estimate the tag-dependent trust degree of each visual neighbour for each candidate tag. Building on a widely used conditional probability model for auto-annotation, we further propose an effective image auto-annotation method termed TagSearcher, which takes into account the image-dependent weights of visual neighbours, their tag-dependent trust degrees, and their votes for each candidate tag. Extensive experiments conducted on both a benchmark dataset and real-world web images show that the proposed TagSearcher yields inspiring annotation performance and also reduces the performance sensitivity to the quantity setting of visual neighbours.
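The full scoring model is defined in the body of the paper; as a rough, hypothetical illustration of how the three ingredients above can be combined, the Python sketch below aggregates, for each candidate tag, each neighbour's image-dependent weight, tag-dependent trust degree and vote. The function name, array shapes and the simple product-and-sum combination are assumptions made for illustration only, not the paper's exact formulation.

```python
import numpy as np

def score_candidate_tags(weights, trust, votes):
    """Hypothetical VNB-style scoring sketch; NOT the paper's exact model.

    weights: (N,) image-dependent weights of N range-constrained visual neighbours
    trust:   (N, T) tag-dependent trust degrees, one column per candidate tag
    votes:   (N, T) binary votes, votes[i, t] = 1 if neighbour i carries tag t
    Returns a (T,) array of relevance scores for the T candidate tags.
    """
    scores = (weights[:, None] * trust * votes).sum(axis=0)
    total = scores.sum()
    return scores / total if total > 0 else scores  # normalise for ranking

# Toy example: 3 visual neighbours voting on 2 candidate tags.
w = np.array([0.5, 0.3, 0.2])                       # image-dependent weights
r = np.array([[0.9, 0.1], [0.4, 0.8], [0.7, 0.6]])  # tag-dependent trust degrees
v = np.array([[1, 0], [1, 1], [0, 1]])              # per-tag neighbour votes
print(score_candidate_tags(w, r, v))                # tag 0 scores higher here
```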


Notes

  1. See: https://www.facebook.com/

  2. See: http://www.flickr.com/

  3. The 30 non-abstract concepts are: aircraft, ball, beach, bike, bird, book, bridge, car, chair, child, clock, countryside, dog, door, fire, fish, flower, house, kite, lamp, mountain, mushroom, pen, rabbit, river, sky, sun, tower, train, tree.

  4. The features include: Color Correlogram, Color Layout, CEDD, Edge Histogram, FCTH, HSV Color Histogram, JCD, Jpeg Coefficient Histogram, RGB Color Histogram, Scalable Color, SURF with bag-of-words model.


Acknowledgments

This research was supported by the National Basic Research Project of China (Grant No. 2011CB707000) and the National Natural Science Foundation of China (Grant Nos. 61271394 and 61005045).

Author information


Corresponding author

Correspondence to Zijia Lin.

Appendix

A1. Convergence proof

Note that in formula (15), each column of the matrix \(\mathbf{S}^{\left(n-h\right)^{T}}\), which encodes the successive relations between vertices, is L1-normalized to sum to 1, and the entries of any \(\mathbf{S}^{\left(n-h\right)^{T}}\) or \(\mathbf{F}^{\left(n-h\right)}\) all lie between 0 and 1. For any δ lying in (0, 1), there always exists a γ < 1 such that 1 − δ < γ, and thus we can derive that:

$$\begin{array}{l} \sum\limits_{j}\left(\prod\limits_{h=1}^{n-1}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)_{ji} \\ = \sum\limits_{j}\sum\limits_{k}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-1\right)^{T}}\mathbf{F}^{\left(n-1\right)}\right)_{jk}\left(\prod\limits_{h=2}^{n-1}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)_{ki} \\ = \sum\limits_{k}\left(\prod\limits_{h=2}^{n-1}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)_{ki}\left(1-\delta\right)\sum\limits_{j}\left(\mathbf{S}^{\left(n-1\right)^{T}}\mathbf{F}^{\left(n-1\right)}\right)_{jk} \\ \leqslant \left(1-\delta\right)\sum\limits_{k}\left(\prod\limits_{h=2}^{n-1}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)_{ki}\sum\limits_{j}\left(\mathbf{S}^{\left(n-1\right)^{T}}\right)_{jk} \\ = \left(1-\delta\right)\sum\limits_{k}\left(\prod\limits_{h=2}^{n-1}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)_{ki} \\ \leqslant \gamma\sum\limits_{k}\left(\prod\limits_{h=2}^{n-1}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)_{ki} \\ \leqslant \gamma\left(\gamma\sum\limits_{k}\left(\prod\limits_{h=3}^{n-1}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)_{ki}\right) \\ \leqslant \ldots \\ \leqslant \gamma^{n-1} \end{array} $$
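This chain of inequalities is easy to check numerically. The Python sketch below is only an illustrative sanity check: it assumes, purely for simulation, that each \(\mathbf{S}^{\left(n-h\right)^{T}}\) is a random matrix with L1-normalized columns and that each \(\mathbf{F}^{\left(n-h\right)}\) is a random diagonal matrix with entries in [0, 1]; the paper's actual matrices are defined in the body text.

```python
import numpy as np

rng = np.random.default_rng(0)
delta, n, dim = 0.2, 40, 6

def random_column_stochastic(d):
    """A random matrix whose columns are L1-normalised to sum to 1."""
    m = rng.random((d, d))
    return m / m.sum(axis=0, keepdims=True)

# Accumulate prod_{h=1}^{n-1} ((1 - delta) * S^T * F) with fresh random factors.
prod = np.eye(dim)
for _ in range(n - 1):
    S_T = random_column_stochastic(dim)  # stand-in for S^{(n-h)^T}
    F = np.diag(rng.random(dim))         # diagonal trust degrees in [0, 1]
    prod = (1 - delta) * S_T @ F @ prod

print(prod.sum(axis=0))        # every column sum is tiny, and ...
print((1 - delta) ** (n - 1))  # ... below (1 - delta)^(n-1), hence below gamma^(n-1)
```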

Therefore, the sum of each column of \(\prod_{h=1}^{n-1}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\) tends to zero as n goes to positive infinity. Since (1 − δ), \(\mathbf{S}^{\left(n-h\right)^{T}}\) and \(\mathbf{F}^{\left(n-h\right)}\) are all non-negative, \(\prod_{h=1}^{n-1}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\) is also non-negative, and thus \(\left(\prod_{h=1}^{n-1}\left(\left(1-\delta\right)\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)\mathbf{p}^{\left(1\right)}\) converges to a zero vector. Formula (15) can then be simplified as follows.

$$\mathbf{p}_{\pi} = \lim\limits_{n\rightarrow\infty}\delta\left(1+\sum\limits_{k=1}^{n-1}\left(1-\delta\right)^{k}\prod\limits_{h=1}^{k}\left(\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)\mathbf{p}^{\left(0\right)} $$

Since we only focus on the ratios between the entries of \(\mathbf{p}_{\pi}\) rather than their values, the formula above can be further simplified as follows.

$$\mathbf{p}_{\pi} \sim \lim\limits_{n\rightarrow\infty}\left(1+\sum\limits_{k=1}^{n-1}\left(1-\delta\right)^{k}\prod\limits_{h=1}^{k}\left(\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)\mathbf{p}^{\left(0\right)} $$

It can be seen that the entries of \(\mathbf{p}_{\pi}\) keep increasing as n increases. Since the entries of any \(\mathbf{S}^{\left(n-h\right)^{T}}\), \(\mathbf{F}^{\left(n-h\right)}\) and \(\mathbf{p}^{\left(0\right)}\) are all non-negative and lie in [0, 1], we can further derive that:

$$\begin{array}{l} \lim\limits_{n\rightarrow\infty}\left(1+\sum\limits_{k=1}^{n-1}\left(1-\delta\right)^{k}\prod\limits_{h=1}^{k}\left(\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\right)\mathbf{p}^{\left(0\right)} \\ = \lim\limits_{n\rightarrow\infty}\mathbf{p}^{\left(0\right)}+\sum\limits_{k=1}^{n-1}\left(1-\delta\right)^{k}\prod\limits_{h=1}^{k}\left(\mathbf{S}^{\left(n-h\right)^{T}}\mathbf{F}^{\left(n-h\right)}\right)\mathbf{p}^{\left(0\right)} \\ \preceq \lim\limits_{n\rightarrow\infty}\left(\mathbf{p}^{\left(0\right)}+\sum\limits_{k=1}^{n-1}\left(1-\delta\right)^{k}\mathbb{I}\right) \end{array} $$

where 𝕀 is a column vector with all entries being 1, and “≼” denotes entry-wise “≤”. Since (1 − δ) lies in (0, 1), the geometric series \(\sum_{k=1}^{n-1}\left(1-\delta\right)^{k}\) converges (to \(\left(1-\delta\right)/\delta\)) as n goes to positive infinity. Hence \(\mathbf{p}_{\pi}\) is monotonically increasing w.r.t. the step n and has an upper bound, and therefore it converges as n goes to positive infinity, meaning that the convergence of the proposed tag-dependent random search process is guaranteed.
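As an informal numerical illustration of this argument, the partial sums of the series above can be evaluated directly. Under the same simulation assumptions as before (random column-stochastic stand-ins for \(\mathbf{S}^{\left(n-h\right)^{T}}\), random diagonal stand-ins for \(\mathbf{F}^{\left(n-h\right)}\), and ignoring the exact time indexing of the factors, which does not affect the argument), the increments shrink geometrically and the entry-wise increasing partial sums stabilise:

```python
import numpy as np

rng = np.random.default_rng(1)
delta, dim, steps = 0.2, 5, 60

def random_column_stochastic(d):
    m = rng.random((d, d))
    return m / m.sum(axis=0, keepdims=True)

p0 = rng.random(dim)   # entries of p^(0) lie in [0, 1]
partial = p0.copy()    # the leading identity term of the series
prod = np.eye(dim)
increments = []
for k in range(1, steps):
    # one more (S^{(n-h)^T} F^{(n-h)}) factor in the running product
    prod = random_column_stochastic(dim) @ np.diag(rng.random(dim)) @ prod
    step_vec = (1 - delta) ** k * (prod @ p0)  # non-negative: entry-wise increase
    partial += step_vec
    increments.append(np.abs(step_vec).max())

print(increments[0], increments[-1])  # increments shrink geometrically toward 0
print(partial)                        # the partial sums stabilise: convergence
```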

About this article

Cite this article

Lin, Z., Ding, G. & Hu, M. Image auto-annotation via tag-dependent random search over range-constrained visual neighbours. Multimed Tools Appl 74, 4091–4116 (2015). https://doi.org/10.1007/s11042-013-1811-3
