Abstract
The number of visual neighbours can be critical to the performance of many previously proposed visual-neighbour-based (VNB) image auto-annotation methods. Moreover, in those methods, each candidate tag of a to-be-annotated image would ideally have its own trustworthy subset of visual neighbours for score prediction. In this paper we therefore propose to use a constrained range, rather than an identical and fixed number, of visual neighbours for VNB methods, allowing more flexible choices of neighbours, and we put forward a novel tag-dependent random search process to estimate the tag-dependent trust degrees of visual neighbours for each candidate tag. We further propose an effective image auto-annotation method, termed TagSearcher, based on a widely used conditional probability model for auto-annotation that considers image-dependent weights of visual neighbours, tag-dependent trust degrees of visual neighbours, and votes for a candidate tag from visual neighbours. Extensive experiments on both a benchmark dataset and real-world web images show that the proposed TagSearcher yields encouraging annotation performance and also reduces the sensitivity of performance to the number of visual neighbours used.
Notes
The 30 non-abstract concepts are: aircraft, ball, beach, bike, bird, book, bridge, car, chair, child, clock, countryside, dog, door, fire, fish, flower, house, kite, lamp, mountain, mushroom, pen, rabbit, river, sky, sun, tower, train, tree.
The features include: Color Correlogram, Color Layout, CEDD, Edge Histogram, FCTH, HSV Color Histogram, JCD, Jpeg Coefficient Histogram, RGB Color Histogram, Scalable Color, SURF with bag-of-words model.
Acknowledgments
This research was supported by the National Basic Research Project of China (Grant No. 2011CB707000) and the National Natural Science Foundation of China (Grant Nos. 61271394 and 61005045).
Appendix
A1. Convergence proof
Note that in formula (15), each column of the matrix \(\mathbf {S}^{\left (n-h\right )^{T}}\), which encodes the successive relations between vertices, is L1-normalised to sum to 1, and the entries of any \(\mathbf {S}^{\left (n-h\right )^{T}}\) or \(\mathbf {F}^{\left (n-h\right )}\) all lie in [0, 1]. For any δ in (0, 1), there always exists γ < 1 such that 1 − δ < γ, and thus we can derive that:

Therefore, the sum of each column of \(\prod _{h=1}^{n-1}\left (\left (1-\delta \right )\mathbf {S}^{\left (n-h\right )^{T}}\mathbf {F}^{\left (n-h\right )}\right )\) tends to zero as n goes to positive infinity. Since (1 − δ), \(\mathbf {S}^{\left (n-h\right )^{T}}\) and \(\mathbf {F}^{\left (n-h\right )}\) are all non-negative, \(\prod _{h=1}^{n-1}\left (\left (1-\delta \right )\mathbf {S}^{\left (n-h\right )^{T}}\mathbf {F}^{\left (n-h\right )}\right )\) is also non-negative, and thus \(\left (\prod _{h=1}^{n-1}\left (\left (1-\delta \right ) \mathbf {S}^{\left (n-h\right )^{T}}\mathbf {F}^{\left (n-h\right )}\right )\right )\mathbf {p}^{\left (1\right )}\) converges to a zero vector. Then formula (15) can be simplified as follows.

Since we only care about the ratios between the entries of \(\mathbf {p}^{\pi }\) rather than their absolute values, the formula above can be further simplified as follows.

It can be seen that the entries of \(\mathbf {p}^{\pi }\) keep increasing as n increases. Since the entries of any \(\mathbf {S}^{\left (n-h\right )^{T}}\), \(\mathbf {F}^{\left (n-h\right )}\) and \(\mathbf {p}^{\left (0\right )}\) are all non-negative and lie in [0, 1], we can further derive that:

where \(\mathbb {I}\) is a column vector whose entries are all 1, and "≼" denotes entry-wise "≤". Since (1 − δ) lies in (0, 1), \(\sum _{k=1}^{n-1}\left (1-\delta \right )^{k}\) converges as n goes to positive infinity. Hence \(\mathbf {p}^{\pi }\) is monotonically increasing with respect to the step n and has an upper bound, so it converges as n goes to positive infinity, meaning that the convergence of the proposed tag-dependent random search process is guaranteed.
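The convergence argument above can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation: since formula (15) is not reproduced here, it assumes an update of the hypothetical form \(\mathbf{p}^{(n)} = (1-\delta)\,\mathbf{S}^{T}\mathbf{F}\,\mathbf{p}^{(n-1)} + \mathbf{p}^{(0)}\), which matches the properties used in the proof (columns of \(\mathbf{S}^{T}\) L1-normalised, entries of \(\mathbf{F}\) in [0, 1], entries monotonically increasing and bounded by a geometric series in (1 − δ)).

```python
import numpy as np

# Hedged sketch: formula (15) is not shown in this excerpt, so we assume the
# update p^(n) = (1 - delta) * S^T @ F @ p^(n-1) + p^(0), consistent with the
# non-negativity, monotonicity and geometric-series bound used in the proof.
rng = np.random.default_rng(0)
m = 5            # number of vertices (visual neighbours); illustrative size
delta = 0.3      # parameter in (0, 1), as required by the proof

S_T = rng.random((m, m))
S_T /= S_T.sum(axis=0, keepdims=True)   # L1-normalise each column to sum to 1
F = np.diag(rng.random(m))              # trust degrees, entries in [0, 1]
p0 = np.full(m, 1.0 / m)                # non-negative initial score vector

p = p0.copy()
for _ in range(200):
    p_next = (1 - delta) * (S_T @ (F @ p)) + p0
    if np.abs(p_next - p).max() < 1e-12:  # fixed point reached
        p = p_next
        break
    p = p_next

# Each step contracts differences by at most (1 - delta) * max(F) < 1, so the
# entries increase monotonically toward a finite fixed point, as the proof claims.
```

Because the column sums of \((1-\delta)\,\mathbf{S}^{T}\mathbf{F}\) are strictly below 1, the iteration is a contraction and converges from any non-negative start, mirroring the geometric-series bound \(\sum_{k}(1-\delta)^{k}\) in the proof.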
Cite this article
Lin, Z., Ding, G. & Hu, M. Image auto-annotation via tag-dependent random search over range-constrained visual neighbours. Multimed Tools Appl 74, 4091–4116 (2015). https://doi.org/10.1007/s11042-013-1811-3