Automatic Image Annotation Based on Generalized Relevance Models

Journal of Signal Processing Systems

Abstract

This paper presents a generalized relevance model for automatic image annotation that learns the correlations between images and annotation keywords. Unlike previous relevance models, which can only propagate keywords from the training images to the test images, the proposed model also propagates keywords among the test images. We also give a convergence analysis of the iterative algorithm derived from the proposed model. Moreover, to estimate the joint probability of observing an image together with candidate annotation keywords, we define inter-image relations using a new spatial Markov kernel based on 2D Markov models. The main advantage of this kernel is that it exploits intra-image context for automatic image annotation, unlike traditional bag-of-words methods. Experiments on two standard image databases demonstrate that the proposed model outperforms state-of-the-art annotation models.

References

  1. Li, J., & Wang, J. (2003). Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1075–1088.

  2. Gao, Y., Fan, J., Xue, X., & Jain, R. (2006). Automatic image annotation by incorporating feature hierarchy and boosting to scale up SVM classifiers. In Proc. ACM multimedia (pp. 901–910).

  3. Chang, E., Goh, K., Sychay, G., & Wu, G. (2003). CBSA: Content-based soft annotation for multimodal image retrieval using Bayes point machines. IEEE Transactions on Circuits and Systems for Video Technology, 13(1), 26–38.

  4. Jeon, J., Lavrenko, V., & Manmatha, R. (2003). Automatic image annotation and retrieval using cross-media relevance models. In Proc. SIGIR (pp. 119–126).

  5. Lavrenko, V., Manmatha, R., & Jeon, J. (2004). A model for learning the semantics of pictures. In Advances in neural information processing systems (Vol. 16, pp. 553–560).

  6. Feng, S., Manmatha, R., & Lavrenko, V. (2004). Multiple Bernoulli relevance models for image and video annotation. In Proc. CVPR (pp. 1002–1009).

  7. Liu, J., Wang, B., Li, M., Li, Z., Ma, W., Lu, H., et al. (2007). Dual cross-media relevance model for image annotation. In Proc. ACM multimedia (pp. 605–614).

  8. Lu, Z., & Ip, H. (2009). Image categorization by learning with context and consistency. In Proc. CVPR (pp. 2719–2726).

  9. Lu, Z., & Ip, H. (2010). Combining context, consistency, and diversity cues for interactive image categorization. IEEE Transactions on Multimedia, 12(3), 194–203.

  10. Li, J., Najmi, A., & Gray, R. (2000). Image classification by a two-dimensional hidden Markov model. IEEE Transactions on Signal Processing, 48(2), 517–533.

  11. Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.

  12. Salzenstein, F., & Collet, C. (2006). Fuzzy Markov random fields versus chains for multispectral image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1753–1767.

  13. Hofmann, T. (2001). Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 41, 177–196.

  14. Lu, Z., Peng, Y., & Ip, H. (2010). Image categorization via robust pLSA. Pattern Recognition Letters, 31(1), 36–43.

  15. Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR (pp. 2169–2178).

  16. Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2009). TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. International Journal of Computer Vision, 81(1), 2–23.

  17. Duygulu, P., Barnard, K., de Freitas, N., & Forsyth, D. (2002). Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Proc. ECCV (pp. 97–112).

  18. Liu, J., Li, M., Liu, Q., Lu, H., & Ma, S. (2009). Image annotation via graph learning. Pattern Recognition, 42(2), 218–228.

  19. Liu, J., Wang, B., Lu, H., & Ma, S. (2008). A graph-based image annotation framework. Pattern Recognition Letters, 29(4), 407–415.

  20. Yu, F., & Ip, H. (2006). Automatic semantic annotation of images using spatial hidden Markov model. In Proc. ICME (pp. 305–308).

  21. Yu, F., & Ip, H. (2008). Semantic content analysis and annotation of histological images. Computers in Biology and Medicine, 38(6), 635–649.

  22. Makadia, A., Pavlovic, V., & Kumar, S. (2008). A new baseline for image annotation. In Proc. ECCV (pp. 316–329).

  23. Golub, G., & Van Loan, C. (1989). Matrix computations. Baltimore: The Johns Hopkins University Press.

Acknowledgements

The work described in this paper was supported by a grant from the Research Council of Hong Kong SAR, China (Project No. CityU 114007) and a grant from City University of Hong Kong (Project No. 7008040).

Author information

Corresponding author

Correspondence to Zhiwu Lu.

Appendix

This appendix presents the proof of Theorem 1. First, we can obtain from Eq. 10 that

$$ F_t(W)= (\alpha S)^t F_0(W)+ (1-\alpha)\sum\limits_{i=0}^{t-1}(\alpha S)^i F_0(W). $$
(22)
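Eq. 10 is not reproduced in this appendix, so the following unrolling is only a sketch under the assumption that the update has the form \(F_{t+1}(W)=\alpha S F_t(W)+(1-\alpha)F_0(W)\), which is the recursion consistent with Eq. 22. Substituting the update into itself gives

$$ \begin{aligned} F_1(W) &= (\alpha S) F_0(W) + (1-\alpha) F_0(W), \\ F_2(W) &= (\alpha S)^2 F_0(W) + (1-\alpha)\left[(\alpha S) + I\right] F_0(W), \\ F_{t+1}(W) &= \alpha S\left[(\alpha S)^t F_0(W) + (1-\alpha)\sum\limits_{i=0}^{t-1}(\alpha S)^i F_0(W)\right] + (1-\alpha)F_0(W) \\ &= (\alpha S)^{t+1} F_0(W) + (1-\alpha)\sum\limits_{i=0}^{t}(\alpha S)^i F_0(W), \end{aligned} $$

so Eq. 22 holds for every t by induction, with I denoting the identity matrix.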

Since each row sum of S is bounded by 1 (i.e. \(\sum_{J\in \mathcal{U}}P(J|I)\leq 1\)), it follows from the Perron–Frobenius theorem [23] that the spectral radius of S satisfies ρ(S) ≤ 1. Moreover, since 0 < α < 1, we have ρ(αS) = αρ(S) < 1, and therefore

$$ \lim\limits_{t \to \infty}(\alpha S)^t =0,~\lim\limits_{t \to \infty}\sum\limits_{i=0}^{t-1}(\alpha S)^i =(I-\alpha S)^{-1}, $$
(23)

where I is the identity matrix. Hence, it follows from Eq. 22 that the sequence \(\{F_t(W)\}\) converges to

$$ F^*(W)=\lim\limits_{t \to \infty}F_t(W)=(1-\alpha)(I-\alpha S)^{-1}F_0(W), $$
(24)

as t → ∞, which proves Theorem 1.
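The convergence result can be checked numerically on a toy problem. The sketch below is not the authors' implementation: it assumes the update \(F_{t+1}=\alpha S F_t+(1-\alpha)F_0\) (the recursion consistent with Eq. 22, since Eq. 10 is not shown here), and the matrix `S`, the initial scores `F0`, and the helper names `matmul`, `propagate`, and `fixed_point_residual` are all hypothetical. Rather than inverting \(I-\alpha S\), it verifies the equivalent fixed-point condition \((I-\alpha S)F^*=(1-\alpha)F_0\) implied by Eq. 24.

```python
def matmul(A, B):
    """Multiply an n x n matrix A by an n x k matrix B (lists of lists)."""
    n, k = len(A), len(B[0])
    return [[sum(A[i][j] * B[j][c] for j in range(n)) for c in range(k)]
            for i in range(n)]


def propagate(S, F0, alpha, tol=1e-12, max_iter=10_000):
    """Iterate F <- alpha * S F + (1 - alpha) * F0 (assumed form of the update).

    Returns the final iterate and the number of iterations performed.
    """
    n, k = len(F0), len(F0[0])
    F = [row[:] for row in F0]
    for t in range(1, max_iter + 1):
        SF = matmul(S, F)
        F_next = [[alpha * SF[i][c] + (1 - alpha) * F0[i][c] for c in range(k)]
                  for i in range(n)]
        change = max(abs(F_next[i][c] - F[i][c])
                     for i in range(n) for c in range(k))
        F = F_next
        if change < tol:
            return F, t
    return F, max_iter


def fixed_point_residual(S, F, F0, alpha):
    """Largest entry of |(I - alpha S) F - (1 - alpha) F0|.

    This is zero exactly when F equals the limit F* of Eq. 24.
    """
    SF = matmul(S, F)
    return max(abs(F[i][c] - alpha * SF[i][c] - (1 - alpha) * F0[i][c])
               for i in range(len(F)) for c in range(len(F[0])))


# Hypothetical toy data: 3 test images, 2 keywords.
# Each row of S sums to at most 1, as required in the proof.
S = [[0.0, 0.6, 0.4],
     [0.5, 0.0, 0.5],
     [0.3, 0.6, 0.0]]
F0 = [[0.9, 0.1],
      [0.2, 0.8],
      [0.5, 0.5]]
alpha = 0.5

F_star, steps = propagate(S, F0, alpha)
residual = fixed_point_residual(S, F_star, F0, alpha)
print("iterations:", steps)
print("fixed-point residual:", residual)
```

Because ρ(αS) < 1, the error shrinks geometrically, so the loop stops after a few dozen iterations on this example. For large image sets, solving the linear system \((I-\alpha S)F^*=(1-\alpha)F_0\) directly is an alternative to iterating.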

About this article

Cite this article

Lu, Z., Ip, H.H.S. Automatic Image Annotation Based on Generalized Relevance Models. J Sign Process Syst 65, 23–33 (2011). https://doi.org/10.1007/s11265-010-0544-z

  • DOI: https://doi.org/10.1007/s11265-010-0544-z
