Abstract
The automatic annotation of images presents a particularly complex problem for machine learning researchers. In this work we experiment with semantic models and multi-class learning for the automatic annotation of query images. We represent the images using scale-invariant transformation descriptors in order to account for similar objects appearing at slightly different scales and transformations. The resulting descriptors are utilised as visual terms for each image. We first aim to annotate a query image by retrieving images that are similar to it, on the assumption that similar images are annotated similarly. We then propose an image annotation method that learns a direct mapping from image descriptors to keywords. We compare the semantic-based methods of Latent Semantic Indexing and Kernel Canonical Correlation Analysis (KCCA), as well as a recently proposed vector-label-based learning method known as Maximum Margin Robot.
The authors would like to acknowledge the financial support of the European Community IST Programme; PASCAL Network of Excellence grant no. IST-2002-506778.
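The retrieval-based step described in the abstract (annotate a query image with the keywords of the images most similar to it) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain cosine similarity on raw visual-term counts rather than Latent Semantic Indexing or KCCA, and the function names, the toy vectors, the keywords, and the parameters `k` and `n_keywords` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length visual-term count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty image has no similarity to anything
    return dot / (norm_u * norm_v)


def annotate_by_retrieval(query, train_vectors, train_keywords, k=2, n_keywords=2):
    """Propagate keywords from the k training images most similar to the query."""
    # Rank training images by similarity to the query, most similar first.
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: cosine(query, train_vectors[i]),
        reverse=True,
    )
    # Each retrieved image votes once for each of its keywords.
    votes = Counter()
    for i in ranked[:k]:
        votes.update(train_keywords[i])
    # Order by vote count (descending), then alphabetically so ties are deterministic.
    ordered = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ordered[:n_keywords]]


# Hypothetical toy data: each row counts how often four visual terms occur in an image.
train_vectors = [
    [5, 1, 0, 0],
    [4, 2, 0, 1],
    [0, 0, 6, 3],
]
train_keywords = [
    {"tiger", "grass"},
    {"tiger", "water"},
    {"plane", "sky"},
]
query = [3, 1, 0, 0]

print(annotate_by_retrieval(query, train_vectors, train_keywords))
```

In this toy setting the query is closest to the first two training images, so their shared keyword receives the most votes. A semantic model such as those compared in the paper would replace the raw-count similarity with a similarity computed in a learned latent space.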
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hardoon, D.R., Saunders, C., Szedmak, S., Shawe-Taylor, J. (2006). A Correlation Approach for Automatic Image Annotation. In: Li, X., Zaïane, O.R., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2006. Lecture Notes in Computer Science, vol 4093. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11811305_75
DOI: https://doi.org/10.1007/11811305_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37025-3
Online ISBN: 978-3-540-37026-0
eBook Packages: Computer Science, Computer Science (R0)