Abstract
Automatic image annotation is an important and challenging task that becomes increasingly necessary when managing large image collections. This paper describes techniques for automatic image annotation that take advantage of collaboratively annotated image databases, so-called visual folksonomies. Our approach applies two techniques based on image analysis: first, classification, which annotates images with a controlled vocabulary, and second, tag propagation along visually similar images. The latter propagates user-generated, folksonomic annotations and is therefore capable of dealing with an unlimited vocabulary. Experiments with a pool of Flickr images demonstrate the high accuracy and efficiency of the proposed methods in the task of automatic image annotation. Both techniques were applied in the prototypical tag recommender “tagr”.
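To make the idea of tag propagation along visually similar images concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes each image is already represented by a numeric feature vector, uses cosine similarity as a stand-in similarity measure, and ranks tags by similarity-weighted votes from the k nearest neighbours. The function names, the dictionary layout, the parameters `k` and `max_tags`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def propagate_tags(query_features, tagged_images, k=3, max_tags=5):
    """Suggest tags for an untagged image from its k most similar tagged images.

    tagged_images is a list of dicts with hypothetical keys
    "features" (list of floats) and "tags" (list of strings).
    """
    # Score every tagged image against the query and keep the k best matches.
    scored = [
        (cosine_similarity(query_features, img["features"]), img)
        for img in tagged_images
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Each neighbour votes for its tags, weighted by its similarity.
    votes = Counter()
    for similarity, img in scored[:k]:
        for tag in img["tags"]:
            votes[tag] += similarity
    return [tag for tag, _ in votes.most_common(max_tags)]


# Toy pool of user-tagged images (made-up feature vectors and tags).
pool = [
    {"features": [0.9, 0.1, 0.0], "tags": ["beach", "sea"]},
    {"features": [0.8, 0.2, 0.1], "tags": ["beach", "sunset"]},
    {"features": [0.0, 0.1, 0.9], "tags": ["forest", "tree"]},
]
suggested = propagate_tags([0.85, 0.15, 0.05], pool, k=2)
print(suggested)
```

Because the suggested tags are copied from neighbouring images rather than drawn from a fixed label set, the vocabulary is limited only by what users have entered, which is the property the abstract attributes to tag propagation.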





Notes
Flickr gives API-level access to the photos and tags, as long as the pictures are set as public by their owners.
Acknowledgements
The authors would like to thank their colleagues Marcus Thaler and Werner Haas for their support and feedback. The Know-Center is funded within the Austrian COMET Program—Competence Centers for Excellent Technologies—under the auspices of the Austrian Ministry of Transport, Innovation and Technology, the Austrian Ministry of Economics and Labor and by the State of Styria. COMET is managed by the Austrian Research Promotion Agency FFG.
Cite this article
Lindstaedt, S., Mörzinger, R., Sorschag, R. et al. Automatic image annotation using visual content and folksonomies. Multimed Tools Appl 42, 97–113 (2009). https://doi.org/10.1007/s11042-008-0247-7
DOI: https://doi.org/10.1007/s11042-008-0247-7