Abstract
Image annotation is the task of assigning keywords to images with the goal of facilitating their organization and access (e.g., keyword-based search). Traditional annotation methods are based on supervised learning. Although very effective, these methods require large amounts of manually labeled images and are limited in that images can only be labeled with concepts seen during the training phase. Unsupervised automatic image annotation (UAIA) methods, on the other hand, do not rely on strongly labeled images; instead, they exploit huge collections of images accompanied by unstructured text. In addition to not requiring labeled images, unsupervised techniques are advantageous because they can assign (virtually) any concept to an image. Despite these benefits, unsupervised methods have not been widely studied for image annotation; one reason for this is the lack of a reference framework for UAIA. Accordingly, this paper introduces two effective methods for UAIA within a common framework inspired by the way a query is expanded through Automatic Query Expansion (AQE) in information retrieval. On the one hand, we describe a local method that processes the text associated with images retrieved when the image to be annotated is used as a query; several state-of-the-art methods can be described under this formulation. On the other hand, we propose a global method that preprocesses the reference collection offline to identify visual-textual associations that are later used for annotation. Both methods are extensively evaluated on benchmarks for large-scale UAIA. Experimental results show that both strategies are competitive with the state of the art. We foresee that the AQE-based framework will pave the way for the development of alternative and effective methods for UAIA.
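The following is a minimal sketch of the local strategy outlined above. It is an illustration under our own assumptions, not the paper's implementation: the reference collection is taken to be a list of (visual feature vector, associated text) pairs, retrieval uses cosine similarity, and candidate labels are scored by similarity-weighted term frequency over the texts of the top-k retrieved images.

```python
# Illustrative sketch of a local, AQE-inspired annotation strategy.
# Assumptions (not from the paper): the reference collection is a list of
# (feature_vector, text) pairs, and terms are scored by similarity-weighted
# frequency over the texts of the k most visually similar reference images.
from collections import Counter

import numpy as np


def annotate_local(query_vec, collection, k=10, n_labels=5):
    """Return candidate labels for the image represented by query_vec."""
    feats = np.vstack([f for f, _ in collection])
    # Cosine similarity between the query image and every reference image.
    sims = feats @ query_vec / (
        np.linalg.norm(feats, axis=1) * np.linalg.norm(query_vec) + 1e-12
    )
    top = np.argsort(-sims)[:k]

    term_scores = Counter()
    for idx in top:
        _, text = collection[idx]
        for term in text.lower().split():
            term_scores[term] += sims[idx]  # weight each term by image similarity
    return [term for term, _ in term_scores.most_common(n_labels)]
```

A global variant, by contrast, would compute such visual-textual associations for the whole reference collection offline, so that only a lookup and ranking step is needed at annotation time.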
Notes
In strongly labeled images, each object in an image is segmented and labeled; weakly labeled images are those for which the concepts present in the image are given, but not the locations of the objects.
We could not evaluate every configuration because these evaluations were performed by the organizers, who limit the number of configurations they assess per participant.
Interest points in images are represented with visual descriptors that are clustered (e.g., using k-means); the cluster centers are considered visual words, and each image is represented by a numerical vector indicating the frequency of occurrence of the visual words in that image (see the sketch after these notes).
Please note that we do not include results for the 2015 dataset because we do not have results from other authors to compare against. The mAP results obtained by our methods are 0.2095 and 0.2130 for the local and global methods, respectively.
A connotative description focuses on the events taking place and the feelings evoked by the image.
A denotative description focuses on the enumeration of the objects present in the image.
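To make the bag-of-visual-words representation described in the notes above concrete, the following is a minimal sketch assuming local descriptors (e.g., SIFT) have already been extracted; the vocabulary size and the scikit-learn k-means implementation are illustrative choices.

```python
# Illustrative bag-of-visual-words pipeline: cluster local descriptors,
# then represent each image by the frequency of its assigned visual words.
import numpy as np
from sklearn.cluster import KMeans


def build_vocabulary(all_descriptors, n_words=1000):
    """Cluster descriptors from the whole collection; centers act as visual words."""
    return KMeans(n_clusters=n_words, n_init=10).fit(all_descriptors)


def bovw_histogram(image_descriptors, vocabulary):
    """Represent one image by the (normalized) frequency of each visual word."""
    words = vocabulary.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```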
Acknowledgments
This work was supported by CONACYT under project grant CB-2014-241306 (Clasificación y recuperación de imágenes mediante técnicas de minería de textos). It was also partially supported by the LACCIR programme under project ID R1212LAC006 and by CONACYT under scholarship No. 214764. The authors thank Jorge Vanegas and John Arevalo from UNAL for their support in the extraction of CNN visual features, and Mauricio Villegas for his support in the evaluation of the considered datasets.