
Local and global approaches for unsupervised image annotation

Published in: Multimedia Tools and Applications

Abstract

Image annotation is the task of assigning keywords to images with the goal of facilitating their organization and access (e.g., searching by keywords). Traditional annotation methods are based on supervised learning. Although very effective, these methods require large amounts of manually labeled images and are limited in the sense that images can only be labeled with concepts seen during the training phase. Unsupervised automatic image annotation (UAIA) methods, on the other hand, forgo strongly labeled images and instead rely on huge collections of unstructured text paired with images for the annotation. In addition to not requiring labeled images, unsupervised techniques are advantageous because they can assign (virtually) any concept to an image. Despite these benefits, unsupervised methods have not been widely studied in image annotation; one reason for this is the lack of a reference framework for UAIA. Along these lines, this paper introduces two effective methods for UAIA in the context of a common framework inspired by the way a query is expanded through Automatic Query Expansion (AQE) in information retrieval. On the one hand, we describe a local method that processes the textual information associated with the images retrieved when the image to be annotated is used as a query; several state-of-the-art methods can be described under this formulation. On the other hand, we propose a global method that pre-processes the reference collection offline to identify visual-textual associations that are later used for annotation. Both methods are extensively evaluated on benchmarks for large-scale UAIA. Experimental results show the competitiveness of both strategies when compared to the state of the art. We foresee that the AQE-based framework will pave the way for the development of alternative and effective methods for UAIA.
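To make the two strategies concrete, here is a minimal sketch, assuming precomputed visual feature vectors and bag-of-words texts for the reference collection; the cosine-similarity retrieval step, the prototype-based global index, and all function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np
from collections import Counter

def local_annotation(query_vec, ref_vecs, ref_texts, k=10, n_labels=5):
    """Local (online) strategy: retrieve the k reference images most
    visually similar to the query image and rank the terms of their
    associated texts, in the spirit of automatic query expansion."""
    sims = ref_vecs @ query_vec / (
        np.linalg.norm(ref_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-12)
    nearest = np.argsort(-sims)[:k]
    scores = Counter()
    for i in nearest:
        for term in ref_texts[i]:
            scores[term] += sims[i]          # weight terms by visual similarity
    return [t for t, _ in scores.most_common(n_labels)]

def build_global_index(ref_vecs, ref_texts, prototypes):
    """Global (offline) strategy: accumulate, for each visual prototype
    (e.g., a cluster center), the terms of the texts of the reference
    images assigned to it, giving a prototype-by-term association matrix."""
    vocab = sorted({t for doc in ref_texts for t in doc})
    t_index = {t: j for j, t in enumerate(vocab)}
    assoc = np.zeros((len(prototypes), len(vocab)))
    assign = np.argmax(ref_vecs @ prototypes.T, axis=1)  # most similar prototype
    for i, doc in enumerate(ref_texts):
        for term in doc:
            assoc[assign[i], t_index[term]] += 1.0
    return assoc, vocab

def global_annotation(query_vec, prototypes, assoc, vocab, n_labels=5):
    p = int(np.argmax(prototypes @ query_vec))   # most similar visual prototype
    top = np.argsort(-assoc[p])[:n_labels]       # its most associated terms
    return [vocab[j] for j in top]
```

The contrast in the sketch mirrors the abstract: the local variant queries the reference collection at annotation time, while the global variant consults only the association matrix built offline.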



Notes

  1. http://www.webpagefx.com/internet-real-time/

  2. https://www.instagram.com/

  3. https://www.pinterest.com/

  4. In strongly labeled images, each object in an image is segmented and labeled; in weakly labeled images, the concepts present in the image are given, but not the locations of the objects.

  5. We could not evaluate every configuration because all of these evaluations were performed by the organizers, who restrict the maximum number of configurations they can assess per user.

  6. Interest points in images are represented with visual descriptors that are clustered (e.g., using k-means); the cluster centers are considered visual words, and each image is represented by a numerical vector indicating the frequency of occurrence of each visual word in the image (see the sketch after these notes).

  7. Please note that we do not include results for the 2015 dataset because we do not have results from other authors to compare against. The mAP results obtained by our methods are 0.2095 and 0.2130 for the local and global methods, respectively (mAP is sketched after these notes).

  8. A connotative description focuses on indicating the events happening and the feelings caused by the images.

  9. A denotative description focuses on the enumeration of the objects present in the image.
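As a complement to note 6, here is a minimal bag-of-visual-words sketch, assuming scikit-learn's KMeans and random stand-in descriptors (the descriptor extraction step, e.g., SIFT at interest points, is stubbed out and outside the scope of the sketch):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the local descriptors (e.g., 128-D SIFT vectors) extracted
# at the interest points of three images.
rng = np.random.default_rng(0)
descriptors_per_image = [rng.normal(size=(n, 128)) for n in (300, 250, 400)]

# 1) Cluster all descriptors; the k cluster centers act as visual words.
k = 50
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
kmeans.fit(np.vstack(descriptors_per_image))

# 2) Represent each image by the frequency of occurrence of each visual word.
def bovw_histogram(descriptors):
    words = kmeans.predict(descriptors)          # nearest visual word per point
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

histograms = [bovw_histogram(d) for d in descriptors_per_image]
```

Likewise, for the mAP figures mentioned in note 7, a minimal sketch of mean average precision over ranked annotation lists, with hypothetical inputs and binary relevance:

```python
def average_precision(ranked_labels, relevant):
    """AP of one ranked label list against a set of ground-truth labels."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / rank     # precision at each relevant hit
    return precision_sum / max(len(relevant), 1)

def mean_average_precision(all_ranked, all_relevant):
    """mAP: the mean of per-image average precisions."""
    aps = [average_precision(r, g) for r, g in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)
```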


Acknowledgments

This work was supported by CONACYT under project grant CB-2014-241306 (Clasificación y recuperación de imágenes mediante técnicas de minería de textos, i.e., image classification and retrieval using text mining techniques). It was also partially supported by the LACCIR programme under project ID R1212LAC006 and by CONACyT scholarship No. 214764. The authors would like to thank Jorge Vanegas and John Arevalo from UNAL for their support with the extraction of CNN visual features, and Mauricio Villegas for his support with the evaluation of the datasets considered.

Author information

Corresponding author: Luis Pellegrin.


Cite this article

Pellegrin, L., Escalante, H.J., Montes-y-Gómez, M. et al. Local and global approaches for unsupervised image annotation. Multimed Tools Appl 76, 16389–16414 (2017). https://doi.org/10.1007/s11042-016-3918-9

