
Photo annotation: a survey

Multimedia Tools and Applications


Given the large number of photos currently being generated, techniques to organize, search for, and retrieve such images are essential. Photo annotation plays a key role in these mechanisms because it links raw data (photos) to the specific information that people need in order to handle large amounts of content. However, generating photo annotations remains a difficult problem, as it is part of a well-known challenge called the semantic gap. In this paper, we conduct a literature review to investigate the most popular methods employed to produce photo annotations. Based on the papers surveyed, we identify People (“Who?”), Location (“Where?”), and Event (“What? When?”) as the most important features of photo annotation. We also compare similar photo annotation methods, highlighting key aspects of the most commonly used approaches. Moreover, we provide an overview of a general photo annotation process and present the main aspects of photo annotation representation, comprising formats, context of usage, advantages, and disadvantages. Finally, we discuss ways to improve photo annotation methods and present some guidelines for future research.




Author information

Corresponding author

Correspondence to Davi Oliveira Serrano de Andrade.


Cite this article

de Andrade, D.O.S., Maia, L.F., de Figueirêdo, H.F. et al. Photo annotation: a survey. Multimed Tools Appl 77, 423–457 (2018).
