Automatic Image Annotation at ImageCLEF

Wang, Josiah; Gilbert, Andrew; Thomee, Bart; Villegas, Mauricio

doi:10.1007/978-3-030-22948-1_11

Part of the book series: The Information Retrieval Series (INRE, volume 41)

Abstract

Automatic image annotation is the task of assigning some form of semantic label to images, such as words, phrases or sentences describing the objects, attributes, actions, and scenes depicted in the image. In this chapter, we present an overview of the various automatic image annotation tasks that were organized in conjunction with the ImageCLEF track at CLEF between 2009 and 2016. Over these eight years, the image annotation tasks evolved from annotating Flickr photos by learning from clean data to annotating web images by learning from large-scale noisy web data. The tasks are divided into three distinct phases, and this chapter discusses each of them. We also compare and contrast other related benchmarking challenges, and provide some insights into the future of automatic image annotation.

Acknowledgements

The Concept Annotation, Localization and Sentence Generation tasks in ImageCLEF 2015 and 2016 were co-organized by the VisualSense (ViSen) consortium under the ERA-NET CHIST-ERA D2K 2011 Programme, jointly supported by UK EPSRC Grants EP/K019082/1 and EP/K01904X/1, French ANR Grant ANR-12-CHRI-0002-04 and Spanish MINECO Grant PCIN-2013-047.

Author information

Authors and Affiliations

Department of Computing, Imperial College London, London, UK
Josiah Wang
CVSSP, University of Surrey, Guildford, UK
Andrew Gilbert
Google, San Bruno, CA, USA
Bart Thomee
omni:us, Berlin, Germany
Mauricio Villegas

Corresponding author

Correspondence to Josiah Wang.

Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Padova, Padova, Italy
Nicola Ferro
Consiglio Nazionale delle Ricerche, Istituto di Scienza e Tecnologie dell’Informazione, Pisa, Italy
Carol Peters

About this chapter

Cite this chapter

Wang, J., Gilbert, A., Thomee, B., Villegas, M. (2019). Automatic Image Annotation at ImageCLEF. In: Ferro, N., Peters, C. (eds) Information Retrieval Evaluation in a Changing World. The Information Retrieval Series, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-030-22948-1_11

DOI: https://doi.org/10.1007/978-3-030-22948-1_11
Published: 14 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22947-4
Online ISBN: 978-3-030-22948-1
eBook Packages: Computer Science, Computer Science (R0)
