Abstract
With the growing amount of digital information on the Web and on personal computers, the need for systems capable of automatically indexing, searching, and organizing multimedia documents keeps increasing. To be accepted by industry and end users, automated systems have to retrieve information with high accuracy. Multimedia retrieval systems are often evaluated on different test collections with different performance measures, which makes a direct comparison of retrieval performance impossible and limits the impact of the approaches. Benchmarking campaigns counteract these tendencies and establish an objective comparison of the performance of different approaches by posing challenging tasks and by promoting the availability of test collections, topics, and performance measures. As part of the THESEUS research program, Fraunhofer IDMT organized the “Visual Concept Detection and Annotation Task” (VCDT) of the international benchmark ImageCLEF, with the goal of enabling the comparison of technologies developed within the THESEUS CTC with international developments. While the 2009 test collection was assessed with expert knowledge, the relevance assessments for the task have been acquired through crowdsourcing since 2010, using the Amazon Mechanical Turk (MTurk) platform. In this article the evaluation of THESEUS core technologies within ImageCLEF is explained in detail, with a special focus on the acquisition of ground truth data using MTurk. Advantages and disadvantages of this approach are discussed and best practices are shared.
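As a minimal illustration of how crowdsourced judgments can be consolidated into ground truth, the sketch below aggregates redundant MTurk worker labels per image–concept pair by majority vote. It is an assumption-laden example only: the image IDs, concept names, and the minimum number of judgments are hypothetical and do not reproduce the chapter's actual aggregation rules.

```python
from collections import defaultdict

# Hypothetical worker judgments as (image_id, concept, label) triples,
# where label is 1 if the worker judged the concept present, else 0.
# These names and the vote threshold are illustrative assumptions.
annotations = [
    ("img_001", "landscape", 1),
    ("img_001", "landscape", 1),
    ("img_001", "landscape", 0),
    ("img_001", "night", 0),
    ("img_001", "night", 0),
    ("img_001", "night", 1),
]

def majority_vote(annotations, min_votes=3):
    """Aggregate worker labels per (image, concept) pair into a binary ground truth."""
    votes = defaultdict(list)
    for image_id, concept, label in annotations:
        votes[(image_id, concept)].append(label)
    ground_truth = {}
    for key, labels in votes.items():
        if len(labels) >= min_votes:  # require enough independent judgments
            ground_truth[key] = int(sum(labels) > len(labels) / 2)
    return ground_truth

print(majority_vote(annotations))
# {('img_001', 'landscape'): 1, ('img_001', 'night'): 0}
```

In practice, such a simple vote is often complemented by worker qualification tests or agreement-based filtering; the chapter discusses these trade-offs as part of its best practices.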
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Liebetrau, J., Nowak, S., Schneider, S. (2014). Evaluation of Image Annotation Using Amazon Mechanical Turk in ImageCLEF. In: Wahlster, W., Grallert, HJ., Wess, S., Friedrich, H., Widenka, T. (eds) Towards the Internet of Services: The THESEUS Research Program. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-06755-1_20
DOI: https://doi.org/10.1007/978-3-319-06755-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06754-4
Online ISBN: 978-3-319-06755-1
eBook Packages: Computer Science, Computer Science (R0)