Abstract
In recent years, crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to conduct experiments quickly, at low cost, and with good results. However, as in any experiment, several details determine whether it succeeds or fails. To gather useful results, user interface guidelines, inter-assessor agreement metrics, and justification analysis are important aspects of a successful crowdsourcing experiment. In this work we explore the design and execution of relevance judgments using Amazon Mechanical Turk as the crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and reporting the results of a series of experiments on TREC 8 with a fixed budget. Our findings indicate that workers are as good as TREC experts, and even provide detailed feedback for certain query-document pairs. We also explore the importance of document design and presentation when performing relevance assessment tasks. Finally, we show our methodology at work with several examples that are interesting in their own right.
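The abstract highlights inter-assessor agreement metrics as one ingredient of a successful crowdsourcing experiment. As a minimal illustrative sketch (not taken from the paper), the snippet below computes Fleiss' kappa over hypothetical crowdsourced relevance labels, where each query-document pair is judged by the same number of workers; the label names and vote counts are assumptions for the example only.

```python
# Illustrative sketch: Fleiss' kappa as an inter-assessor agreement metric
# for crowdsourced relevance judgments. All data below is hypothetical.
from collections import Counter


def fleiss_kappa(votes_per_item, categories=("relevant", "not_relevant")):
    """votes_per_item: one list of labels per query-document pair.
    Every item must have the same number of votes (workers)."""
    n = len(votes_per_item[0])   # workers per item
    N = len(votes_per_item)      # number of items
    counts = [Counter(v) for v in votes_per_item]
    # Per-item observed agreement P_i
    P_i = [(sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1))
           for c in counts]
    P_bar = sum(P_i) / N
    # Chance agreement from marginal category proportions
    p_j = [sum(c[cat] for c in counts) / (N * n) for cat in categories]
    P_e = sum(p ** 2 for p in p_j)
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical example: 4 query-document pairs, each judged by 5 workers.
judgments = [
    ["relevant"] * 5,
    ["relevant"] * 4 + ["not_relevant"],
    ["not_relevant"] * 3 + ["relevant"] * 2,
    ["not_relevant"] * 5,
]
print(f"Fleiss' kappa: {fleiss_kappa(judgments):.3f}")
```

Values near 1 indicate strong agreement among workers, values near 0 indicate agreement no better than chance; in practice one would compute this per topic or per batch before comparing worker labels with expert (e.g., TREC) judgments.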