Abstract
In recent years, crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to conduct experiments quickly, at low cost, and with good results. However, as in any experiment, several details determine whether it succeeds or fails. To gather useful results, user interface guidelines, inter-assessor agreement metrics, and justification analysis are important aspects of a successful crowdsourcing experiment. In this work we explore the design and execution of relevance judgments using Amazon Mechanical Turk as the crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and reporting the results of a series of experiments on TREC 8 with a fixed budget. Our findings indicate that workers are as good as TREC experts, and even provide detailed feedback for certain query-document pairs. We also explore the importance of document design and presentation when performing relevance assessment tasks. Finally, we show our methodology at work with several examples that are interesting in their own right.
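The abstract highlights inter-assessor agreement metrics as one ingredient of a successful crowdsourcing experiment. As a minimal illustrative sketch (not taken from the paper), the snippet below computes Fleiss' kappa over hypothetical crowdsourced relevance labels, where each query-document pair is judged by the same number of workers; the label names and vote counts are assumptions for the example only.

```python
# Illustrative sketch: Fleiss' kappa as an inter-assessor agreement metric
# for crowdsourced relevance judgments. All data below is hypothetical.
from collections import Counter


def fleiss_kappa(votes_per_item, categories=("relevant", "not_relevant")):
    """votes_per_item: one list of labels per query-document pair.
    Every item must have the same number of votes (workers)."""
    n = len(votes_per_item[0])   # workers per item
    N = len(votes_per_item)      # number of items
    counts = [Counter(v) for v in votes_per_item]
    # Per-item observed agreement P_i
    P_i = [(sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1))
           for c in counts]
    P_bar = sum(P_i) / N
    # Chance agreement from marginal category proportions
    p_j = [sum(c[cat] for c in counts) / (N * n) for cat in categories]
    P_e = sum(p ** 2 for p in p_j)
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical example: 4 query-document pairs, each judged by 5 workers.
judgments = [
    ["relevant"] * 5,
    ["relevant"] * 4 + ["not_relevant"],
    ["not_relevant"] * 3 + ["relevant"] * 2,
    ["not_relevant"] * 5,
]
print(f"Fleiss' kappa: {fleiss_kappa(judgments):.3f}")
```

Values near 1 indicate strong agreement among workers, values near 0 indicate agreement no better than chance; in practice one would compute this per topic or per batch before comparing worker labels with expert (e.g., TREC) judgments.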