A Game-Theory Approach for Effective Crowdsource-Based Relevance Assessment

Published: 31 March 2016

Abstract

Despite the ever-increasing popularity of crowdsourcing (CS) in both industry and academia, procedures that ensure the quality of its results remain elusive. We hypothesise that a CS design based on game theory can persuade workers to perform their tasks as quickly as possible and with the highest quality. To that end, we propose a CS framework inspired by the n-person Chicken game, aiming to address the problem of CS quality without compromising CS benefits such as low monetary cost and high task-completion speed. With that goal in mind, we study the effects of knowledge updates as well as incentives for good workers to continue playing. As a case study, we define a general task with the characteristics of relevance assessment, a task that has been widely explored with CS because of its cost and complexity. To investigate our hypotheses, we conduct a simulation that studies the effect of the proposed framework on data accuracy, task completion time, and total monetary reward. Based on a game-theoretical analysis, we examine how different types of individuals would behave under a particular game scenario; in particular, we simulate a population composed of different types of workers with varying abilities to formulate optimal strategies and learn from their experiences. The simulation results support our hypothesis.
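
The abstract describes an agent-based simulation of an n-person Chicken game played by a heterogeneous population of crowd workers. The sketch below is not the authors' framework: the payoff constants, the Worker class, the three worker types, and the learning rule are illustrative assumptions chosen only to show the general shape of such a simulation, in which careless work ("defecting") pays more than careful work ("cooperating") only while enough other workers remain careful.

```python
"""Toy n-person Chicken game simulation of crowd workers.

Illustrative sketch only; the payoffs, worker types, and learning rule
are assumptions, not the framework proposed in the article.
"""
import random

ROUNDS = 50

def payoff(cooperates: bool, coop_fraction: float) -> float:
    """Assumed linear n-person Chicken payoffs: careless work pays more
    than careful work only while enough other workers stay careful, and
    collapses to the worst outcome when almost everyone is careless."""
    if cooperates:
        return 1.0 + 2.0 * coop_fraction      # careful work
    return -2.0 + 6.0 * coop_fraction          # careless work

class Worker:
    """A worker with a propensity to work carefully; 'learners' nudge
    that propensity toward whichever action currently pays better."""
    def __init__(self, p_coop: float, learns: bool):
        self.p_coop = p_coop
        self.learns = learns

    def act(self) -> bool:
        return random.random() < self.p_coop

    def update(self, coop_fraction: float) -> None:
        if not self.learns:
            return
        better_to_cooperate = payoff(True, coop_fraction) > payoff(False, coop_fraction)
        delta = 0.05 if better_to_cooperate else -0.05
        self.p_coop = min(1.0, max(0.0, self.p_coop + delta))

def simulate() -> None:
    random.seed(0)
    # Heterogeneous population: diligent, opportunistic, and adaptive workers.
    workers = ([Worker(0.9, learns=False) for _ in range(40)] +
               [Worker(0.2, learns=False) for _ in range(30)] +
               [Worker(0.5, learns=True) for _ in range(30)])
    for r in range(ROUNDS):
        actions = [w.act() for w in workers]
        coop_fraction = sum(actions) / len(workers)
        for w in workers:
            w.update(coop_fraction)
        if r % 10 == 0:
            print(f"round {r:2d}: fraction of careful workers = {coop_fraction:.2f}")

if __name__ == "__main__":
    simulate()
```

With these assumed constants, careless work pays better only when more than 75% of the crowd is careful, so the adaptive workers in this toy population drift toward careful work; the article's actual analysis studies such dynamics together with knowledge updates and incentives for good workers.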

      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 7, Issue 4
      Special Issue on Crowd in Intelligent Systems, Research Note/Short Paper and Regular Papers
      July 2016, 498 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/2906145
      Editor: Yu Zheng
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 31 March 2016
      Accepted: 01 December 2015
      Revised: 01 December 2015
      Received: 01 February 2015
      Published in TIST Volume 7, Issue 4

      Author Tags

      1. Game theory
      2. crowdsourcing
      3. relevance assessment

      Qualifiers

      • Research-article
      • Research
      • Refereed
