A Game-Theory Approach for Effective Crowdsource-Based Relevance Assessment

Published: 31 March 2016

Abstract

Despite the ever-increasing popularity of crowdsourcing (CS) in both industry and academia, procedures that ensure the quality of its results remain elusive. We hypothesise that a CS design based on game theory can persuade workers to perform their tasks as quickly as possible and with the highest quality. To that end, we propose a CS framework inspired by the n-person Chicken game, aiming to address the problem of CS quality without compromising CS benefits such as low monetary cost and high task-completion speed. With that goal in mind, we study the effects of knowledge updates as well as incentives for good workers to continue playing. As a case study, we define a general task with the characteristics of relevance assessment, a task that has been widely explored with CS because of its cost and complexity. To investigate our hypotheses, we conduct a simulation that studies the effect of the proposed framework on data accuracy, task completion time, and total monetary reward. Based on a game-theoretical analysis, we examine how different types of individuals would behave under a particular game scenario; in particular, we simulate a population composed of different types of workers with varying abilities to formulate optimal strategies and learn from their experiences. The simulation results support our hypothesis.
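
The abstract describes an agent-based simulation of an n-person Chicken game played by a heterogeneous population of crowd workers. The sketch below is not the authors' framework: the payoff constants, the Worker class, the three worker types, and the learning rule are illustrative assumptions chosen only to show the general shape of such a simulation, in which careless work ("defecting") pays more than careful work ("cooperating") only while enough other workers remain careful.

```python
"""Toy n-person Chicken game simulation of crowd workers.

Illustrative sketch only; the payoffs, worker types, and learning rule
are assumptions, not the framework proposed in the article.
"""
import random

ROUNDS = 50

def payoff(cooperates: bool, coop_fraction: float) -> float:
    """Assumed linear n-person Chicken payoffs: careless work pays more
    than careful work only while enough other workers stay careful, and
    collapses to the worst outcome when almost everyone is careless."""
    if cooperates:
        return 1.0 + 2.0 * coop_fraction      # careful work
    return -2.0 + 6.0 * coop_fraction          # careless work

class Worker:
    """A worker with a propensity to work carefully; 'learners' nudge
    that propensity toward whichever action currently pays better."""
    def __init__(self, p_coop: float, learns: bool):
        self.p_coop = p_coop
        self.learns = learns

    def act(self) -> bool:
        return random.random() < self.p_coop

    def update(self, coop_fraction: float) -> None:
        if not self.learns:
            return
        better_to_cooperate = payoff(True, coop_fraction) > payoff(False, coop_fraction)
        delta = 0.05 if better_to_cooperate else -0.05
        self.p_coop = min(1.0, max(0.0, self.p_coop + delta))

def simulate() -> None:
    random.seed(0)
    # Heterogeneous population: diligent, opportunistic, and adaptive workers.
    workers = ([Worker(0.9, learns=False) for _ in range(40)] +
               [Worker(0.2, learns=False) for _ in range(30)] +
               [Worker(0.5, learns=True) for _ in range(30)])
    for r in range(ROUNDS):
        actions = [w.act() for w in workers]
        coop_fraction = sum(actions) / len(workers)
        for w in workers:
            w.update(coop_fraction)
        if r % 10 == 0:
            print(f"round {r:2d}: fraction of careful workers = {coop_fraction:.2f}")

if __name__ == "__main__":
    simulate()
```

With these assumed constants, careless work pays better only when more than 75% of the crowd is careful, so the adaptive workers in this toy population drift toward careful work; the article's actual analysis studies such dynamics together with knowledge updates and incentives for good workers.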

      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 7, Issue 4
      Special Issue on Crowd in Intelligent Systems, Research Note/Short Paper and Regular Papers
      July 2016, 498 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/2906145
      Editor: Yu Zheng
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 31 March 2016
      Accepted: 01 December 2015
      Revised: 01 December 2015
      Received: 01 February 2015
      Published in TIST Volume 7, Issue 4

      Author Tags

      1. Game theory
      2. crowdsourcing
      3. relevance assessment

      Qualifiers

      • Research-article
      • Research
      • Refereed
