DOI: 10.1145/2441776.2441847

Enhancing reliability using peer consistency evaluation in human computation

Published: 23 February 2013

ABSTRACT

Peer consistency evaluation is often used in games with a purpose (GWAP) to evaluate workers against the outputs of other workers, without relying on gold standard answers. Despite its popularity, the reliability of peer consistency evaluation has never been systematically tested to establish whether it can serve as a general evaluation method in human computation systems. We present experimental results showing that human computation systems using peer consistency evaluation can produce outcomes even better than those of systems that evaluate workers against gold standard answers. We also show that, even without any evaluation, simply telling workers that their answers will be used as future evaluation standards can significantly enhance their performance. These results have important implications for methods that improve the reliability of human computation systems.
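
To make the contrast between the two evaluation schemes concrete, the sketch below illustrates them on toy labelling data. It is a minimal illustration of the general idea rather than the authors' implementation, and all names in it (evaluate_gold, evaluate_peer_consistency, the toy worker and label data) are hypothetical. Gold-standard evaluation scores each worker against known correct labels; peer consistency evaluation scores each worker by agreement with a randomly chosen peer who labelled the same item, so no gold labels are required.

# Minimal sketch (not the authors' implementation) of the two evaluation
# schemes contrasted in the abstract: gold-standard evaluation and
# peer consistency evaluation. All names are hypothetical.
import random
from collections import defaultdict

def evaluate_gold(answers, gold):
    """Fraction of a worker's answers that match the gold-standard labels."""
    scores = {}
    for worker, labelled in answers.items():
        correct = sum(1 for item, ans in labelled.items() if ans == gold[item])
        scores[worker] = correct / len(labelled)
    return scores

def evaluate_peer_consistency(answers, rng=random):
    """Fraction of a worker's answers that agree with a randomly drawn peer
    who labelled the same item; no gold labels are needed."""
    # Index answers by item so peers for the same item are easy to find.
    by_item = defaultdict(dict)
    for worker, labelled in answers.items():
        for item, ans in labelled.items():
            by_item[item][worker] = ans

    scores = {}
    for worker, labelled in answers.items():
        agreements, compared = 0, 0
        for item, ans in labelled.items():
            peers = [w for w in by_item[item] if w != worker]
            if not peers:
                continue  # no peer labelled this item, so skip it
            peer = rng.choice(peers)
            compared += 1
            agreements += (ans == by_item[item][peer])
        scores[worker] = agreements / compared if compared else 0.0
    return scores

if __name__ == "__main__":
    # Toy data: three workers label the same four images as "cat" or "dog".
    answers = {
        "w1": {"img1": "cat", "img2": "dog", "img3": "cat", "img4": "dog"},
        "w2": {"img1": "cat", "img2": "dog", "img3": "dog", "img4": "dog"},
        "w3": {"img1": "dog", "img2": "dog", "img3": "cat", "img4": "cat"},
    }
    gold = {"img1": "cat", "img2": "dog", "img3": "cat", "img4": "dog"}
    print("gold-standard scores:", evaluate_gold(answers, gold))
    print("peer-consistency scores:", evaluate_peer_consistency(answers))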


Published in

      CSCW '13: Proceedings of the 2013 conference on Computer supported cooperative work
      February 2013
      1594 pages
      ISBN:9781450313315
      DOI:10.1145/2441776

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 February 2013


      Qualifiers

      • research-article

      Acceptance Rates

Overall Acceptance Rate: 2,235 of 8,521 submissions, 26%

