
Sloppiness mitigation in crowdsourcing: detecting and correcting bias for crowd scoring tasks

  • Regular Paper
  • Published in: International Journal of Data Science and Analytics

Abstract

Due to differences in expertise, personal preferences, or fatigue from long working sessions, crowd workers often produce unreliable data, and a central challenge is to recover true information from such noisy data. Sloppiness, the phenomenon in which observed labels fluctuate around the true labels, is a type of error that has rarely been discussed in research. Moreover, most existing approaches derive truths only for binary labeling tasks. In this paper, we deal with sloppiness in crowd scoring tasks in order to obtain high-quality estimated labels. A crowd scoring task involves multiple ordinal labels rather than just two, and sloppy workers can make the resulting scores unreliable. We show that sloppy workers with biases, who consistently give higher (or lower) answers than the true labels, can be effectively exploited to improve the quality of the estimated labels. To make use of the labels from crowd workers with biased sloppy behavior, we propose an iterative two-step model to infer the true labels. The first step identifies the biased workers and corrects their biases. The second step uses an optimization-based truth discovery framework to derive true labels from the high-quality observed labels together with the corrected labels from the first step. We also present a hierarchical categorization of different types of crowd workers. Experiments on synthetic data as well as real-world datasets demonstrate the effectiveness of the proposed framework: compared with baseline models such as majority voting and an expectation-maximization-based aggregation algorithm, improvements in accuracy of up to 16% are obtained.
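The iterative two-step loop outlined in the abstract can be made concrete with a small numerical sketch. The Python fragment below is purely illustrative and is not the paper's exact formulation: it assumes a worker-by-item score matrix, a simple mean-offset test for detecting biased workers, and a CRH-style log-ratio weighting for the truth-discovery step; the function name, the bias threshold, and the weighting formula are assumptions made for this sketch.

```python
import numpy as np


def estimate_true_scores(scores, n_iters=20, bias_threshold=0.5):
    """Illustrative sketch of an iterative bias-correction / truth-discovery loop.

    scores: (n_workers, n_items) float array of ordinal scores, with np.nan
    where a worker did not score an item. Assumes every item receives at
    least one score. Names and defaults here are hypothetical.
    """
    n_workers, n_items = scores.shape
    observed = ~np.isnan(scores)
    truths = np.nanmean(scores, axis=0)        # start from the per-item average
    corrected = scores.copy()

    for _ in range(n_iters):
        # Step 1: detect biased workers and remove their constant offset.
        # A worker whose signed deviation from the current truth estimates
        # is consistently large is treated as biased, and the offset is subtracted.
        for w in range(n_workers):
            mask = observed[w]
            bias = np.mean(scores[w, mask] - truths[mask])
            shift = bias if abs(bias) > bias_threshold else 0.0
            corrected[w, mask] = scores[w, mask] - shift

        # Step 2: optimization-based truth discovery. Workers whose corrected
        # scores sit closer to the current truths receive larger weights
        # (a CRH-style log-ratio weighting; the paper's objective may differ).
        errors = np.nansum((corrected - truths) ** 2, axis=1) + 1e-12
        weights = -np.log(errors / errors.sum())

        # Re-estimate each item's truth as the weighted mean of corrected scores.
        filled = np.where(observed, corrected, 0.0)
        truths = (weights[:, None] * filled).sum(axis=0) / \
                 (weights[:, None] * observed).sum(axis=0)

    return truths, weights
```

Under these assumptions, a call such as estimate_true_scores(score_matrix) would return the refined per-item score estimates together with the inferred worker weights, with biased-but-consistent workers contributing useful signal after their offsets are removed.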



Author information

Corresponding author: Lingyu Lyu.

About this article

Cite this article

Lyu, L., Kantardzic, M. & Sethi, T.S. Sloppiness mitigation in crowdsourcing: detecting and correcting bias for crowd scoring tasks. Int J Data Sci Anal 7, 179–199 (2019). https://doi.org/10.1007/s41060-018-0139-5
