
CrowdED and CREX: Towards Easy Crowdsourcing Quality Control Evaluation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11695)

Abstract

Crowdsourcing is a time- and cost-efficient web-based technique for labeling large datasets such as those used in Machine Learning. Controlling the output quality in crowdsourcing is an active research domain that has yielded a fair number of methods and approaches. However, due to the quantitative and qualitative limitations of existing evaluation datasets, comparing and evaluating these methods has been very limited. In this paper, we present CrowdED (Crowdsourcing Evaluation Dataset), a rich dataset for evaluating a wide range of quality control methods, alongside CREX (CReate Enrich eXtend), a framework that facilitates the creation of such datasets and guarantees their future-proofing and reusability through customizable extension and enrichment.
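To make the evaluation setting concrete, the following is a minimal, purely illustrative sketch of how a baseline quality control method (simple majority voting) could be scored against gold answers on a small set of crowdsourced judgments. The data layout and field names (task_id, worker_id, answer) are assumptions made for this sketch and are not taken from the actual CrowdED schema.

    # Illustrative sketch only: scoring a majority-vote baseline against gold labels.
    # The field names below (task_id, worker_id, answer) are assumptions for this
    # example and do not reflect the actual CrowdED data schema.
    from collections import Counter, defaultdict

    # Toy answer matrix: each row is one worker's answer to one task.
    judgments = [
        {"task_id": "t1", "worker_id": "w1", "answer": "cat"},
        {"task_id": "t1", "worker_id": "w2", "answer": "cat"},
        {"task_id": "t1", "worker_id": "w3", "answer": "dog"},
        {"task_id": "t2", "worker_id": "w1", "answer": "dog"},
        {"task_id": "t2", "worker_id": "w3", "answer": "dog"},
    ]
    gold = {"t1": "cat", "t2": "dog"}  # ground-truth labels used only for evaluation

    def majority_vote(rows):
        """Aggregate one label per task by taking the most frequent worker answer."""
        answers_by_task = defaultdict(list)
        for row in rows:
            answers_by_task[row["task_id"]].append(row["answer"])
        return {task: Counter(answers).most_common(1)[0][0]
                for task, answers in answers_by_task.items()}

    def accuracy(predicted, truth):
        """Fraction of tasks whose aggregated label matches the gold label."""
        hits = sum(1 for task, label in predicted.items() if truth.get(task) == label)
        return hits / len(truth)

    print(accuracy(majority_vote(judgments), gold))  # 1.0 on this toy data

More sophisticated quality control methods (e.g., worker-expertise weighting or EM-based answer aggregation) would be evaluated in the same way, by replacing the majority_vote baseline with the method under test while keeping the gold answers fixed.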


Notes

  1. E.g., demographics and self-evaluation profiles.

  2. https://www.figure-eight.com/data-for-everyone/.

  3. https://www.figure-eight.com. Formerly named CrowdFlower.

  4. Yet, it is not the only one, since any other task corpus can be used.

  5. FE levels range from 1 to 3, where level 3 represents the most experienced and reliable workers and level 1 represents all qualified workers.

  6. A demo of CREX’s user interface and a real-world use scenario can be found at https://project-crowd.eu/.

  7. E.g., requester-accessible back-end services or an API to dynamically modify tasks and assignments.


Author information


Correspondence to Tarek Awwad.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Awwad, T., Bennani, N., Rehn-Sonigo, V., Brunie, L., Kosch, H. (2019). CrowdED and CREX: Towards Easy Crowdsourcing Quality Control Evaluation. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds) Advances in Databases and Information Systems. ADBIS 2019. Lecture Notes in Computer Science, vol 11695. Springer, Cham. https://doi.org/10.1007/978-3-030-28730-6_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-28730-6_18


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28729-0

  • Online ISBN: 978-3-030-28730-6

  • eBook Packages: Computer Science, Computer Science (R0)
