Classification-oriented Dawid Skene model for transferring intelligence from crowds to machines

  • Research Article
Frontiers of Computer Science

Abstract

When crowdsourcing is used to assist the classification of a set of items, the main objective is to classify the items by aggregating the worker-provided labels. A secondary objective is to assess the workers’ skill levels in the process. A classical model that achieves both objectives is the Dawid-Skene model. In this paper, we consider a third objective in this context, namely, to learn a classifier that is capable of labelling future items without further assistance from crowd workers. By extending the Dawid-Skene model to take item features into consideration, we develop a Classification-Oriented Dawid Skene (CODS) model, which achieves the three objectives simultaneously. The effectiveness of CODS on these three dimensions of the problem space is demonstrated experimentally.
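
For readers unfamiliar with the classical Dawid-Skene model that the abstract builds on, the following is a minimal sketch of its EM procedure, which covers the first two objectives (aggregating labels and estimating worker skill). It is not the CODS model, whose feature-based extension is not described in this preview. The function name `dawid_skene`, the `(item, worker, label)` input format, the smoothing constant, and the fixed iteration count are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def dawid_skene(votes, n_classes, n_iter=50, smooth=1e-6):
    """Classical Dawid-Skene EM (illustrative sketch, not CODS).

    votes: list of (item, worker, label) triples; labels are ints in
    range(n_classes). Returns (post, conf, prior) where
      post[item][k]       = estimated P(true class of item is k)
      conf[worker][k][l]  = estimated P(worker reports l | true class k)
      prior[k]            = estimated class prior
    """
    by_item = defaultdict(list)
    workers = set()
    for item, worker, label in votes:
        by_item[item].append((worker, label))
        workers.add(worker)
    items = sorted(by_item)

    # Initialise posteriors with per-item vote fractions (soft majority vote).
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, l in by_item[i]:
            counts[l] += 1.0
        total = sum(counts)
        post[i] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step: class priors from the current posteriors.
        prior = [sum(post[i][k] for i in items) / len(items)
                 for k in range(n_classes)]
        # M-step: per-worker confusion matrices, lightly smoothed
        # (the smoothing constant is an assumption to avoid log(0)).
        conf = {w: [[smooth] * n_classes for _ in range(n_classes)]
                for w in workers}
        for i in items:
            for w, l in by_item[i]:
                for k in range(n_classes):
                    conf[w][k][l] += post[i][k]
        for w in workers:
            for k in range(n_classes):
                row_sum = sum(conf[w][k])
                conf[w][k] = [v / row_sum for v in conf[w][k]]

        # E-step: posterior over each item's true class, in log space.
        for i in items:
            logp = [math.log(prior[k] + 1e-12) for k in range(n_classes)]
            for w, l in by_item[i]:
                for k in range(n_classes):
                    logp[k] += math.log(conf[w][k][l])
            m = max(logp)
            unnorm = [math.exp(v - m) for v in logp]
            z = sum(unnorm)
            post[i] = [v / z for v in unnorm]

    return post, conf, prior
```

The aggregated label of an item is the class with the largest posterior, and the diagonal of a worker's confusion matrix gives a rough measure of that worker's skill. A model in the spirit of CODS would additionally tie the item posteriors to a classifier over item features; how the paper does this is not specified here.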



Acknowledgements

This work was supported in part by the National Key R&D Program of China (2021ZD0110700), in part by the Fundamental Research Funds for the Central Universities, in part by the State Key Laboratory of Software Development Environment, and in part by a Leverhulme Trust Research Project Grant.

Author information

Corresponding author

Correspondence to Richong Zhang.

Additional information

Jiaran Li received his bachelor's degree in Mathematics from Beihang University, China in 2015. He is now a PhD student at the School of Computer Science and Engineering, Beihang University, China. His main research areas are natural language processing and crowdsourcing.

Richong Zhang received his BSc degree and MASc degree from Jilin University, China in 2001 and 2004, respectively. In 2006, he received his MSc degree from Dalhousie University, Canada. In 2011, he received his PhD degree from the School of Information Technology and Engineering, University of Ottawa, Canada. He is currently a professor in the School of Computer Science and Engineering, Beihang University, China. His research interests include natural language processing and knowledge engineering.

Samuel Mensah is currently a Research Associate in Natural Language Processing at the Department of Computer Science, University of Sheffield, UK. He received the BS degree in mathematics and computer science from the University of Ghana, Ghana in 2012, the MS degree in mathematical science from the Kwame Nkrumah University of Science and Technology, Ghana in 2015, and the PhD degree in computer science from Beihang University, China in 2020. His main research interests include natural language processing, recommender systems, and knowledge graphs.

Wenyi Qin received her BSc degree from Beijing Institute of Technology, China in 2014. In 2018, she received her PhD degree from the School of Mathematics, University of Edinburgh, UK. She is currently a postdoctoral researcher in the School of Computer Science and Engineering, Beihang University, China. Her research interests include applied probability and natural language processing.

Chunming Hu received the PhD degree from Beihang University, China in 2006. He is an associate professor with the School of Computer Science and Engineering, Beihang University, China. His current research interests include distributed systems, system virtualization, and large-scale data management and processing systems.

About this article

Cite this article

Li, J., Zhang, R., Mensah, S. et al. Classification-oriented dawid skene model for transferring intelligence from crowds to machines. Front. Comput. Sci. 17, 175332 (2023). https://doi.org/10.1007/s11704-022-2245-8

  • DOI: https://doi.org/10.1007/s11704-022-2245-8
