Abstract
Crowdsourcing harnesses human intelligence to solve problems that remain difficult for machines, such as entity resolution, sentiment analysis, and image recognition. In a crowdsourcing system, requesters publish tasks that are answered by workers. However, the responses collected from the crowd are noisy, because workers on the Internet have unknown and highly diverse abilities, skills, interests, and knowledge backgrounds. To ensure the quality of crowdsourcing results, it is therefore important to characterize worker quality accurately. Many previous works model worker quality as a fixed value (such as a single probability or a confusion matrix). However, even when workers complete tasks of the same type, their quality is affected to varying degrees by factors such as task difficulty. We propose a dynamic difficulty-sensitive worker quality distribution model, in which a worker's ability varies with task difficulty and follows a functional distribution; the model thus captures the relationship between worker reliability and task difficulty. In addition, we use the Expectation-Maximization (EM) algorithm to obtain maximum likelihood estimates of both the parameters of the worker quality distribution model and the true answers to the tasks. Extensive experiments on synthetic and real-world data show that our method significantly outperforms state-of-the-art approaches.
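To make the EM idea concrete, the sketch below implements a minimal Dawid-Skene-style EM for binary tasks, where each worker is modeled by a single accuracy value. This is an illustration of the general estimation scheme only, not the paper's difficulty-sensitive distribution model; all names, sizes, and the synthetic data generator are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup (illustrative only): n_tasks binary tasks,
# n_workers workers with hidden per-worker accuracies.
n_tasks, n_workers = 200, 10
truth = rng.integers(0, 2, size=n_tasks)
acc_true = rng.uniform(0.55, 0.95, size=n_workers)
# labels[i, j] = worker j's answer to task i: correct with prob acc_true[j]
labels = np.where(rng.random((n_tasks, n_workers)) < acc_true,
                  truth[:, None], 1 - truth[:, None])

# EM: alternate between posteriors over true answers (E-step)
# and maximum-likelihood worker-accuracy estimates (M-step).
acc = np.full(n_workers, 0.7)   # initial guess for worker accuracies
for _ in range(50):
    # E-step: log-odds that each task's true answer is 1,
    # assuming a uniform prior over the two answers.
    ll1 = (labels * np.log(acc) + (1 - labels) * np.log(1 - acc)).sum(axis=1)
    ll0 = ((1 - labels) * np.log(acc) + labels * np.log(1 - acc)).sum(axis=1)
    post = 1.0 / (1.0 + np.exp(ll0 - ll1))   # P(truth = 1 | labels)
    # M-step: expected fraction of tasks each worker answered correctly
    correct = post[:, None] * labels + (1 - post[:, None]) * (1 - labels)
    acc = np.clip(correct.mean(axis=0), 1e-3, 1 - 1e-3)

estimate = (post > 0.5).astype(int)
print("accuracy of inferred answers:", (estimate == truth).mean())
```

The paper's model would replace the single scalar `acc` with a function of task difficulty, but the E-step/M-step alternation over latent true answers and worker parameters follows the same pattern.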
Acknowledgements
The research work was supported by the National Key R&D Program (No. 2017YFB1400100), the Innovation Method Fund of China (No. 2018IM020200), the SDNFSC (No. ZR2017ZB0420, No. ZR2018MF014), and the Science and Technology Development Plan Project of Shandong Province (No. 2018YFJH0506).
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Cite this paper
Zheng, M., Cui, L., He, W., Guo, W., Lu, X. (2019). A Dynamic Difficulty-Sensitive Worker Distribution Model for Crowdsourcing Quality Management. In: Wang, X., Gao, H., Iqbal, M., Min, G. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 292. Springer, Cham. https://doi.org/10.1007/978-3-030-30146-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30145-3
Online ISBN: 978-3-030-30146-0
eBook Packages: Computer Science (R0)