Defeating Tyranny of the Masses in Crowdsourcing: Accounting for Low-Skilled and Adversarial Workers

Kurve, Aditya; Miller, David J.; Kesidis, George

doi:10.1007/978-3-319-02786-9_9

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 8252)

Included in the following conference series:

International Conference on Decision and Game Theory for Security

Abstract

Crowdsourcing has emerged as a useful learning paradigm that allows us to instantly recruit workers on the web to solve large-scale problems, such as quick annotation of image, web page, or document databases. Automated inference engines that fuse the answers or opinions from the crowd to make critical decisions are susceptible to unreliable, low-skilled, and malicious workers, who tend to mislead the system towards inaccurate inferences. We present a probabilistic generative framework to model worker responses for multicategory crowdsourcing tasks based on two novel paradigms. First, we decompose worker reliability into skill level and intention. Second, we introduce a stochastic model for answer generation that plausibly captures the interplay between worker skills, intentions, and task difficulties. This framework allows us to model and estimate a broad range of worker “types”. A generalized Expectation Maximization algorithm is presented to jointly estimate the unknown ground truth answers along with worker and task parameters. As supported experimentally, the proposed scheme de-emphasizes answers from low-skilled workers and leverages malicious workers to, in fact, improve crowd aggregation. Moreover, our approach is especially advantageous when there is an (a priori unknown) majority of low-skilled and/or malicious workers in the crowd.
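
The abstract does not spell out the generative model, so the following is only a minimal sketch of the general idea, not the authors' framework. It assumes a much simpler setting: binary tasks, a single accuracy parameter per worker, no task difficulties, and a uniform prior over labels. The names `majority_vote`, `em_aggregate`, and the toy `answers` matrix are hypothetical. It illustrates how an EM-style aggregator can turn consistently wrong (adversarial) workers into useful evidence, because an estimated accuracy below 0.5 gives their votes a negative weight.

```python
import math

def majority_vote(answers):
    """answers[j][i] is worker j's 0/1 answer on task i (hypothetical layout)."""
    n_workers, n_tasks = len(answers), len(answers[0])
    return [int(2 * sum(row[i] for row in answers) > n_workers)
            for i in range(n_tasks)]

def em_aggregate(answers, n_iter=50, eps=1e-3):
    """EM for a simplified one-accuracy-per-worker model (an assumption,
    not the paper's model). Returns (estimated labels, worker accuracies)."""
    n_workers, n_tasks = len(answers), len(answers[0])
    # Initialise q[i] = P(truth_i = 1) with the fraction of workers voting 1.
    q = [sum(row[i] for row in answers) / n_workers for i in range(n_tasks)]
    acc = [0.5] * n_workers
    for _ in range(n_iter):
        # M-step: a worker's accuracy is the expected fraction of tasks
        # on which the worker agrees with the current soft labels.
        for j, row in enumerate(answers):
            agree = sum(q[i] if a == 1 else 1.0 - q[i]
                        for i, a in enumerate(row))
            # Clip away from 0 and 1 so the log-odds below stay finite.
            acc[j] = min(max(agree / n_tasks, eps), 1.0 - eps)
        # E-step: each vote shifts the log-odds by the worker's weight.
        # Workers with accuracy below 0.5 get a negative weight, so their
        # votes count as evidence for the opposite label.
        weights = [math.log(p / (1.0 - p)) for p in acc]
        for i in range(n_tasks):
            log_odds = sum(w if row[i] == 1 else -w
                           for w, row in zip(weights, answers))
            q[i] = 1.0 / (1.0 + math.exp(-log_odds))
    return [int(p > 0.5) for p in q], acc

# Toy data (made up for illustration): three honest workers, two adversaries.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
answers = [
    [1, 0, 1, 1, 0, 0, 1, 0],  # honest, no errors
    [0, 0, 1, 1, 0, 0, 1, 0],  # honest, wrong on task 0
    [1, 0, 1, 1, 0, 1, 1, 0],  # honest, wrong on task 5
    [0, 1, 0, 0, 1, 1, 0, 1],  # adversarial: flips every answer
    [0, 1, 0, 0, 1, 1, 0, 1],  # adversarial: flips every answer
]

labels, acc = em_aggregate(answers)
print("majority vote:", majority_vote(answers))
print("EM labels:    ", labels)
print("accuracies:   ", [round(p, 3) for p in acc])
```

On this toy data, majority vote is wrong on the two tasks where an honest worker slips, while the EM sketch recovers all eight labels by learning that the last two workers are reliably wrong. Note that this simplified model starts from the vote fractions and therefore still depends on the honest workers outnumbering the adversaries; handling an adversarial majority, as the abstract claims, relies on modeling details that are not reproduced here.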

Author information

Authors and Affiliations

Department of EE, The Pennsylvania State University, PA, USA
Aditya Kurve & David J. Miller
Department of EE and CSE, The Pennsylvania State University, PA, USA
George Kesidis

Editor information

Editors and Affiliations

Department of Computer Science, Missouri University of Science and Technology, 500 West 15th Street, 325B Computer Science Building, 65409, Rolla, MO, USA
Sajal K. Das
Department of Computer Science, Purdue University, LWSN 2142J, 305 N. University Street, 47907, West Lafayette, IN, USA
Cristina Nita-Rotaru
Data Security and Privacy Lab, University of Texas at Dallas, 800 W. Campbell Road, MS EC31, 75080, Richardson, TX, USA
Murat Kantarcioglu

Cite this paper

Kurve, A., Miller, D.J., Kesidis, G. (2013). Defeating Tyranny of the Masses in Crowdsourcing: Accounting for Low-Skilled and Adversarial Workers. In: Das, S.K., Nita-Rotaru, C., Kantarcioglu, M. (eds) Decision and Game Theory for Security. GameSec 2013. Lecture Notes in Computer Science, vol 8252. Springer, Cham. https://doi.org/10.1007/978-3-319-02786-9_9

DOI: https://doi.org/10.1007/978-3-319-02786-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02785-2
Online ISBN: 978-3-319-02786-9
eBook Packages: Computer Science, Computer Science (R0)
