Improving Label Accuracy by Filtering Low-Quality Workers in Crowdsourcing

  • Conference paper
  • In: Advances in Artificial Intelligence and Soft Computing (MICAI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9413)

Abstract

Filtering out low-quality workers is often necessary when working with data sets labeled via crowdsourcing: such workers either lack knowledge of the subject matter and therefore contribute many incorrect labels, or deliberately label quickly and imprecisely so as to produce more labels in a short period of time. We present two new filtering algorithms for removing low-quality workers: Cluster Filtering (CF) and Dynamic Classification Filtering (DCF). Both methods can use any number of worker characteristics as attributes for learning. CF partitions workers with k-means clustering using two centroids, separating them into a high-quality cluster and a low-quality cluster. DCF uses a classifier of any kind: it builds a model from workers drawn from other crowdsourced data sets and then classifies the workers in the data set to be filtered. In theory, DCF can be trained to remove any proportion of the lowest-quality workers. We compare CF and DCF with two existing filtering algorithms, one by Raykar and Yu (RY) and one by Ipeirotis et al. (IPW). Our results show that CF, the second-best filter, performs modestly but effectively, and that DCF, the best filter, outperforms RY and IPW both on average and on the majority of the crowdsourced data sets.
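To make the two filters concrete, here is a minimal sketch in Python with scikit-learn. The three worker attributes, the rule for deciding which CF cluster is high quality, and the RandomForestClassifier used inside DCF are all illustrative assumptions; the abstract only specifies that CF runs k-means with two centroids over worker characteristics and that DCF trains a classifier of any kind on workers from other crowdsourced data sets.

```python
# Minimal sketch of Cluster Filtering (CF) and Dynamic Classification
# Filtering (DCF) as described in the abstract. The features and the
# choice of classifier are assumptions, not the paper's configuration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def cluster_filter(worker_features):
    """CF: k-means with two centroids over worker attributes.

    worker_features: array of shape (n_workers, n_attributes). We assume
    larger attribute values indicate higher quality, so the cluster whose
    centroid has the larger mean is kept (a heuristic, not from the paper).
    Returns a boolean mask of workers to keep."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(worker_features)
    high_quality = int(np.argmax(km.cluster_centers_.mean(axis=1)))
    return km.labels_ == high_quality

def dcf_filter(train_features, train_low_quality, worker_features, clf=None):
    """DCF: train a classifier of any kind on workers from *other*
    crowdsourced data sets (labeled 1 = low quality, 0 = high quality),
    then classify the workers of the target data set.
    Returns a boolean mask of workers to keep."""
    clf = clf if clf is not None else RandomForestClassifier(random_state=0)
    clf.fit(train_features, train_low_quality)
    return clf.predict(worker_features) == 0

# Hypothetical usage with three made-up attributes per worker.
rng = np.random.default_rng(0)
target = rng.random((50, 3))   # workers in the data set to filter
other = rng.random((200, 3))   # workers from other crowdsourced data sets
other_bad = (other.mean(axis=1) < 0.4).astype(int)  # stand-in quality labels
keep_cf = cluster_filter(target)
keep_dcf = dcf_filter(other, other_bad, target)
```

Note that deciding which of the two CF clusters is the low-quality one is itself a design choice; the centroid-mean heuristic above is one simple option when the attributes are oriented so that higher values mean better work.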


References

  1. Alonso, O., Mizzaro, S.: Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment. In: Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation, vol. 15, p. 16 (2009)

  2. Brabham, D.C.: Crowdsourcing as a model for problem solving: an introduction and cases. Convergence: Int. J. Res. New Media Technol. 14(1), 75–90 (2008)

  3. Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28, 20–28 (1979)

  4. Difallah, D.E., Demartini, G., Cudré-Mauroux, P.: Mechanical cheat: spamming schemes and adversarial techniques on crowdsourcing platforms. In: CrowdSearch, pp. 26–30 (2012)

  5. Downs, J.S., Holbrook, M.B., Sheng, S., Cranor, L.F.: Are your participants gaming the system? Screening Mechanical Turk workers. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2399–2402. ACM (2010)

  6. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)

  7. Howe, J.: The rise of crowdsourcing. Wired Mag. 14(6), 1–4 (2006)

  8. Ipeirotis, P.G., Provost, F., Wang, J.: Quality management on Amazon Mechanical Turk. In: Proceedings of the ACM SIGKDD Workshop on Human Computation, pp. 64–67. ACM (2010)

  9. Raykar, V.C., Yu, S.: Ranking annotators for crowdsourced labeling tasks. In: Advances in Neural Information Processing Systems, pp. 1809–1817 (2011)

  10. Ribeiro, F., Florencio, D., Nascimento, V.: Crowdsourcing subjective image quality evaluation. In: 2011 18th IEEE International Conference on Image Processing (ICIP), pp. 3097–3100. IEEE (2011)

  11. Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622. ACM (2008)

  12. Venetis, P., Garcia-Molina, H.: Quality control for comparison microtasks. In: Proceedings of the First International Workshop on Crowdsourcing and Data Mining, pp. 15–21. ACM (2012)

  13. Zaidan, O.F., Callison-Burch, C.: Crowdsourcing translation: professional quality from non-professionals. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 1220–1229. Association for Computational Linguistics (2011)

  14. Zook, M., Graham, M., Shelton, T., Gorman, S.: Volunteered geographic information and crowdsourcing disaster relief: a case study of the Haitian earthquake. World Med. Health Policy 2(2), 7–33 (2010)

Author information

Corresponding author: Bryce Nicholson.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nicholson, B., Sheng, V.S., Zhang, J., Wang, Z., Xian, X. (2015). Improving Label Accuracy by Filtering Low-Quality Workers in Crowdsourcing. In: Sidorov, G., Galicia-Haro, S. (eds) Advances in Artificial Intelligence and Soft Computing. MICAI 2015. Lecture Notes in Computer Science (LNAI), vol. 9413. Springer, Cham. https://doi.org/10.1007/978-3-319-27060-9_45

  • DOI: https://doi.org/10.1007/978-3-319-27060-9_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27059-3

  • Online ISBN: 978-3-319-27060-9

  • eBook Packages: Computer Science; Computer Science (R0)
