A Lightweight Combinatorial Approach for Inferring the Ground Truth from Multiple Annotators

Liu, Xiang; Li, Liyun; Memon, Nasir

doi:10.1007/978-3-642-39712-7_47

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7988)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

With the growing need for large-scale labeled datasets for training, testing, and validation, crowdsourcing services such as Amazon Mechanical Turk (MTurk) are increasingly used in place of tedious manual labeling. However, annotators on these services are known to differ in skill and consistency, and may even be biased, which makes it difficult to estimate the ground-truth class labels from the imperfect labels they provide. To address this problem, we present a discriminative approach that infers the ground-truth class labels by mapping both the annotators and the tasks into a low-dimensional space. Our model is inherently combinatorial and therefore requires no prior knowledge about the annotators or the examples, which makes it simpler and more computationally efficient than state-of-the-art Bayesian methods. We also show experimentally, on real datasets, that our lightweight approach is more accurate than either majority voting or weighted majority voting.
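The abstract compares against two baselines, majority voting and weighted majority voting. The sketch below illustrates those baselines only; it is not the authors' combinatorial method, whose details are not given here. The input format (a list of `(annotator, item, label)` triples) and the weighting rule (each annotator weighted by how often they agree with the plain majority vote) are assumptions chosen for illustration; the weighted baseline used in the paper may weight annotators differently.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

# Assumed input format: one (annotator, item, label) triple per label given.
Observation = Tuple[Hashable, Hashable, Hashable]


def majority_vote(obs: List[Observation]) -> Dict[Hashable, Hashable]:
    """Most frequent label per item; ties go to the label seen first."""
    votes: Dict[Hashable, Counter] = defaultdict(Counter)
    for _, item, label in obs:
        votes[item][label] += 1
    # Counter.most_common orders equal counts by first insertion.
    return {item: c.most_common(1)[0][0] for item, c in votes.items()}


def weighted_majority_vote(obs: List[Observation]) -> Dict[Hashable, Hashable]:
    """Weight each annotator by their agreement rate with the plain majority.

    Hypothetical weighting rule for illustration, not taken from the paper.
    """
    mv = majority_vote(obs)
    agree: Counter = Counter()
    total: Counter = Counter()
    for annotator, item, label in obs:
        total[annotator] += 1
        agree[annotator] += int(label == mv[item])
    weight = {a: agree[a] / total[a] for a in total}

    # Sum annotator weights per (item, label) and pick the heaviest label.
    scores: Dict[Hashable, Dict[Hashable, float]] = defaultdict(
        lambda: defaultdict(float)
    )
    for annotator, item, label in obs:
        scores[item][label] += weight[annotator]
    return {item: max(s, key=s.get) for item, s in scores.items()}


obs = [
    ("a1", "x", 1), ("a2", "x", 1), ("a3", "x", 0),
    ("a1", "y", 0), ("a2", "y", 0), ("a3", "y", 1),
    ("a3", "z", 0), ("a1", "z", 1),
]
print(majority_vote(obs))           # {'x': 1, 'y': 0, 'z': 0}
print(weighted_majority_vote(obs))  # {'x': 1, 'y': 0, 'z': 1}
```

On item `z` the plain vote is a 1-to-1 tie, broken arbitrarily in favor of `a3`'s label. The weighted vote instead favors `a1`, who agreed with the majority on the other items, showing how per-annotator reliability can change the inferred label.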

Author information

Authors and Affiliations

Metrotech Center, Polytechnic Institute of New York University, Brooklyn, NY 11201, USA
Xiang Liu & Nasir Memon
LinkedIn Corporation, 2029 Stierlin Ct, Mountain View, CA 94043, USA
Liyun Li

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Liu, X., Li, L., Memon, N. (2013). A Lightweight Combinatorial Approach for Inferring the Ground Truth from Multiple Annotators. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2013. Lecture Notes in Computer Science, vol 7988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39712-7_47

DOI: https://doi.org/10.1007/978-3-642-39712-7_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39711-0
Online ISBN: 978-3-642-39712-7
eBook Packages: Computer Science; Computer Science (R0)
