A Lightweight Combinatorial Approach for Inferring the Ground Truth from Multiple Annotators

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7988)

Abstract

With the increasing importance of producing large-scale labeled datasets for training, testing, and validation, services such as Amazon Mechanical Turk (MTurk) are becoming increasingly popular as a replacement for the tedious task of labeling by hand. However, annotators on these crowdsourcing services are known to exhibit different levels of skill and consistency, and even biases, making it difficult to estimate the ground truth class label from the imperfect labels they provide. To solve this problem, we present a discriminative approach that infers the ground truth class labels by mapping both the annotators and the tasks into a low-dimensional space. Our proposed model is inherently combinatorial and therefore does not require any prior knowledge about the annotators or the examples, making it simpler and more computationally efficient than the state-of-the-art Bayesian methods. We also show experimentally, on real datasets, that our lightweight approach is more accurate than either majority voting or weighted majority voting.
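For readers unfamiliar with the two baselines named in the abstract, the following minimal Python sketch illustrates majority voting and weighted majority voting for a single task. It is not the paper's combinatorial model. The function names, the annotator identifiers, and the weight values are hypothetical, and the abstract does not say how weights are obtained in the weighted baseline.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by the most annotators.

    labels: dict mapping annotator id -> label for one task.
    Ties go to the label that was encountered first.
    """
    counts = Counter(labels.values())
    return counts.most_common(1)[0][0]


def weighted_majority_vote(labels, weights):
    """Return the label with the largest total annotator weight.

    weights: dict mapping annotator id -> non-negative weight
    (hypothetical values here; annotators without a weight count as 1.0).
    """
    scores = {}
    for annotator, label in labels.items():
        scores[label] = scores.get(label, 0.0) + weights.get(annotator, 1.0)
    return max(scores, key=scores.get)


# One task labeled by three hypothetical annotators.
task_labels = {"a1": 1, "a2": 0, "a3": 0}
# Hypothetical reliabilities: a1 is trusted far more than a2 and a3.
annotator_weights = {"a1": 0.9, "a2": 0.3, "a3": 0.4}

plain = majority_vote(task_labels)
weighted = weighted_majority_vote(task_labels, annotator_weights)
print(plain, weighted)
```

With these made-up weights the two baselines disagree: two of the three annotators chose label 0, but the single annotator who chose label 1 carries more weight (0.9) than the other two combined (0.7).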

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, X., Li, L., Memon, N. (2013). A Lightweight Combinatorial Approach for Inferring the Ground Truth from Multiple Annotators. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2013. Lecture Notes in Computer Science, vol 7988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39712-7_47

  • DOI: https://doi.org/10.1007/978-3-642-39712-7_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39711-0

  • Online ISBN: 978-3-642-39712-7

  • eBook Packages: Computer Science, Computer Science (R0)
