Agreement/disagreement based crowd labeling

Abstract

In many supervised learning problems, determining the true labels of training instances is expensive, laborious, or even practically impossible. A much easier alternative is to collect multiple subjective (possibly noisy) labels from human labelers, especially through crowdsourcing services such as Amazon's Mechanical Turk. The collected labels are then aggregated to estimate the true labels. To reduce the negative effects of novices, spammers, and malicious labelers, the accuracies of the labelers must be taken into account. However, in the absence of true labels, the main source of information for estimating labeler accuracies is missing. This paper demonstrates that agreements and disagreements among labeler opinions are a useful source of information that facilitates accuracy estimation. We formulate this estimation problem as an optimization problem whose goal is to minimize the differences between the analytical disagreement probabilities implied by the estimated accuracies and the empirical disagreement probabilities computed from the provided labels. We present an efficient semi-exhaustive search method to solve this optimization problem. Our experiments on simulated data and three real datasets show that the proposed method is promising in this emerging area. The source code of the proposed method is available for download at http://ceit.aut.ac.ir/~amirkhani.
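To make the core idea concrete, here is a minimal sketch, not the authors' implementation. It assumes binary labels, a single accuracy parameter per labeler, and labelers that are conditionally independent given the true label; under these assumptions, two labelers with accuracies p_i and p_j disagree on an instance with probability p_i(1 - p_j) + (1 - p_i)p_j. The sketch fits the accuracies by matching these analytical probabilities to the observed disagreement rates with a generic bounded optimizer (the paper instead describes a semi-exhaustive search), and then aggregates the labels by accuracy-weighted voting. All function and variable names are illustrative.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize


def empirical_disagreement(labels):
    """labels: (n_instances, n_labelers) array of 0/1 labels; returns pairwise disagreement rates."""
    _, m = labels.shape
    d = np.zeros((m, m))
    for i, j in combinations(range(m), 2):
        d[i, j] = d[j, i] = np.mean(labels[:, i] != labels[:, j])
    return d


def estimate_accuracies(labels):
    """Fit one accuracy per labeler by matching analytical to empirical disagreement rates."""
    m = labels.shape[1]
    d_emp = empirical_disagreement(labels)

    def objective(p):
        # Sum of squared gaps between analytical and observed disagreement probabilities.
        err = 0.0
        for i, j in combinations(range(m), 2):
            d_analytic = p[i] * (1 - p[j]) + (1 - p[i]) * p[j]
            err += (d_analytic - d_emp[i, j]) ** 2
        return err

    # Start above 0.5 (better-than-random labelers); accuracies are only identifiable
    # up to a global flip p -> 1 - p, so the initial point selects the sensible branch.
    p0 = np.full(m, 0.7)
    res = minimize(objective, p0, bounds=[(1e-3, 1 - 1e-3)] * m)
    return res.x


def aggregate(labels, acc):
    """Accuracy-weighted vote: each labeler contributes +/- log(p / (1 - p)) for its label."""
    w = np.log(acc / (1 - acc))
    scores = labels @ w - (1 - labels) @ w
    return (scores > 0).astype(int)


if __name__ == "__main__":
    # Small simulated check: four labelers of varying quality labeling 500 binary instances.
    rng = np.random.default_rng(0)
    true = rng.integers(0, 2, size=500)
    true_acc = np.array([0.9, 0.8, 0.6, 0.55])
    labels = np.stack([np.where(rng.random(500) < a, true, 1 - true) for a in true_acc], axis=1)
    est = estimate_accuracies(labels)
    print("estimated accuracies:", np.round(est, 2))
    print("agreement with truth:", np.mean(aggregate(labels, est) == true))
```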

Notes

  1. AMT (http://www.mturk.com) is an online labor market where uncertified workers complete small tasks in exchange for small payments.

  2. Other references are reported in the rest of this section.

  3. Different papers use different (possibly conflicting) epithets for these categories. In this paper, we use the terms proposed in [17].

  4. These two parameters model the accuracy of the labeler separately for each class; an illustrative formulation is given after these notes.

  5. We adopt a simple approach for convenience. Yet more realistic assumptions and more advanced techniques can be applied.

  6. This information is explicitly used in [29].

  7. When there is no clear majority, this information is useless and we must use other types of information.

  8. It is clear that one labeler must be better than the other and equality information is useless.
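As a hedged illustration of note 4: the two per-class parameters are commonly written as a sensitivity $\alpha_k = P(\text{label}=1 \mid \text{true}=1)$ and a specificity $\beta_k = P(\text{label}=0 \mid \text{true}=0)$ for labeler $k$ (the "two-coin" annotator model used in [15, 17]). Keeping the conditional-independence assumption from the sketch above and writing $\pi = P(y=1)$ for the class prior, the pairwise disagreement probability generalizes to

$$P(\ell_i \neq \ell_j) = \pi\bigl[\alpha_i(1-\alpha_j) + (1-\alpha_i)\alpha_j\bigr] + (1-\pi)\bigl[\beta_i(1-\beta_j) + (1-\beta_i)\beta_j\bigr].$$

This is an illustrative derivation under the stated assumptions, not necessarily the exact formulation used in the paper.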

References

  1. Howe J (2006) The rise of crowdsourcing. Wired Mag 14:1–4

  2. Pontin J (2007) Artificial intelligence: with help from the humans. The New York Times

  3. Vondrick C, Patterson D, Ramanan D (2013) Efficiently scaling up crowdsourced video annotation. Int J Comput Vis 101:184–204

  4. Jones GJ (2013) An introduction to crowdsourcing for language and multimedia technology research. In: Information retrieval meets information visualization. Springer, Berlin, pp 132–154

  5. Marcus A, Wu E, Karger DR et al (2011) Crowdsourced databases: query processing with people. In: 5th biennial conference on innovative data systems research, Asilomar, CA, USA, pp 211–214

  6. Snow R, O’Connor B, Jurafsky D, Ng A (2008) Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In: The conference on empirical methods in natural language processing, pp 254–263

  7. Zaidan OF, Callison-Burch C (2011) Crowdsourcing translation: professional quality from non-professionals. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 1220–1229

  8. Bernstein MS, Little G, Miller RC et al (2010) Soylent: a word processor with a crowd inside. In: Proceedings of the 23rd annual ACM symposium on user interface software and technology, pp 313–322

  9. Urbano J, Morato J, Marrero M, Martín D (2010) Crowdsourcing preference judgments for evaluation of music similarity tasks. In: ACM SIGIR workshop on crowdsourcing for search evaluation, pp 9–16

  10. Grady C, Lease M (2010) Crowdsourcing document relevance assessment with Mechanical Turk. In: Proceedings of the NAACL HLT 2010 workshop on creating speech and language data with Amazon's Mechanical Turk, pp 172–179

  11. von Ahn L, Maurer B, McMillen C et al (2008) reCAPTCHA: human-based character recognition via web security measures. Science 321(5895):1465–1468

  12. Muhammadi J, Rabiee HR (2013) Crowd computing: a survey. arXiv:1301.2774

  13. Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, pp 614–622

  14. Dekel O, Shamir O (2009) Good learners for evil teachers. In: Proceedings of the 26th international conference on machine learning, Montreal, Canada, pp 233–240

  15. Raykar VC, Yu S, Zhao LH et al (2010) Learning from crowds. J Mach Learn Res 11:1297–1322

  16. Whitehill J, Ruvolo P, Wu T et al (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Advances in neural information processing systems, pp 2035–2043

  17. Raykar VC, Yu S (2012) Eliminating spammers and ranking annotators for crowdsourced labeling tasks. J Mach Learn Res 13:491–518

  18. Ipeirotis PG, Provost F, Wang J (2010) Quality management on Amazon Mechanical Turk. In: The ACM SIGKDD workshop on human computation. ACM, New York, pp 64–67

  19. Dawid AP, Skene AM (1979) Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl Stat 28:20–28

  20. Karger DR, Oh S, Shah D (2011) Iterative learning for reliable crowdsourcing systems. In: Neural information processing systems (NIPS)

  21. Khattak FK, Salleb-Aouissi A (2011) Quality control of crowd labeling through expert evaluation. In: Proceedings of the NIPS 2nd workshop on computational social science and the wisdom of crowds

  22. Wauthier FL, Jordan MI (2011) Bayesian bias mitigation for crowdsourcing. In: Neural information processing systems (NIPS)

  23. Smyth P, Fayyad U, Burl M et al (1995) Inferring ground truth from subjective labelling of venus images. In: Advances in neural information processing systems, pp 1085–1092

  24. Wiebe JM, Bruce RF, O’Hara TP (1999) Development and use of a gold-standard data set for subjectivity classifications. In: Proceedings of the 37th annual meeting of the association for computational linguistics on computational linguistics, pp 246–253

  25. Eagle N (2009) Txteagle: mobile crowdsourcing. In: Internationalization, design and global development. Springer, Berlin, pp 447–456

  26. Yan Y, Rosales R, Fung G et al (2010) Modeling annotator expertise: learning when everybody knows a bit of something. In: International conference on artificial intelligence and statistics, pp 932–939

  27. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc, Ser B 39:1–38

  28. Kajino H, Tsuboi Y, Kashima H (2012) A convex formulation for learning from crowds. In: Proceedings of the 26th AAAI conference on artificial intelligence, pp 73–79

  29. Karger DR, Oh S, Shah D (2012) Budget-optimal task allocation for reliable crowdsourcing systems. arXiv:1110.3564v3

  30. Dietterich TG (2000) Ensemble methods in machine learning. In: Multiple classifier systems. Springer, Berlin, pp 1–15

  31. Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40:139–157

  32. Valentini G, Dietterich TG (2004) Bias-variance analysis of support vector machines for the development of SVM-based ensemble methods. J Mach Learn Res 5:725–775

  33. Cho SB, Won H-H (2007) Cancer classification using ensemble of neural networks with multiple significant gene subsets. Appl Intell 26:243–250

  34. Lin Z, Hao Z, Yang X, Liu X (2009) Several SVM ensemble methods integrated with under-sampling for imbalanced data learning. In: Advanced data mining and applications. Springer, Berlin, pp 536–544

  35. Maudes J, Rodríguez JJ, García-Osorio C, Pardo C (2011) Random projections for linear SVM ensembles. Appl Intell 34:347–359

  36. Canuto AM, Santos AM, Vargas RR (2011) Ensembles of ARTMAP-based neural networks: an experimental study. Appl Intell 35:1–17

  37. Khor K-C, Ting C-Y, Phon-Amnuaisuk S (2012) A cascaded classifier approach for improving detection rates on rare attack categories in network intrusion detection. Appl Intell 36:320–329

  38. Lee H, Kim E, Pedrycz W (2012) A new selective neural network ensemble with negative correlation. Appl Intell 37:488–498

  39. Wang C-W, You W-H (2013) Boosting-SVM: effective learning with reduced data dimension. Appl Intell 39(3):465–474

  40. Park S, Lee SR (2014) Red tides prediction system using fuzzy reasoning and the ensemble method. Appl Intell 40(2):244–255

  41. Bella A, Ferri C, Hernández-Orallo J, Ramírez-Quintana MJ (2013) On the effect of calibration in classifier combination. Appl Intell 38(4):566–585

  42. Sakar CO, Kursun O, Gurgen F (2014) Ensemble canonical correlation analysis. Appl Intell 40(2):291–304

  43. Fahim M, Fatima I, Lee S, Lee Y-K (2013) EEM: evolutionary ensembles model for activity recognition in smart homes. Appl Intell 38:88–98

  44. Gamberger D, Lavrac N, Dzeroski S (2000) Noise detection and elimination in data preprocessing: experiments in medical domains. Appl Artif Intell 14:205–223

  45. Lallich S, Muhlenbach F, Zighed DA (2002) Improving classification by removing or relabeling mislabeled instances. In: Foundations of intelligent systems. Springer, Berlin, pp 5–15

  46. Verbaeten S, Van Assche A (2003) Ensemble methods for noise elimination in classification problems. In: Multiple classifier systems. Springer, Berlin, pp 317–325

  47. Guan D, Yuan W, Lee Y-K, Lee S (2011) Identifying mislabeled training data with the aid of unlabeled data. Appl Intell 35:345–358

  48. Young J, Ashburner J, Ourselin S (2013) Wrapper methods to correct mislabelled training data. In: International workshop on pattern recognition in neuroimaging (PRNI), pp 170–173

  49. Wasserman L (2003) All of statistics: a concise course in statistical inference. Springer, New York

Author information

Corresponding author

Correspondence to Mohammad Rahmati.

About this article

Cite this article

Amirkhani, H., Rahmati, M. Agreement/disagreement based crowd labeling. Appl Intell 41, 212–222 (2014). https://doi.org/10.1007/s10489-014-0516-2
