Abstract
In many supervised learning problems, determining the true labels of training instances is expensive, laborious, or even practically impossible. As an alternative, it is much easier to collect multiple subjective (possibly noisy) labels from human labelers, especially via crowdsourcing services such as Amazon's Mechanical Turk. The collected labels are then aggregated to estimate the true labels. To reduce the negative effects of novices, spammers, and malicious labelers, the accuracies of the labelers must be taken into account. However, in the absence of true labels, we lack the main source of information for estimating labeler accuracies. This paper demonstrates that the agreements and disagreements among labeler opinions are a useful source of information that facilitates the accuracy estimation problem. We formulate this estimation problem as an optimization problem whose goal is to minimize the differences between the analytical probabilities of disagreement, computed from the estimated accuracies, and the empirical probabilities of disagreement, computed from the provided labels. We present an efficient semi-exhaustive search method to solve this optimization problem. Our experiments on simulated data and three real datasets show that the proposed method is a promising approach in this emerging area. The source code of the proposed method is available for download at http://ceit.aut.ac.ir/~amirkhani.
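The objective described in the abstract can be sketched in a few lines, under strong simplifying assumptions: binary labels, conditionally independent labelers, and a single accuracy parameter per labeler (the paper's actual model and search procedure may differ). All function names and the coarse accuracy grid below are illustrative, not taken from the paper. For independent labelers with accuracies p_i and p_j, the analytical probability that they disagree on an item is p_i(1 − p_j) + (1 − p_i)p_j; the estimation then fits accuracies so these analytical values match the observed pairwise disagreement rates.

```python
import itertools
import numpy as np

def empirical_disagreement(labels):
    """Observed pairwise disagreement rates.
    labels: (n_items, n_labelers) matrix of 0/1 labels."""
    n = labels.shape[1]
    d = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        d[i, j] = np.mean(labels[:, i] != labels[:, j])
    return d

def analytical_disagreement(p):
    """P(labelers i and j disagree) for independent labelers with
    accuracies p[i], p[j]: p_i*(1-p_j) + (1-p_i)*p_j."""
    n = len(p)
    d = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        d[i, j] = p[i] * (1 - p[j]) + (1 - p[i]) * p[j]
    return d

def grid_search_accuracies(labels, grid=np.linspace(0.05, 0.95, 19)):
    """Exhaustive search over a coarse accuracy grid (feasible only for
    a handful of labelers; the paper uses a semi-exhaustive method)."""
    target = empirical_disagreement(labels)
    best, best_err = None, np.inf
    for p in itertools.product(grid, repeat=labels.shape[1]):
        err = np.sum((analytical_disagreement(np.array(p)) - target) ** 2)
        if err < best_err:
            best, best_err = np.array(p), err
    return best
```

Note that the objective is symmetric in p and 1 − p (flipping every accuracy leaves the disagreement probabilities unchanged), so extra information or a constraint such as "most labelers are better than random" is needed to pick the right mode.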
Notes
AMT (found at http://www.mturk.com) is an online labor market where uncertified workers complete small tasks in exchange for small payments.
Other references are reported in the rest of this section.
Different papers use different (possibly conflicting) epithets for these categories. In this paper, we use the terms proposed in [17].
These two parameters model the accuracy of the labeler separately for each class.
We adopt a simple approach for convenience. Yet more realistic assumptions and more advanced techniques can be applied.
This information is explicitly used in [29].
When there is no clear majority, this information is useless and we must use other types of information.
In this case one of the two labelers must be better than the other, so the equality information is uninformative.
References
Howe J (2006) The rise of crowdsourcing. Wired Mag 14:1–4
Pontin J (2007) Artificial intelligence: with help from the humans. The New York Times
Vondrick C, Patterson D, Ramanan D (2013) Efficiently scaling up crowdsourced video annotation. Int J Comput Vis 101:184–204
Jones GJ (2013) An introduction to crowdsourcing for language and multimedia technology research. In: Information retrieval meets information visualization. Springer, Berlin, pp 132–154
Marcus A, Wu E, Karger DR et al (2011) Crowdsourced databases: query processing with people. In: 5th biennial conference on innovative data systems research, Asilomar, CA, USA, pp 211–214
Snow R, O’Connor B, Jurafsky D, Ng A (2008) Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In: The conference on empirical methods in natural language processing, pp 254–263
Zaidan OF, Callison-Burch C (2011) Crowdsourcing translation: professional quality from non-professionals. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 1220–1229
Bernstein MS, Little G, Miller RC et al (2010) Soylent: a word processor with a crowd inside. In: Proceedings of the 23rd annual ACM symposium on user interface software and technology, pp 313–322
Urbano J, Morato J, Marrero M, Martín D (2010) Crowdsourcing preference judgments for evaluation of music similarity tasks. In: ACM SIGIR workshop on crowdsourcing for search evaluation, pp 9–16
Grady C, Lease M (2010) Crowdsourcing document relevance assessment with mechanical turk. In: Proceedings of the NAACL HLT 2010 workshop on creating speech and language data with Amazon’s mechanical turk, pp 172–179
Ahn LV, Maurer B, McMillen C et al (2008) reCAPTCHA: human-based character recognition via web security measures. Science 321(5895):1465–1468
Muhammadi J, Rabiee HR (2013) Crowd computing: a survey. arXiv:1301.2774
Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, pp 614–622
Dekel O, Shamir O (2009) Good learners for evil teachers. In: Proceedings of the 26th international conference on machine learning, Montreal, Canada, pp 233–240
Raykar VC, Yu S, Zhao LH et al (2010) Learning from crowds. J Mach Learn Res 11:1297–1322
Whitehill J, Ruvolo P, Wu T et al (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Advances in neural information processing systems, pp 2035–2043
Raykar VC, Yu S (2012) Eliminating spammers and ranking annotators for crowdsourced labeling tasks. J Mach Learn Res 13:491–518
Ipeirotis PG, Provost F, Wang J (2010) Quality management on amazon mechanical turk. In: The ACM SIGKDD workshop on human computation. ACM, New York, pp 64–67
Dawid AP, Skene AM (1979) Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl Stat 28:20–28
Karger DR, Oh S, Shah D (2011) Iterative learning for reliable crowdsourcing systems. In: Neural information processing systems (NIPS)
Khattak FK, Salleb-Aouissi A (2011) Quality control of crowd labeling through expert evaluation. In: Proceedings of the NIPS 2nd workshop on computational social science and the wisdom of crowds
Wauthier FL, Jordan MI (2011) Bayesian bias mitigation for crowdsourcing. In: Neural information processing systems (NIPS)
Smyth P, Fayyad U, Burl M et al (1995) Inferring ground truth from subjective labelling of venus images. In: Advances in neural information processing systems, pp 1085–1092
Wiebe JM, Bruce RF, O’Hara TP (1999) Development and use of a gold-standard data set for subjectivity classifications. In: Proceedings of the 37th annual meeting of the association for computational linguistics on computational linguistics, pp 246–253
Eagle N (2009) Txteagle: mobile crowdsourcing. In: Internationalization, design and global development. Springer, Berlin, pp 447–456
Yan Y, Rosales R, Fung G et al (2010) Modeling annotator expertise: learning when everybody knows a bit of something. In: International conference on artificial intelligence and statistics, pp 932–939
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc, Ser B 39:1–38
Kajino H, Tsuboi Y, Kashima H (2012) A convex formulation for learning from crowds. In: Proceedings of the 26th AAAI conference on artificial intelligence, pp 73–79
Karger DR, Oh S, Shah D (2012) Budget-optimal task allocation for reliable crowdsourcing systems. arXiv:1110.3564v3
Dietterich TG (2000) Ensemble methods in machine learning. In: Multiple classifier systems. Springer, Berlin, pp 1–15
Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40:139–157
Valentini G, Dietterich TG (2004) Bias-variance analysis of support vector machines for the development of SVM-based ensemble methods. J Mach Learn Res 5:725–775
Cho SB, Won H-H (2007) Cancer classification using ensemble of neural networks with multiple significant gene subsets. Appl Intell 26:243–250
Lin Z, Hao Z, Yang X, Liu X (2009) Several SVM ensemble methods integrated with under-sampling for imbalanced data learning. In: Advanced data mining and applications. Springer, Berlin, pp 536–544
Maudes J, Rodríguez JJ, García-Osorio C, Pardo C (2011) Random projections for linear SVM ensembles. Appl Intell 34:347–359
Canuto AM, Santos AM, Vargas RR (2011) Ensembles of ARTMAP-based neural networks: an experimental study. Appl Intell 35:1–17
Khor K-C, Ting C-Y, Phon-Amnuaisuk S (2012) A cascaded classifier approach for improving detection rates on rare attack categories in network intrusion detection. Appl Intell 36:320–329
Lee H, Kim E, Pedrycz W (2012) A new selective neural network ensemble with negative correlation. Appl Intell 37:488–498
Wang C-W, You W-H (2013) Boosting-SVM: effective learning with reduced data dimension. Appl Intell 39(3):465–474
Park S, Lee SR (2014) Red tides prediction system using fuzzy reasoning and the ensemble method. Appl Intell 40(2):244–255
Bella A, Ferri C, Hernández-Orallo J, Ramírez-Quintana MJ (2013) On the effect of calibration in classifier combination. Appl Intell 38(4):566–585
Sakar CO, Kursun O, Gurgen F (2014) Ensemble canonical correlation analysis. Appl Intell 40(2):291–304
Fahim M, Fatima I, Lee S, Lee Y-K (2013) EEM: evolutionary ensembles model for activity recognition in smart homes. Appl Intell 38:88–98
Gamberger D, Lavrac N, Dzeroski S (2000) Noise detection and elimination in data preprocessing: experiments in medical domains. Appl Artif Intell 14:205–223
Lallich S, Muhlenbach F, Zighed DA (2002) Improving classification by removing or relabeling mislabeled instances. In: Foundations of intelligent systems. Springer, Berlin, pp 5–15
Verbaeten S, Van Assche A (2003) Ensemble methods for noise elimination in classification problems. In: Multiple classifier systems. Springer, Berlin, pp 317–325
Guan D, Yuan W, Lee Y-K, Lee S (2011) Identifying mislabeled training data with the aid of unlabeled data. Appl Intell 35:345–358
Young J, Ashburner J, Ourselin S (2013) Wrapper methods to correct mislabelled training data. In: International workshop on pattern recognition in neuroimaging (PRNI), pp 170–173
Wasserman L (2003) All of statistics: a concise course in statistical inference. Springer, New York
Cite this article
Amirkhani, H., Rahmati, M. Agreement/disagreement based crowd labeling. Appl Intell 41, 212–222 (2014). https://doi.org/10.1007/s10489-014-0516-2