A worker clustering-based approach of label aggregation under the belief function theory


Abstract

Crowdsourcing platforms have attracted wide attention in the field of artificial intelligence in recent years, as they provide a cheap and accessible human-powered resource for gathering massive amounts of labeled data. These data are used to effectively build supervised learning models for academic research purposes. However, despite the attractiveness of these systems, the major concern has always been the quality of the collected labels. Indeed, a wide range of workers contribute to labeling the data, which leaves us with potentially noisy and imperfect labels. In this paper, we therefore propose a new label aggregation technique that determines worker qualities via a clustering process and then represents and combines their labels to estimate the final label under the belief function theory. The latter is well known for its strength and flexibility when dealing with imperfect information. Experimental results demonstrate that our proposed method outperforms the related-work baseline and improves result quality.
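The pipeline outlined above (estimate worker quality by clustering, then express each collected label as a belief function and fuse the evidence per task) can be sketched in a few lines of Python. This is a minimal illustration under assumed choices, not the authors' exact algorithm: it clusters workers with k-means on a single agreement-with-majority feature, turns each answer into a simple support mass function weighted by the worker's cluster reliability, and combines the masses per task with Dempster's rule. The function name aggregate_labels, the agreement feature, and n_clusters are all illustrative assumptions.

```python
# Hypothetical sketch of clustering-based label aggregation with belief functions.
# Assumptions: k-means on agreement with the majority vote, simple support mass
# functions, Dempster's rule restricted to singletons and the full frame.
import numpy as np
from sklearn.cluster import KMeans

def aggregate_labels(answers, n_classes, n_clusters=3):
    """answers: (n_workers, n_tasks) int matrix, -1 meaning 'not answered'."""
    n_workers, n_tasks = answers.shape

    # 1. Majority vote per task, used only as a rough reference label.
    majority = np.array([
        np.bincount(answers[:, t][answers[:, t] >= 0], minlength=n_classes).argmax()
        for t in range(n_tasks)
    ])

    # 2. One feature per worker: agreement rate with the majority vote.
    agree = np.array([
        np.mean(answers[w][answers[w] >= 0] == majority[answers[w] >= 0])
        if np.any(answers[w] >= 0) else 0.0
        for w in range(n_workers)
    ]).reshape(-1, 1)

    # 3. Cluster workers into reliability groups; a cluster's reliability is
    #    taken as its mean agreement rate.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(agree)
    reliability = np.array([agree[km.labels_ == c].mean() for c in range(n_clusters)])
    w_rel = reliability[km.labels_]  # per-worker reliability score

    # 4. Per task: each answer yields m({label}) = r and m(frame) = 1 - r;
    #    combine conjunctively, drop the mass on the empty set, normalize once.
    final = np.empty(n_tasks, dtype=int)
    for t in range(n_tasks):
        m_single = np.zeros(n_classes)  # masses on singleton labels
        m_frame = 1.0                   # mass on the whole frame of discernment
        for w in range(n_workers):
            lab = answers[w, t]
            if lab < 0:
                continue
            r = w_rel[w]
            new_single = np.zeros(n_classes)
            new_single[lab] = m_single[lab] * r + m_frame * r
            new_single += m_single * (1 - r)  # previous singletons meet the frame
            m_single, m_frame = new_single, m_frame * (1 - r)
        total = m_single.sum() + m_frame
        final[t] = (m_single / total).argmax() if total > 0 else majority[t]
    return final
```

In the actual method, the mass construction, discounting, and combination rule would follow the paper's definitions; the sketch only shows where the worker-clustering step plugs into the fusion and how reliability shapes each worker's evidence.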

Author information

Corresponding author

Correspondence to Lina Abassi.

About this article

Cite this article

Abassi, L., Boukhris, I. A worker clustering-based approach of label aggregation under the belief function theory. Appl Intell 49, 53–62 (2019). https://doi.org/10.1007/s10489-018-1209-z
