
A novel ground truth inference algorithm based on instance similarity for crowdsourcing learning

Applied Intelligence

Abstract

In a crowdsourcing system, each instance is usually labeled multiple times by different workers. Once the multiple noisy labels have been collected, ground truth inference algorithms are used to infer the unknown true labels of the instances. However, most existing ground truth inference algorithms use only the information in the multiple noisy labels themselves and ignore instance similarity. This paper proposes a novel ground truth inference algorithm based on instance similarity to further improve the performance of ground truth inference. Because similar instances are more likely to fall into the same cluster and to have similar labels, both fuzzy c-means clustering (FCM) and the k-nearest neighbors algorithm (kNN) are used to exploit instance similarity. Specifically, FCM is first used to adjust the label distributions of instances. The labels of instances are then inferred from their label distributions using kNN. Through instance similarity, instances with reliable label distributions influence instances with unreliable label distributions. Experimental results on benchmark and real-world data sets validate that using instance similarity can effectively enhance the performance of ground truth inference.
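The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the update rules, so the mixing weight `alpha`, the way cluster-level distributions are formed, the plain averaging over neighbors, and all function names below are assumptions chosen only to show how FCM memberships and kNN could let reliably labeled instances influence unreliable ones.

```python
import numpy as np

MISSING = -1  # hypothetical marker: this worker did not label the instance


def label_distributions(crowd_labels, n_classes):
    """Empirical class distribution of each instance from its noisy labels."""
    counts = np.stack(
        [(crowd_labels == c).sum(axis=1) for c in range(n_classes)], axis=1
    ).astype(float)
    counts += 1e-9  # keeps rows with no labels from dividing by zero
    return counts / counts.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means; returns memberships of shape (n, n_clusters)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(n_clusters), size=len(X))
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = (dist + 1e-12) ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U


def infer_labels(X, crowd_labels, n_classes, k=3, alpha=0.5):
    """Assumed pipeline: FCM-based adjustment, then kNN-based inference."""
    P = label_distributions(crowd_labels, n_classes)

    # Stage 1 (assumption): pull each distribution toward the
    # membership-weighted distribution of the clusters it belongs to.
    U = fuzzy_c_means(X, n_clusters=n_classes)
    cluster_P = (U.T @ P) / U.sum(axis=0)[:, None]
    P_adj = (1.0 - alpha) * P + alpha * (U @ cluster_P)

    # Stage 2 (assumption): average the adjusted distributions over the
    # k + 1 nearest points, which include the instance itself.
    pairwise = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    nearest = np.argsort(pairwise, axis=1)[:, : k + 1]
    return P_adj[nearest].mean(axis=1).argmax(axis=1)
```

Under these assumptions, an instance whose own noisy labels point to the wrong class can still receive the right label when its cluster and its nearest neighbors are labeled consistently, which is the effect the abstract attributes to instance similarity.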



Acknowledgements

The work was partially supported by Science and Technology Project of Hubei Province-Unveiling System (Grant No. 2021BEC007), Industry-University-Research Innovation Funds for Chinese Universities (Grant No. 2020ITA05008), and Open Research Project of The Hubei Key Laboratory of Intelligent Geo-Information Processing (Grant No. KLIGIP-2019A03).

Author information

Corresponding author

Correspondence to Chaoqun Li.

Cite this article

Ma, B., Li, C. & Jiang, L. A novel ground truth inference algorithm based on instance similarity for crowdsourcing learning. Appl Intell 52, 17784–17796 (2022). https://doi.org/10.1007/s10489-022-03433-3

  • DOI: https://doi.org/10.1007/s10489-022-03433-3
