A novel ground truth inference algorithm based on instance similarity for crowdsourcing learning

Ma, Ben; Li, Chaoqun; Jiang, Liangxiao

doi:10.1007/s10489-022-03433-3

Published: 05 April 2022

Volume 52, pages 17784–17796, (2022)

Applied Intelligence

Abstract

In a crowdsourcing system, each instance is usually labeled multiple times by different workers. Once these multiple noisy labels have been collected, ground truth inference algorithms are used to infer the unknown true label of each instance. However, most existing ground truth inference algorithms use only the information in the multiple noisy labels themselves and ignore instance similarity. This paper proposes a novel ground truth inference algorithm based on instance similarity to further improve the performance of ground truth inference. Because similar instances are more likely to fall into the same cluster and to have similar labels, both fuzzy c-means clustering (FCM) and the k-nearest neighbors algorithm (kNN) are used to exploit instance similarity. Specifically, FCM is first used to adjust the label distributions of instances. The labels of instances are then inferred from their label distributions together with the kNN algorithm. Through instance similarity, instances with reliable label distributions influence instances with unreliable label distributions. Experimental results on benchmark and real-world data sets show that using instance similarity can effectively enhance the performance of ground truth inference.
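
The abstract gives only the outline of the two-stage procedure, so the following Python sketch is an illustration under stated assumptions rather than the authors' published algorithm. It builds label distributions from normalised vote counts, smooths them with FCM memberships, and then lets each instance's k nearest neighbors vote with their adjusted distributions. The function names, the mixing weight `alpha`, the farthest-point initialisation of the cluster centers, and the confidence weighting of neighbors are all hypothetical choices made for this example.

```python
import numpy as np


def vote_distributions(votes, n_classes):
    """Turn worker votes (n_instances x n_workers, -1 = no label) into
    per-instance label distributions by normalising the vote counts."""
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    totals = counts.sum(axis=1, keepdims=True)
    # Instances without any vote get a uniform distribution.
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_classes)


def fcm_memberships(X, n_clusters, m=2.0, n_iter=50):
    """Standard fuzzy c-means updates; returns the membership matrix U."""
    # Assumption: deterministic farthest-point initialisation of the centers.
    centers = [X[0]]
    for _ in range(1, n_clusters):
        d_min = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d_min)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)      # membership update
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # center update
    return U


def infer_labels(X, votes, n_classes, n_clusters, k=3, alpha=0.5):
    """Two-stage inference: FCM-based adjustment, then kNN-based voting."""
    dist = vote_distributions(votes, n_classes)

    # Stage 1 (assumed form): mix each instance's distribution with the
    # membership-weighted label profile of the clusters it belongs to.
    U = fcm_memberships(X, n_clusters)
    cluster_profile = (U.T @ dist) / U.sum(axis=0)[:, None]
    adjusted = (1 - alpha) * dist + alpha * (U @ cluster_profile)

    # Stage 2 (assumed form): average the neighbors' adjusted distributions,
    # weighting each neighbor by its confidence (largest class probability),
    # so that reliable instances influence unreliable ones more strongly.
    pair = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(pair, np.inf)                    # exclude the instance itself
    nn = np.argsort(pair, axis=1)[:, :k]
    conf = adjusted.max(axis=1)
    neigh = (adjusted[nn] * conf[nn][:, :, None]).sum(axis=1)
    neigh /= conf[nn].sum(axis=1, keepdims=True)
    return (adjusted + neigh).argmax(axis=1), adjusted
```

In this sketch an instance whose votes are tied, for example one vote for each of two classes, is resolved by the label profile of its cluster and by its confident neighbors instead of by an arbitrary tie-break.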

Acknowledgements

This work was partially supported by the Science and Technology Project of Hubei Province-Unveiling System (Grant No. 2021BEC007), the Industry-University-Research Innovation Funds for Chinese Universities (Grant No. 2020ITA05008), and the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (Grant No. KLIGIP-2019A03).

Author information

Authors and Affiliations

School of Mathematics and Physics, China University of Geosciences, Wuhan, 430074, China
Ben Ma & Chaoqun Li
School of Computer Science, China University of Geosciences, Wuhan, 430074, China
Liangxiao Jiang

Corresponding author

Correspondence to Chaoqun Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ma, B., Li, C. & Jiang, L. A novel ground truth inference algorithm based on instance similarity for crowdsourcing learning. Appl Intell 52, 17784–17796 (2022). https://doi.org/10.1007/s10489-022-03433-3

Accepted: 22 February 2022
Published: 05 April 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10489-022-03433-3
