Learning from crowds with decision trees

Yang, Wenjun; Li, Chaoqun; Jiang, Liangxiao

doi:10.1007/s10115-022-01701-9

Regular Paper
Published: 08 July 2022

Knowledge and Information Systems, Volume 64, pages 2123–2140 (2022)

Abstract

Crowdsourcing systems provide an efficient way to collect labeled data by employing non-expert crowd workers. In practice, each instance receives a set of multiple noisy labels from different workers. Ground truth inference algorithms are designed to infer the unknown true labels of the data from these multiple noisy label sets. Since worker quality varies substantially, evaluating the quality of each worker is crucial for ground truth inference. This paper proposes a novel algorithm called decision tree-based weighted majority voting (DTWMV). DTWMV directly takes the multiple noisy label set of each instance as its feature vector; that is, each worker is a feature of the instances. Sequential decision trees are then built to calculate the weight of each feature (worker), using a novel decision tree-based feature weight measurement. Finally, weighted majority voting is used to infer the integrated labels of the instances. DTWMV thus converts the evaluation of worker quality into the calculation of feature weights, which provides a new perspective on the ground truth inference problem. Our experimental results show that DTWMV can effectively evaluate the qualities of workers and improve the label quality of data.
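
The final aggregation step named in the abstract, weighted majority voting, can be illustrated with a minimal sketch. It assumes the per-worker weights are already available; the decision tree-based weight measurement itself is not described in the abstract and is not reproduced here. The function name `weighted_majority_vote`, the worker ids, and the weight values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def weighted_majority_vote(labels, weights):
    """Return one integrated label for a single instance.

    labels:  dict mapping worker id -> noisy label given by that worker
             (workers who did not label the instance are simply absent).
    weights: dict mapping worker id -> non-negative weight.
             ASSUMPTION: weights are supplied by the caller; how DTWMV
             derives them from decision trees is not shown here.
    """
    if not labels:
        raise ValueError("instance has no labels to aggregate")

    # Sum the weights of the workers voting for each candidate label.
    scores = defaultdict(float)
    for worker, label in labels.items():
        scores[label] += weights.get(worker, 0.0)

    # Pick the label with the highest total weight; iterating over the
    # sorted labels makes ties resolve to the smallest label.
    return max(sorted(scores), key=lambda lab: scores[lab])


# Hypothetical instance labeled by three workers.
instance_labels = {"w1": 1, "w2": 0, "w3": 0}

# Hypothetical weights: w1 is trusted more than w2 and w3 combined.
worker_weights = {"w1": 0.9, "w2": 0.3, "w3": 0.4}
uniform_weights = {"w1": 1.0, "w2": 1.0, "w3": 1.0}

weighted_result = weighted_majority_vote(instance_labels, worker_weights)
uniform_result = weighted_majority_vote(instance_labels, uniform_weights)
```

With uniform weights the function reduces to plain majority voting, so the two results differ only when a highly weighted worker outvotes a larger group of lower-weighted workers, as in this toy instance.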

Acknowledgements

The work was partially supported by Science and Technology Project of Hubei Province-Unveiling System (2021BEC007), Industry-University-Research Innovation Funds for Chinese Universities (2020ITA05008) and Open Research Project of The Hubei Key Laboratory of Intelligent Geo-Information Processing (KLIGIP-2019A03).

Author information

Authors and Affiliations

School of Mathematics and Physics, China University of Geosciences, Wuhan, 430074, China
Wenjun Yang & Chaoqun Li
School of Computer Science, China University of Geosciences, Wuhan, 430074, China
Liangxiao Jiang

Corresponding author

Correspondence to Chaoqun Li.

About this article

Cite this article

Yang, W., Li, C. & Jiang, L. Learning from crowds with decision trees. Knowl Inf Syst 64, 2123–2140 (2022). https://doi.org/10.1007/s10115-022-01701-9

Received: 31 July 2021
Revised: 04 June 2022
Accepted: 12 June 2022
Published: 08 July 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s10115-022-01701-9
