Attribute augmentation-based label integration for crowdsourcing

Zhang, Yao; Jiang, Liangxiao; Li, Chaoqun

doi:10.1007/s11704-022-2225-z

Research Article
Published: 24 December 2022

Volume 17, article number 175331, (2023)

Frontiers of Computer Science

Yao Zhang¹,
Liangxiao Jiang¹ &
Chaoqun Li²

85 Accesses
23 Citations
32 Altmetric
4 Mentions

Abstract

Crowdsourcing provides an effective and low-cost way to collect labels from crowd workers. Because crowd workers often lack professional knowledge, the quality of crowdsourced labels is relatively low. A common approach to addressing this issue is to collect multiple labels for each instance from different crowd workers and then use a label integration method to infer its true label. However, to our knowledge, almost all existing label integration methods use only the original attribute information and pay no attention to the quality of each instance's multiple noisy label set. To address these issues, this paper proposes a novel three-stage label integration method called attribute augmentation-based label integration (AALI). In the first stage, we design an attribute augmentation method to enrich the original attribute space. In the second stage, we develop a filter to single out reliable instances with high-quality multiple noisy label sets. In the third stage, we use majority voting to initialize the integrated labels of the reliable instances and then use cross-validation to build multiple component classifiers on the reliable instances to predict the labels of all instances. Experimental results on simulated and real-world crowdsourced datasets demonstrate that AALI outperforms all the other state-of-the-art competitors.
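The second and third stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the filter or the component classifiers, so the agreement threshold in `reliable_indices`, the 1-nearest-neighbour classifier in `nearest_label`, and the toy data are all assumptions made for illustration. The attribute augmentation stage is omitted because the abstract gives no detail about it.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def agreement(labels):
    """Fraction of labels in the set that match the majority label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def reliable_indices(label_sets, threshold):
    """Hypothetical filter: keep instances whose label set agrees strongly."""
    return [i for i, labels in enumerate(label_sets)
            if agreement(labels) >= threshold]


def nearest_label(train_x, train_y, x):
    """Assumed component classifier: 1-nearest neighbour, squared Euclidean."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_x]
    return train_y[dists.index(min(dists))]


def integrate(features, label_sets, threshold=0.8, k=3):
    """Initialize reliable instances by majority voting, then let k
    cross-validation-style component classifiers vote on every instance."""
    reliable = reliable_indices(label_sets, threshold)
    if k < 2 or len(reliable) < 2:
        raise ValueError("need k >= 2 and at least two reliable instances")
    initial = {i: majority_vote(label_sets[i]) for i in reliable}
    votes = [[] for _ in features]
    for fold in range(k):
        # Each component is trained on the reliable instances outside one fold.
        train = [i for pos, i in enumerate(reliable) if pos % k != fold]
        train_x = [features[i] for i in train]
        train_y = [initial[i] for i in train]
        for j, x in enumerate(features):
            votes[j].append(nearest_label(train_x, train_y, x))
    return [majority_vote(v) for v in votes]


# Toy data: the last two instances carry noisy label sets.
features = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (1.0, 1.0), (0.9, 1.1), (1.1, 0.9),
            (0.15, 0.1), (0.95, 1.0)]
label_sets = [["a", "a", "a"], ["a", "a", "a"], ["a", "a", "a"],
              ["b", "b", "b"], ["b", "b", "b"], ["b", "b", "b"],
              ["a", "b", "b"], ["b", "a", "a"]]

result = integrate(features, label_sets)
print(result)  # ['a', 'a', 'a', 'b', 'b', 'b', 'a', 'b']
```

On this toy data, plain majority voting would label the last two instances "b" and "a". The sketch instead assigns them the labels of their neighbours among the reliable instances, which is the kind of correction the third stage aims for.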

Acknowledgements

This work was supported by the Science and Technology Project of Hubei Province-Unveiling System (2021BEC007) and the Industry-University-Research Innovation Funds for Chinese Universities (2020ITA05008).

Author information

Authors and Affiliations

School of Computer Science, China University of Geosciences, Wuhan, 430074, China
Yao Zhang & Liangxiao Jiang
School of Mathematics and Physics, China University of Geosciences, Wuhan, 430074, China
Chaoqun Li

Corresponding author

Correspondence to Liangxiao Jiang.

Additional information

Yao Zhang is currently an MSc student at the School of Computer Science, China University of Geosciences, China. Her research interests mainly include machine learning and data mining (MLDM).

Liangxiao Jiang is currently a professor at the School of Computer Science, China University of Geosciences, China. His research interests mainly include machine learning and data mining (MLDM). He has published more than 90 papers in MLDM.

Chaoqun Li is currently an associate professor at the School of Mathematics and Physics, China University of Geosciences, China. Her research interests mainly include machine learning and data mining (MLDM). She has published more than 50 papers in MLDM.

Electronic supplementary material

Augmentation-based Label Integration for Crowdsourcing

About this article

Cite this article

Zhang, Y., Jiang, L. & Li, C. Attribute augmentation-based label integration for crowdsourcing. Front. Comput. Sci. 17, 175331 (2023). https://doi.org/10.1007/s11704-022-2225-z

Received: 15 April 2022
Accepted: 18 August 2022
Published: 24 December 2022
DOI: https://doi.org/10.1007/s11704-022-2225-z
