Attribute augmentation-based label integration for crowdsourcing

  • Research Article
Frontiers of Computer Science

Abstract

Crowdsourcing provides an effective and low-cost way to collect labels from crowd workers. Because crowd workers often lack professional knowledge, the quality of crowdsourced labels is relatively low. A common remedy is to collect multiple labels for each instance from different crowd workers and then use a label integration method to infer its true label. However, to our knowledge, almost all existing label integration methods use only the original attribute information and ignore the quality of each instance's multiple noisy label set. To address these two issues, this paper proposes a novel three-stage label integration method called attribute augmentation-based label integration (AALI). In the first stage, we design an attribute augmentation method to enrich the original attribute space. In the second stage, we develop a filter to single out reliable instances, that is, instances with high-quality multiple noisy label sets. In the third stage, we use majority voting to initialize the integrated labels of the reliable instances and then use cross-validation to build multiple component classifiers on the reliable instances, which predict the labels of all instances. Experimental results on simulated and real-world crowdsourced datasets demonstrate that AALI outperforms all the competing state-of-the-art methods.
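
As a rough illustration of the second and third stages described above, the following minimal Python sketch separates reliable instances from unreliable ones and initializes the integrated labels of the reliable ones by majority voting. It is not the AALI implementation: the abstract does not specify the filter, so the agreement ratio and the `threshold` parameter used here are assumptions, and the function names and toy data are hypothetical. Attribute augmentation and the cross-validated component classifiers are omitted because the abstract gives no details on them.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the smallest label."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def agreement(labels):
    """Fraction of workers whose label matches the majority label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def split_reliable(noisy_label_sets, threshold=0.75):
    """Split instance indices by label agreement.

    Assumption: an instance counts as reliable when its agreement ratio
    reaches `threshold`. This stands in for the paper's filter, whose
    actual criterion is not given in the abstract.
    """
    reliable, unreliable = [], []
    for index, labels in enumerate(noisy_label_sets):
        if agreement(labels) >= threshold:
            reliable.append(index)
        else:
            unreliable.append(index)
    return reliable, unreliable


# Toy data: each inner list holds the labels five workers gave one instance.
crowd_labels = [
    [1, 1, 1, 1, 0],  # strong agreement
    [0, 1, 0, 1, 1],  # weak agreement
    [0, 0, 0, 0, 0],  # unanimous
]

reliable, unreliable = split_reliable(crowd_labels)
initial_labels = {i: majority_vote(crowd_labels[i]) for i in reliable}
```

In a full pipeline, the instances in `reliable`, together with `initial_labels`, would serve as training data for the component classifiers, which would then predict labels for every instance, including those in `unreliable`.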



Acknowledgements

This work was supported by the Science and Technology Project of Hubei Province-Unveiling System (2021BEC007) and the Industry-University-Research Innovation Funds for Chinese Universities (2020ITA05008).

Author information

Corresponding author

Correspondence to Liangxiao Jiang.

Additional information

Yao Zhang is currently an MSc student at the School of Computer Science, China University of Geosciences, China. Her research interests mainly include machine learning and data mining (MLDM).

Liangxiao Jiang is currently a professor at the School of Computer Science, China University of Geosciences, China. His research interests mainly include machine learning and data mining (MLDM). He has published more than 90 papers in these areas.

Chaoqun Li is currently an associate professor at the School of Mathematics and Physics, China University of Geosciences, China. Her research interests mainly include machine learning and data mining (MLDM). She has published more than 50 papers in these areas.

About this article

Cite this article

Zhang, Y., Jiang, L. & Li, C. Attribute augmentation-based label integration for crowdsourcing. Front. Comput. Sci. 17, 175331 (2023). https://doi.org/10.1007/s11704-022-2225-z

  • DOI: https://doi.org/10.1007/s11704-022-2225-z
