Learning from crowds with active learning and self-healing

Shu, Zhenyu; Sheng, Victor S.; Li, Jingjing

doi:10.1007/s00521-017-2878-y

Original Article
Published: 21 February 2017

Volume 30, pages 2883–2894, (2018)

Neural Computing and Applications

Abstract

With the development of crowdsourcing, acquiring data for supervised learning from annotators all over the world has become simple and economical. To improve accuracy, it is natural to obtain multiple noisy labels (i.e., a multiple label set) for each example from the crowd. Consensus algorithms can then infer an estimated ground truth from the multiple label set of each example. This estimated ground truth, also called an integrated label, may itself be wrong. That is, a dataset constructed by integrating the multiple noisy labels of each example in a crowdsourcing dataset (called an integrated dataset) still contains noise. To further improve the data quality of an integrated dataset, and thereby the performance of a model learned from it, this paper proposes a framework that combines active learning with the self-healing of a model. With active learning, a limited number of examples from the integrated dataset that are most likely noisy are selected for the oracle to correct; with the self-healing of a model, the data quality of the integrated dataset can also be improved automatically. From our experimental results on eight simulated crowdsourcing datasets with three popular consensus algorithms, we draw the following conclusions. (1) Our proposed framework does improve the performance of a model learned from the integrated dataset. (2) The simple active learning selection strategy based on uncertainty estimation can identify noisy examples in the integrated dataset. (3) Self-healing improves the data quality of the integrated dataset efficiently and effectively, and thereby improves the accuracy of a model learned from it. We further investigate our proposed framework on a real-world crowdsourcing dataset collected from Amazon Mechanical Turk, and the above conclusions hold.
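
To make the terminology concrete, the following minimal Python sketch shows how a multiple label set can be reduced to an integrated label and how a small budget of examples might be picked for the oracle. It is an illustration under stated assumptions, not the authors' method: majority voting stands in for the consensus algorithm, the fraction of agreeing votes stands in for the uncertainty estimate, and the names `majority_vote`, `integrate`, `select_for_oracle`, and `crowd_labels` are hypothetical. The self-healing step is not sketched because this abstract does not describe how it works.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def majority_vote(labels: Sequence[int]) -> Tuple[int, float]:
    """Return (integrated label, fraction of votes that label received)."""
    counts = Counter(labels)
    # Most votes wins; ties go to the smaller label so the result is deterministic.
    label, votes = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return label, votes / len(labels)


def integrate(crowd_labels: Dict[str, List[int]]) -> Dict[str, Tuple[int, float]]:
    """Build the integrated dataset: one (label, agreement) pair per example."""
    return {example: majority_vote(ls) for example, ls in crowd_labels.items()}


def select_for_oracle(
    integrated: Dict[str, Tuple[int, float]], budget: int
) -> List[str]:
    """Pick the `budget` examples with the lowest vote agreement.

    Assumption: low agreement is used as a proxy for "most likely noisy".
    """
    ranked = sorted(integrated, key=lambda ex: (integrated[ex][1], ex))
    return ranked[:budget]


# Hypothetical multiple label sets: five noisy binary labels per example.
crowd_labels = {
    "e1": [1, 1, 1, 1, 1],
    "e2": [1, 0, 1, 0, 0],
    "e3": [0, 0, 0, 1, 0],
    "e4": [1, 0, 1, 1, 0],
}

integrated = integrate(crowd_labels)
to_review = select_for_oracle(integrated, budget=2)
print(integrated)
print(to_review)
```

In this toy data, the examples with a 3-to-2 vote split are the ones sent to the oracle, while unanimously labeled examples are left alone. A model-based uncertainty score could be substituted for the agreement ratio without changing the selection logic.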

Acknowledgements

This work was supported by the U.S. National Science Foundation under Grant No. IIS-1115417, the National Natural Science Foundation of China under Grant Nos. 61472267, 61170020, and 61440053, and the Natural Science Foundation of Hubei Province under Grant No. 2014CFB913.

Author information

Authors and Affiliations

College of Electronics and Information Engineering, South-Central University for Nationalities, 182 Minyuan Road Hongshan District, Wuhan, Hubei, China
Zhenyu Shu
School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
Victor S. Sheng
Department of Computer Science, University of Central Arkansas, 201 Donaghey Ave, Conway, AR, USA
Victor S. Sheng
College of Electronics and Information Engineering, Hubei University of Economics, 8 Yangchahu Road Jiangxia district, Wuhan, Hubei, China
Jingjing Li

Corresponding author

Correspondence to Zhenyu Shu.

Ethics declarations

Conflict of interest

We declare that we have no conflicts of interest related to this work. The manuscript has been approved by all authors for publication.

About this article

Cite this article

Shu, Z., Sheng, V.S. & Li, J. Learning from crowds with active learning and self-healing. Neural Comput & Applic 30, 2883–2894 (2018). https://doi.org/10.1007/s00521-017-2878-y

Received: 01 March 2016
Accepted: 10 February 2017
Published: 21 February 2017
Issue Date: November 2018
DOI: https://doi.org/10.1007/s00521-017-2878-y
