Abstract
Supervised learning with convolutional neural networks has contributed greatly to computer vision, largely because of the availability of massive labeled samples. However, in many applications the available labeled samples are far from adequate for training. In practice, annotation is a tedious, time-consuming, and costly task that also requires specialty-oriented knowledge and skilled experts. Therefore, to take full advantage of limited resources and markedly reduce annotation cost, we propose a noise robust batch mode semi-supervised and active learning framework named NRMSL-BMAL. When querying labels in an iteration, a convolutional autoencoder cluster based batch mode active learning strategy first queries the most worthwhile samples from annotation experts at a cost. A noise robust memorized self-learning procedure then extends the training samples without any annotation cost. Finally, the newly labeled samples are added to the training set to improve the performance of the target model. We perform a thorough experimental evaluation on image classification tasks, using datasets from different domains, including medical images, natural images, and a real-world application. The evaluation shows that NRMSL-BMAL can reduce annotation cost by 44% to 95% while maintaining or even improving the performance of the target model.
Keywords
- Active learning
- Semi-supervised learning
- Convolutional autoencoder cluster
- Continuous fine-tuning
- Image classification
1 Introduction
Image classification is a long-standing and challenging task in computer vision and pattern recognition. In recent years, convolutional neural networks (CNNs), one of the most successful deep learning models, have achieved groundbreaking results owing to massive labeled samples and abundant computational power. Unfortunately, in many real-world applications, the existing labeled samples are far from adequate for training [25]. Moreover, especially in medical image analysis, annotation is a tedious, time-consuming, and costly task that requires specialty-oriented knowledge and skilled annotation experts [29]. Therefore, the motivation of this paper is to take full advantage of limited resources and minimize the need for human annotation while maintaining or even slightly improving the performance of the model.
Semi-supervised learning (SSL) is a family of methods that attempt to exploit unlabeled samples to improve model performance. Self-training [28] (which we call self-learning, SL) is a classic SSL algorithm. SL starts with a model pre-trained on a small set of labeled samples from the target domain. It then adds relatively certain samples from the unlabeled pool, together with their predicted labels, to the training set, and the model is re-trained on the growing training set until a stopping criterion is reached. An obvious drawback of SL is that the noisy labels it generates can seriously degrade its own performance, and most learning algorithms, including deep learning, are sensitive to noisy labels. To mitigate the noisy label issue, [6] proposed a self-paced manifold regularization framework inspired by human learning principles, and [4] designed a two-stage approach to train AlexNet and ResNet from noisy labels in a semi-supervised manner without any prior knowledge of the noise distribution. Many other methods [12, 18, 22] have also been proposed to address noisy labels.
In this paper, we propose a noise robust memorized self-learning algorithm named NRMSL. First, we memorize the model's prediction information in each iteration and use it in the SSL procedure to reduce noisy labels. Second, we apply a self-error-correcting method, inspired by [18], to improve the anti-noise ability of the target model.
Alternatively, active learning (AL) is a related field that can greatly improve the performance of the target model with fewer labeled samples by using heuristic query strategies such as uncertainty [1, 5, 13]. There are three classical settings [25]: membership query synthesis, stream-based sampling, and pool-based sampling. We focus on pool-based sampling, which has been the most popular setting in recent years. However, real-world applications often offer a parallel labeling environment, in which a single-mode AL strategy that queries samples serially can be inefficient. Batch mode active learning (BMAL) [2, 25] addresses this by selecting more than one sample in each iteration. We handle the overlap in information content among the most informative samples by combining an uncertainty strategy with a convolutional autoencoder (CAE) cluster. Based on an extensive literature survey, we summarize related work in Table 1.
There is an essential difference between SL and AL. SL obtains labels directly from the model, whereas AL queries labels from annotation experts, also called oracles. In other words, AL guarantees expert-annotated samples, leading to notable performance improvements but at a cost, while SL obtains labels without human intervention at the price of noisy labels. A tandem certainty-based AL and self-learning method was proposed in [10], which considers both the pool-based and the stream-based scenario and also surveys previous work combining AL and SSL. [19] was the first work, in 1998, to explore combinations of AL and SL. However, the work mentioned above merely combined AL and SL and seldom considered noisy labels. Motivated by these observations, we propose a framework named NRMSL-BMAL that naturally combines NRMSL and BMAL.
The contributions of this paper can be summarized as follows. (1) We propose a memorized self-learning (MSL) algorithm that reduces noisy labels by memorizing the historical predictions of the model; to further reduce the impact of noisy labels, we incorporate a noisy label self-adjusting method into MSL, yielding NRMSL. (2) We propose a framework that takes full advantage of limited resources by interactively integrating NRMSL, CAE cluster based BMAL, and other techniques such as transfer learning and continuous fine-tuning. (3) We perform a thorough experimental evaluation on image classification tasks and show that NRMSL-BMAL can reduce annotation cost by 44% to 95% on datasets from different domains, as detailed in Sect. 3.1. (4) We verify that the samples queried by NRMSL-BMAL are representative.
2 Proposed Framework
2.1 NRMSL-BMAL Framework
The NRMSL-BMAL framework has four core components. (1) Transfer learning: NRMSL-BMAL is initialized with a pre-trained model whose source domain can be changed flexibly; ImageNet [24] is used in this paper. (2) A CAE cluster based BMAL method then queries the most uncertain and diverse samples from the unlabeled sample pool. (3) The NRMSL algorithm subsequently extends the training samples without any annotation cost. (4) The target model is continuously fine-tuned on the growing set of labeled samples. These four parts are synergistically integrated into a single framework.
Figure 1 gives an intuitive view of the NRMSL-BMAL framework. It works incrementally, improving the performance of the target model as labeled samples accumulate. In each iteration, we first compute, for every unlabeled sample, the prediction probabilities from the softmax layer. Second, an uncertainty indicator, such as entropy, BvSB [13], least confidence, or smallest margin [25], is calculated from these probabilities; in this paper we use entropy for binary classification and BvSB for multi-class classification. Third, the CAE cluster gathers similar samples into a pre-defined set of clusters, as shown in the 3rd pentagon of Fig. 1. Furthermore, we apply a self noisy label adjusting method in the NRMSL procedure, as detailed in [18], using a self-error-correcting softmax loss to adaptively switch between the given noisy label and the max-activated neuron, with a Bernoulli distribution deciding whether to select the max-activated label (see details in Sect. 2.2). Finally, AlexNet [15] is fine-tuned on the incremental labeled samples obtained from the experts and from the current model: experts annotate the most uncertain and diverse samples, while the current model annotates the relatively certain and diverse samples.
2.2 Noise Robust Memorized Self Learning
Self-learning is used to extend the training samples without any annotation cost in the NRMSL-BMAL framework. Its core challenge is to mitigate noisy samples. NRMSL addresses this problem in two stages.
The first stage reduces the generation of noisy samples. We design a structure for storing historical prediction information and use two constraints to filter samples: (1) the prediction results of the latest m iterations must be consistent, and (2) the scores of the latest m iterations must lie within a preset range (score\(_{min}<\) score \(<\) score\(_{max}\)). A minimal sketch of this filter is given below.
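The following Python sketch illustrates this first-stage filter. The constant names (M, SCORE_MIN, SCORE_MAX) and the per-sample history structure are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the NRMSL first-stage filter (names and thresholds are
# illustrative, not taken from the paper).
from collections import defaultdict, deque

M = 3                                # number of consecutive iterations that must agree
SCORE_MIN, SCORE_MAX = 0.90, 0.99    # assumed confidence window

history = defaultdict(lambda: deque(maxlen=M))   # sample_id -> last M (label, score)

def remember(sample_id, pred_label, pred_score):
    """Store the current model's prediction for this unlabeled sample."""
    history[sample_id].append((pred_label, pred_score))

def passes_filter(sample_id):
    """Return the pseudo-label if the last M predictions are consistent and
    every score lies inside the preset range; otherwise return None."""
    preds = history[sample_id]
    if len(preds) < M:
        return None
    labels = {label for label, _ in preds}
    if len(labels) != 1:                                   # constraint (1): consistency
        return None
    if not all(SCORE_MIN < s < SCORE_MAX for _, s in preds):
        return None                                        # constraint (2): score window
    return labels.pop()
```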
The second stage improves the anti-noise ability of the target model, which is the core of NRMSL. Since the performance of the target model improves as the labeled samples increase, as shown in Fig. 2, we allow the labels of samples coming from the self-learning procedure to be modified with a certain probability. Inspired by [18], we use Formula 1 as a polynomial confidence policy that cooperates with the target model in each iteration:
where \(C_0\) denotes the initial confidence, t denotes the current iteration of training, and T denotes the total number of iterations. Subsequently, for each labeled sample from self-learning, a random value r between 0 and 1 is generated to decide whether to adjust the label of this sample: if \(r > C_t\), the label is changed to \(\tilde{y}\), the prediction of the current model; otherwise nothing is done. Note that as the iteration t increases, more and more labeled samples are added to the training set and the value of \(C_t\) decreases. Therefore, as shown in Fig. 2, the further the iteration proceeds, the better the target model performs and the higher the probability of adjusting the label becomes. A hedged sketch of this adjustment step follows.
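The sketch below illustrates the second-stage adjustment. The polynomial decay used here for \(C_t\) is an assumed stand-in for Formula 1, whose exact form follows [18] and is not reproduced in this text.

```python
import random

def confidence(t, T, c0=0.9, p=2):
    """Hedged stand-in for the polynomial confidence policy of Formula 1:
    a simple polynomial decay from c0 toward 0 over T iterations. The exact
    exponent and offset of the paper's policy are not reproduced here."""
    return c0 * (1.0 - t / T) ** p

def maybe_adjust_label(y_pseudo, y_model, t, T):
    """Keep the pseudo-label with probability C_t; otherwise replace it with
    the current model's max-activated prediction (Bernoulli decision)."""
    r = random.random()                  # r in [0, 1)
    return y_model if r > confidence(t, T) else y_pseudo
```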
2.3 CAE Cluster Based BMAL
As shown in the 2nd pentagon of Fig. 1, for probabilistic classification models we use entropy to measure the uncertainty of samples. The entropy score of a sample is defined as

$$ \mathrm{entropy} = -\sum_{i=1}^{C} p_i \log_2 p_i $$
where C is the number of categories and \(p_i\) is the predicted probability of the i-th category for the corresponding sample. The uncertainty measure captures the informativeness of samples without true labels. For example, in a binary classification task, suppose the predicted probabilities of sample\(_1\) and sample\(_2\) are {\(p_{11}\) = 0.5, \(p_{12}\) = 0.5} and {\(p_{21}\) = 0.1, \(p_{22}\) = 0.9}, respectively. The entropy scores of sample\(_1\) and sample\(_2\) are then 1 and 0.47, so sample\(_1\), with the higher score, is the prioritized target sample. Hedged implementations of the entropy and BvSB scores are sketched below.
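The following sketch shows the two uncertainty scores used in this paper; the function names are illustrative. A base-2 logarithm reproduces the example values above ([0.5, 0.5] gives 1.0 and [0.1, 0.9] gives 0.47).

```python
import numpy as np

def entropy_score(probs):
    """Entropy over the softmax outputs; higher means more uncertain."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log2(p)).sum())

def bvsb_score(probs):
    """Best-versus-second-best margin, used for multi-class tasks;
    a smaller margin means a more uncertain sample."""
    top2 = np.sort(np.asarray(probs, dtype=float))[-2:]
    return float(top2[1] - top2[0])
```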
However, as shown in Fig. 3(b), querying more than one sample per iteration by uncertainty alone produces much redundant information: the information these samples provide to the model is similar. Therefore, as shown in Fig. 3(c), a clustering algorithm is used to query representative samples.
As mentioned, the core challenge of BMAL is to reduce duplicate information among the queried samples. The CAE cluster, shown in the 1st pentagon of Fig. 1, both queries diverse samples and handles more complex images. We design a CAE network whose encoder and decoder are composed of convolutional, deconvolutional, fully connected, and leaky ReLU layers. The total loss is the sum of a reconstruction loss and a clustering loss, as detailed in [7, 8, 26]. The CAE cluster is pre-trained in an unsupervised way on the unlabeled sample pool and fine-tuned on a subset selected by the uncertainty strategy. We apply the CAE cluster not only in the BMAL procedure but also in the NRMSL procedure, as shown by the red rectangles and green triangles in the 3rd pentagon of Fig. 1. Finally, we select diverse samples from each cluster. In many real-world applications the number of queried samples is larger than the number of clusters, so more than one sample must be taken from a cluster; we therefore use the cosine distance to measure the similarity between samples, as shown by the solid-line circles and triangles labeled "1" and "0" in Fig. 1. Moreover, thanks to the parameter sharing of convolution, we can handle more complex images, such as the single-multi packing dataset discussed in Sect. 3.1. A hedged sketch of the diverse batch selection is given below.
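The sketch below illustrates one way to combine uncertainty, cluster assignments, and cosine distance into a diverse batch; the round-robin scheme, the min_dist threshold, and the function names are assumptions for illustration rather than the exact procedure of Algorithm 1.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two CAE embeddings."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def select_diverse_batch(embeddings, cluster_ids, uncertainty, batch_size, min_dist=0.2):
    """Round-robin over CAE clusters: in each pass, take from every cluster its
    most uncertain remaining sample, preferring one whose cosine distance to the
    samples already chosen from that cluster exceeds min_dist."""
    remaining = {c: list(np.where(cluster_ids == c)[0][
                     np.argsort(-uncertainty[cluster_ids == c])])
                 for c in np.unique(cluster_ids)}
    selected = []
    while len(selected) < batch_size and any(remaining.values()):
        for c, members in remaining.items():
            if not members or len(selected) >= batch_size:
                continue
            chosen_in_c = [j for j in selected if cluster_ids[j] == c]
            # prefer a candidate far from what this cluster already contributed
            pick = next((i for i in members if all(
                cosine_distance(embeddings[i], embeddings[j]) > min_dist
                for j in chosen_in_c)), members[0])
            members.remove(pick)
            selected.append(pick)
    return selected
```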
The overall NRMSL-BMAL algorithm is given in Algorithm 1.
3 Simulation Experimental Results
In this section, we perform a thorough experimental evaluation on image classification tasks, using five different datasets to evaluate the effectiveness of the proposed method and framework. For each dataset, we train with all labeled samples under the same parameters and take the best validation accuracy as the target accuracy for NRMSL-BMAL. Our framework is implemented in PyTorch and run on a 1080Ti GPU under Ubuntu. We repeat each experiment 5 times and report the mean as the final result; for each dataset, a random 15%–20% of the annotated samples is held out as a validation set and 10% as a test set.
3.1 Dataset
NRMSL-BMAL is applicable to both binary and multi-class classification tasks. We evaluate on two multi-class datasets: (1) MNIST [16], handwritten digit images of \(28\times 28\) pixels, and (2) CIFAR10 [14], real object images of \(32\times 32\times 3\) pixels; and on three binary classification datasets: (1) the Open Access Series of Imaging Studies (OASIS), a project aimed at making neuroimaging data sets of the brain freely available to the scientific community, of which we use the subset released in [11]; (2) Dog-Cat, a Kaggle competition dataset; and (3) an image attribute dataset from Tmall, a binary classification task with single-packing and multi-packing attributes, on which we completed the experiments during an internship at Tmall. Because of the real-world demand of single-multi packing classification, two annotation experts participated in our experimental procedure.
3.2 Full Training with AlexNet
We train AlexNet with all labeled samples for each dataset. We emphasize that our primary goal is to reduce the cost of annotation while maintaining the performance of the target model. Therefore, as shown in Table 2, we set the best validation accuracy of full training as the target accuracy. The relevant parameters are: optimizer = Adam, pre-trained dataset = ImageNet, classifier learning rate = 1e−3, feature learning rate = 1e−4, batch size = 32. A hedged sketch of this setup follows.
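The sketch below illustrates this setup with torchvision's AlexNet and PyTorch's Adam optimizer; the two parameter groups carry the feature and classifier learning rates given above, while num_classes and the training loop are placeholders not taken from the paper.

```python
import torch
from torchvision import models

num_classes = 2  # e.g. Dog-Cat; set to 10 for MNIST or CIFAR10

# ImageNet-pretrained AlexNet with the last classifier layer replaced.
model = models.alexnet(pretrained=True)
model.classifier[6] = torch.nn.Linear(4096, num_classes)

# Adam with a smaller learning rate for the convolutional features
# than for the classifier, matching the parameters listed above.
optimizer = torch.optim.Adam([
    {"params": model.features.parameters(),   "lr": 1e-4},   # feature lr
    {"params": model.classifier.parameters(), "lr": 1e-3},   # classifier lr
])
```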
3.3 Annotation Cost of NRMSL-BMAL
NRMSL: We compare our NRMSL method with the pure SL method. As shown in Fig. 4, because of noisy labels and uninformative labeled samples, SL barely improves the performance of the target model after several iterations. In contrast, owing to the memorized information and the noisy label self-adjusting method, NRMSL clearly improves the accuracy of the target model, especially on the MNIST and Dog-Cat datasets.
BMAL: Here we verify the effectiveness of applying the CAE cluster method to BMAL. As shown in Fig. 4, by accounting for duplicate information among samples, CAE cluster based BMAL outperforms BMAL based on a single uncertainty strategy, and both BMAL methods significantly reduce annotation cost compared with the random strategy.
NRMSL-BMAL: As shown in Fig. 4, NRMSL-BMAL is significantly superior to the random strategy. Thanks to the extended training samples and the noisy label adjustment from NRMSL, NRMSL-BMAL also outperforms CAE cluster based BMAL. On the single-multi packing dataset from Tmall, NRMSL-BMAL not only significantly reduces the cost of annotation but also improves the performance of the target model (the final validation accuracy is higher than the target validation accuracy).
3.4 The Representativeness of Samples Queried by NRMSL-BMAL
To verify the representativeness of the samples labeled by experts (\(L_1\) in Algorithm 1), we evaluate the accuracy of the final model on the remaining unselected samples (\(U/L_1\)), as reported in Table 3. The accuracy on the unselected samples is close to 100%, which means the selected samples represent the unselected ones well. In other words, the unselected samples would hardly improve the target model, so their annotation cost can be saved. The unlabeled sample pool of OASIS is smaller than the others, which is the main reason for its much lower saving rate; NRMSL-BMAL should be more effective on a larger-scale unlabeled sample pool.
4 Conclusion
We have designed and implemented the NRMSL-BMAL framework to address a critical problem: how to take full advantage of limited resources to minimize the need for human annotation while maintaining the performance of the target model for image classification. The framework naturally combines NRMSL with a CAE cluster based BMAL method. Evaluation on five image classification datasets demonstrates that NRMSL-BMAL can reduce annotation cost by 44% to 95%, depending on the dataset, as described in Sect. 3.1. Notably, NRMSL-BMAL can partly prevent the self-learning procedure from skewing the target model, thanks to the high quality labeled samples from experts and the noisy label self-adjusting method. We are eager to apply it to more real-world applications, such as single-multi packing classification at Tmall. However, the time complexity of NRMSL-BMAL still needs improvement, because of the CAE cluster procedure and the large number of training epochs. The trade-off between time complexity and annotation cost therefore remains a critical problem for future work.
References
Campbell, C., Cristianini, N., Smola, A., et al.: Query learning with large margin classifiers. In: Proceedings of International Conference on Machine Learning, pp. 111–118 (2000)
Cardoso, T.N.C., Silva, R.M., Canuto, S., Moro, M.M., Gonçalves, M.A.: Ranked batch-mode active learning. Inf. Sci. 379, 313–337 (2017)
Chiu, S.C., Jin, Z., Gu, Y.: Active learning combining uncertainty and diversity for multi-class image classification. IET Comput. Vis. 9(3), 400–407 (2015)
Ding, Y., Wang, L., Fan, D., Gong, B.: A semi-supervised two-stage approach to learning from noisy labels. arXiv preprint arXiv:1802.02679 (2018)
Freund, Y., Seung, H.S., Shamir, E., Tishby, N.: Selective sampling using the query by committee algorithm. Mach. Learn. 28(2–3), 133–168 (1997)
Gu, N., Fan, M., Meng, D.: Robust semi-supervised classification for noisy labels based on self-paced learning. IEEE Sig. Process. Lett. 23(12), 1806–1810 (2016)
Guo, X., Gao, L., Liu, X., Yin, J.: Improved deep embedded clustering with local structure preservation. In: International Joint Conference on Artificial Intelligence, pp. 1753–1759 (2017)
Guo, X., Liu, X., Zhu, E., Yin, J.: Deep clustering with convolutional autoencoders. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds.) Neural Information Processing. ICONIP 2017, vol. 10635, pp. 373–382. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70096-0_39
Guo, Y., Schuurmans, D.: Discriminative batch mode active learning. In: Proceedings of the International Conference on Neural Information Processing Systems, NIPS 2007, pp. 593–600. Curran Associates Inc., USA (2007)
Han, W., et al.: Semi-supervised active learning for sound classification in hybrid learning environments. PLoS ONE 11(9), e0162075 (2016)
Hon, M., Khan, N.M.: Towards Alzheimer's disease classification through transfer learning. In: International Conference on Bioinformatics and Biomedicine, pp. 1166–1169 (2017)
Jindal, I., Nokleby, M., Chen, X.: Learning deep networks from noisy labels with dropout regularization. In: IEEE International Conference on Data Mining, pp. 967–972. IEEE (2016)
Joshi, A.J., Porikli, F., Papanikolopoulos, N.: Multi-class active learning for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2372–2379 (2009)
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical Report (2009)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the International Conference on Neural Information Processing Systems, pp. 1097–1105 (2012)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Li, X., Guo, Y.: Adaptive active learning for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 859–866 (2013)
Liu, X., Li, S., Kan, M., Shan, S., Chen, X.: Self-error-correcting convolutional neural network for learning with noisy labels. In: IEEE International Conference on Automatic Face & Gesture Recognition, pp. 111–117 (2017)
McCallum, A.K., Nigam, K.: Employing EM and pool-based active learning for text classification. In: Proceedings of International Conference on Machine Learning, pp. 359–367 (1998)
Patra, S., Bruzzone, L.: A batch-mode active learning technique based on multiple uncertainty for SVM classifier. IEEE Geosc. Remote Sens. Lett. 9(3), 497–501 (2012)
Patra, S., Bruzzone, L.: A cluster-assumption based batch mode active learning technique. Pattern Recogn. Lett. 33(9), 1042–1048 (2012)
Patrini, G., Rozza, A., Menon, A.K., Nock, R., Qu, L.: Making deep neural networks robust to label noise: a loss correction approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2233–2241 (2017)
Rong, C., Cao, Y.F., Hong, S.: Multi-class image classification with active learning and semi-supervised learning. Acta Automatica Sinica 37(8), 954–962 (2011)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Settles, B.: Active learning. Synth. Lect. Artif. Intell. Mach. Learn. 6(1), 1–114 (2012)
Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: Proceedings of International Conference on Machine Learning, pp. 478–487 (2016)
Xiong, S., Azimi, J., Fern, X.Z.: Active learning of constraints for semi-supervised clustering. IEEE Trans. Knowl. Data Eng. 26(1), 43–54 (2014)
Yarowsky, D.: Unsupervised word sense disambiguation rivaling supervised methods. In: Proceedings of Annual Meeting of the Association for Computational Linguistics, pp. 189–196 (1995)
Zhou, Z., Shin, J., Zhang, L., Gurudu, S., Gotway, M., Liang, J.: Fine-tuning convolutional neural networks for biomedical image analysis: Actively and incrementally. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4761–4772 (2017)