
1 Introduction

Image classification is a challenging task in computer vision and pattern recognition with a long history. Notably, convolutional neural networks (CNNs), among the most successful deep learning models, have achieved groundbreaking results in recent years owing to massive labeled samples and computational power. Unfortunately, in many real-world applications, the available labeled samples are far from adequate for training [25]. Moreover, especially in medical image analysis, annotation is a tedious, time-consuming, and costly task that requires specialty-oriented knowledge and skilled annotation experts [29]. Therefore, the motivation of this paper is to take full advantage of limited resources to minimize the need for human annotation while maintaining or even slightly improving the performance of the model.

Semi-supervised learning (SSL) is a family of methods that attempts to exploit unlabeled samples to improve model performance. Self-training [28] (which we call self-learning, SL) is a classic SSL algorithm. SL starts with a model pre-trained on a small set of labeled samples from the target domain. It then adds relatively certain samples from the unlabeled sample pool, together with their predicted labels, to the training set, and the model is re-trained on the growing training set until a stopping criterion is reached. An obvious drawback of SL is that noisy labels can seriously degrade performance, and most learning algorithms, including deep learning, are sensitive to noisy labels. To mitigate the noisy-label issue, [6] proposed a self-paced manifold regularization framework inspired by human learning principles, and [4] designed a two-stage approach to train AlexNet and ResNet from noisy labels in a semi-supervised manner without any prior knowledge of the noisy-label distribution. Many other methods [12, 18, 22] have also been proposed to address noisy labels.

In this paper, we propose a noise-robust memorized self-learning algorithm named NRMSL. First, we memorize the prediction information of the model in each iteration and use it in the SSL procedure to reduce noisy labels. Second, we apply a self-error-correcting method, inspired by [18], to improve the anti-noise ability of the target model.

Alternatively, active learning (AL) is a related field that can greatly improve the performance of the target model with fewer labeled samples thanks to heuristic query strategies such as uncertainty [1, 5, 13]. There are three classical settings [25]: membership query synthesis, stream-based sampling, and pool-based sampling. We focus on pool-based sampling, the most popular setting in recent years. However, in real-world applications a parallel labeling environment may be available, which means that serial single-sample AL queries can be inefficient. Batch mode active learning (BMAL) [2, 25] addresses this problem by selecting more than one sample in each iteration. We handle the overlap in information content among the most informative samples by combining an uncertainty strategy with convolutional autoencoder (CAE) clustering. Furthermore, we summarize related work in Table 1.

Table 1. Related research on BMAL and SSL-AL in recent years.

There is an essential difference between SL and AL. SL obtains labels directly from the model, while AL queries labels from annotation experts, also called the oracle. In other words, AL guarantees expert-annotated samples, which leads to notable performance improvements but is costly; SL, on the contrary, obtains labels without human annotators but at the risk of noisy labels. A tandem certainty-based AL and self-learning method was proposed in [10], considering both the pool-based and the stream-based scenario, and [10] also summarized an overview of previous work combining AL and SSL. [19] was the first work to explore combinations of AL and SL, in 1998. However, the work mentioned above combined AL and SL skillfully but seldom considered noisy labels. Based on these observations, we propose a framework named NRMSL-BMAL that naturally combines NRMSL and BMAL.

The contributions of this paper can be summarized as follows. (1) We propose a memorized self-learning (MSL) algorithm that reduces noisy labels by memorizing the prediction history of the model. Furthermore, to reduce the impact of noisy labels, we apply a noisy-label self-adjusting method to MSL, yielding NRMSL. (2) We propose a framework that takes full advantage of limited resources by interactively integrating NRMSL, CAE-cluster-based BMAL, and other techniques such as transfer learning and continuous fine-tuning. (3) We perform a thorough experimental evaluation on image classification tasks and show that NRMSL-BMAL can reduce annotation cost by 44% to 95%, using datasets from different domains as detailed in Sect. 3.1. (4) We verify that the samples queried by NRMSL-BMAL are representative.

2 Proposed Framework

2.1 NRMSL-BMAL Framework

Fig. 1. Diagram of the NRMSL-BMAL framework, which combines several deep learning components cooperatively. To simplify the discussion, the diagram is drawn for binary classification, although the framework also applies to multi-class classification. The pentagon shapes show the core procedures. The dotted circles and ellipses in the 2nd and 3rd pentagons mark different clusters, the solid circles and triangles represent different categories, and the heavy line is the classifier. For example, the solid circles and triangles labeled "1" and "0" represent the samples selected for BMAL and NRMSL, respectively.

The NRMSL-BMAL framework comprises four core parts. (1) Transfer learning: NRMSL-BMAL is initialized with a pre-trained model whose source domain can be changed flexibly; ImageNet [24] is used in this paper. (2) A CAE-cluster-based BMAL method then queries the most uncertain and diverse samples from the unlabeled sample pool. (3) The NRMSL algorithm subsequently extends the training set without any annotation cost. (4) The target model is continuously fine-tuned with the incrementally labeled samples. These four parts are synergistically integrated into a single framework.

Figure 1 gives an intuitive view of the NRMSL-BMAL framework. It works incrementally, improving the performance of the target model as labeled samples accumulate. In each iteration, we first obtain the prediction probabilities from the softmax layer for each unlabeled sample. Secondly, an uncertainty indicator, such as entropy, BvSB [13], least confidence, or smallest margin [25], is computed from these probabilities; in this paper we use entropy for binary classification and BvSB for multi-class classification. Thirdly, the CAE cluster groups similar samples into a pre-defined set of clusters, as shown in the 3rd pentagon of Fig. 1. Furthermore, we apply a noisy-label self-adjusting method in the NRMSL procedure, as detailed in [18], using a self-error-correcting softmax loss that adaptively switches between the noisy label and the max-activated neuron, with a Bernoulli distribution deciding whether to select the max-activated label (see details in Sect. 2.2). Finally, AlexNet [15] is fine-tuned with the incrementally labeled samples obtained from the experts and the current model: the experts annotate the most uncertain and diverse samples, while the current model annotates the relatively certain and diverse samples.

2.2 Noise Robust Memorized Self Learning

In the NRMSL-BMAL framework, self-learning is used to extend the training set without any annotation cost. The core challenge of self-learning is to mitigate noisy samples, and NRMSL addresses this problem in two stages.

The first stage reduces the generation of noisy samples. We first design a structure for storing historical prediction information, and then use two constraints to filter samples: (1) the prediction results of the latest m iterations must be consistent, and (2) the scores of the latest m iterations must lie within a preset range (score\(_{min}<\) score \(<\) score\(_{max}\)).
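The memorization structure and the two filtering constraints could be realized, for instance, as in the following minimal Python sketch (the class name, the default m, and the score thresholds are illustrative assumptions rather than the exact implementation used in our experiments):

```python
from collections import defaultdict, deque

class PredictionMemory:
    """Stores the latest m predictions (label, score) for each unlabeled sample."""

    def __init__(self, m=3, score_min=0.8, score_max=0.99):
        self.m = m
        self.score_min = score_min
        self.score_max = score_max
        # sample_id -> deque of (predicted_label, confidence_score)
        self.history = defaultdict(lambda: deque(maxlen=m))

    def update(self, sample_id, label, score):
        """Record the current model's prediction for one unlabeled sample."""
        self.history[sample_id].append((label, score))

    def is_reliable(self, sample_id):
        """Constraint (1): the latest m predicted labels are consistent.
        Constraint (2): every score of the latest m predictions lies in the preset range."""
        records = self.history[sample_id]
        if len(records) < self.m:
            return False
        labels = {label for label, _ in records}
        scores_ok = all(self.score_min < s < self.score_max for _, s in records)
        return len(labels) == 1 and scores_ok
```

Samples for which is_reliable returns True are the candidates that the self-learning procedure would add to the training set together with their predicted labels.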

Fig. 2. Diagram of the noisy label adjustment. The processed samples come from the self-learning procedure.

The second stage improves the anti-noise ability of the target model, which is the core of NRMSL. Considering that the performance of the target model improves as the number of labeled samples increases, as shown in Fig. 2, we allow the labels of those samples that come from the self-learning procedure to be modified with a certain probability. Inspired by [18], we use formula (1) as a polynomial confidence policy to cooperate with the target model in each iteration:

$$\begin{aligned} C_t = C_0 * \left( 1-\frac{t}{T} \right) ^\lambda , \end{aligned}$$
(1)

where \(C_0\) denotes the initial confidence, t denotes the current training iteration, and T denotes the total number of iterations. Subsequently, for each labeled sample from self-learning, a random value r between 0 and 1 is generated to decide whether to adjust the label of this sample. If \(r > C_t\), the label is changed to \(\tilde{y}\), where \(\tilde{y}\) is the prediction of the current model; otherwise nothing is done. Note that as the iteration t increases, more and more labeled samples are added to the training set and the value of \(C_t\) decreases. Therefore, as shown in Fig. 2, the later the iteration, the better the performance of the target model and the higher the probability of adjusting the label.
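A minimal sketch of this label-adjustment step, assuming the polynomial confidence policy of Eq. (1) with illustrative default values for \(C_0\) and \(\lambda\):

```python
import random

def confidence(t, T, c0=1.0, lam=2.0):
    """Polynomial confidence policy of Eq. (1): C_t = C_0 * (1 - t / T) ** lambda."""
    return c0 * (1.0 - t / T) ** lam

def maybe_adjust_label(self_learned_label, model_predicted_label, t, T):
    """With probability 1 - C_t, replace a self-learned label by the
    current model's prediction (the max-activated label)."""
    c_t = confidence(t, T)
    r = random.random()           # r drawn uniformly from [0, 1)
    if r > c_t:                   # late iterations: C_t is small, adjustment is likely
        return model_predicted_label
    return self_learned_label     # early iterations: keep the original self-learned label
```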

2.3 CAE Cluster Based BMAL

As shown in the 2nd pentagon of Fig. 1, for probabilistic classification models we use entropy to measure the uncertainty of samples; the entropy score of a sample is defined as:

$$\begin{aligned} EntropyScore = -\sum _{i=1}^{C}p_i \times \log (p_i), \end{aligned}$$
(2)

where C is the number of categories and \(p_i\) is the predicted probability of the corresponding sample for category i. The uncertainty measure captures the informativeness of samples without true labels. For example, in a binary classification task, suppose the predicted probabilities of sample\(_1\) and sample\(_2\) are {\(p_{11}\) = 0.5, \(p_{12}\) = 0.5} and {\(p_{21}\) = 0.1, \(p_{22}\) = 0.9}, respectively. The entropy scores of sample\(_1\) and sample\(_2\) are then 1 and 0.47, respectively, so sample\(_1\), with the higher score, is the prioritized target sample.
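For reference, the two uncertainty indicators used in this paper can be computed from softmax outputs as in the sketch below (the sign convention for BvSB and the base-2 logarithm, which matches the worked example above, are implementation choices):

```python
import numpy as np

def entropy_score(probs):
    """Eq. (2): -sum_i p_i * log(p_i); higher means more uncertain.
    The worked example above implies a base-2 logarithm."""
    probs = np.asarray(probs, dtype=np.float64)
    return float(-np.sum(probs * np.log2(probs + 1e-12)))

def bvsb_score(probs):
    """Best-versus-second-best margin [13]: a smaller margin means more uncertainty,
    so the negative margin is returned to keep 'higher score = more uncertain'."""
    sorted_probs = np.sort(np.asarray(probs, dtype=np.float64))[::-1]
    return float(-(sorted_probs[0] - sorted_probs[1]))

# entropy_score([0.5, 0.5]) ~= 1.0 and entropy_score([0.1, 0.9]) ~= 0.47,
# reproducing the binary example in the text.
```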

However, as shown in Fig. 3(b), querying more than one sample per iteration can produce much redundant information; in other words, the information those samples provide to the model is similar. Therefore, as shown in Fig. 3(c), a clustering algorithm is used to query representative samples.

Fig. 3. Diagram of batch mode active learning, combining the uncertainty strategy with clustering. The closer a sample is to the classifier, the higher its uncertainty; the distance between samples reflects their similarity.

As mentioned, the core challenge of BMAL is to reduce duplicate information among the queried samples. The CAE cluster, shown in the 1st pentagon of Fig. 1, plays a role both in querying diverse samples and in handling more complex images. We design a CAE network whose encoder and decoder are composed of convolutional, deconvolutional, fully connected, and leaky ReLU layers. The overall loss is the sum of a reconstruction loss and a clustering loss, as detailed in [7, 8, 26]. The CAE cluster is pre-trained in an unsupervised way on the unlabeled sample pool and fine-tuned on a subset selected by the uncertainty strategy. We apply the CAE cluster not only in the BMAL procedure but also in the NRMSL procedure, as shown in the red rectangles and green triangles of the 3rd pentagon of Fig. 1. Finally, we select diverse samples from each cluster. In many real-world applications, the number of queried samples is larger than the number of clusters; in other words, we need more than one sample from each cluster, so we use the cosine distance to measure the similarity between samples, as shown in the solid circles and triangles labeled "1" and "0" in Fig. 1. Moreover, thanks to the parameter sharing of convolution, we can handle more complex images, such as the single-multi packing dataset discussed in Sect. 3.1.
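The batch-selection step can be sketched as follows: given uncertainty scores, CAE latent codes, and cluster assignments, the most uncertain samples are taken from each cluster while the cosine distance in the latent space keeps near-duplicates out of the batch. The round-robin order and the minimum-distance threshold are illustrative assumptions:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two CAE latent vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def select_batch(latents, scores, clusters, batch_size, min_dist=0.2):
    """Pick up to batch_size diverse, uncertain samples.

    latents  : (N, d) CAE latent codes
    scores   : (N,)   uncertainty scores (higher = more uncertain)
    clusters : (N,)   cluster id of each sample from the CAE cluster
    """
    selected = []
    # Most-uncertain-first ordering within each cluster.
    order = {c: list(np.argsort(-scores[clusters == c])) for c in np.unique(clusters)}
    idx_by_cluster = {c: np.where(clusters == c)[0] for c in np.unique(clusters)}
    # Round-robin over clusters, skipping candidates too close to already-selected ones.
    while len(selected) < batch_size and any(order.values()):
        for c in list(order.keys()):
            if len(selected) >= batch_size or not order[c]:
                continue
            cand = idx_by_cluster[c][order[c].pop(0)]
            if all(cosine_distance(latents[cand], latents[s]) > min_dist for s in selected):
                selected.append(cand)
    return selected
```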

The overall NRMSL-BMAL algorithm is given in Algorithm 1.

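Since the listing of Algorithm 1 is not reproduced here, the following pseudocode-style Python sketch summarizes one run of the loop described above; all interfaces (model.predict, oracle.annotate, model.fine_tune, and so on) are illustrative placeholders, not the exact signatures used in our implementation:

```python
def nrmsl_bmal(model, unlabeled_pool, labeled_set, oracle,
               uncertainty, cae_cluster, select_batch, memory, T, batch_size):
    """High-level sketch of NRMSL-BMAL (illustrative interfaces only)."""
    for t in range(T):
        probs, latents = model.predict(unlabeled_pool)   # softmax outputs and CAE codes
        scores = uncertainty(probs)                      # entropy or BvSB, Sect. 2.3
        clusters = cae_cluster(latents)                  # CAE cluster assignments

        # BMAL step: experts (the oracle) annotate the most uncertain,
        # diverse samples (L1 in Algorithm 1).
        queried = select_batch(latents, scores, clusters, batch_size)
        labeled_set += oracle.annotate(queried)

        # NRMSL step: the model itself labels relatively certain, diverse
        # samples, filtered through the prediction memory (Sect. 2.2).
        for sid in range(len(unlabeled_pool)):
            memory.update(sid, int(probs[sid].argmax()), float(probs[sid].max()))
        labeled_set += [(sid, memory.history[sid][-1][0])
                        for sid in range(len(unlabeled_pool))
                        if memory.is_reliable(sid)]

        # Continuous fine-tuning; self-learned labels may be flipped to the
        # current prediction with probability 1 - C_t during training.
        model.fine_tune(labeled_set, t, T)
    return model
```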

3 Simulation Experimental Results

In this section, we perform a thorough experimental evaluation on image classification tasks, using five different datasets to evaluate the effectiveness of the proposed method and framework. We keep the same parameters when training with all labeled samples and take the best validation accuracy as the target accuracy for NRMSL-BMAL. Our framework is developed in PyTorch and runs on a 1080Ti GPU under Ubuntu. For each dataset, we repeat the experiment 5 times and report the mean value as the final evaluation result, where random 15%~20% and 10% portions of the annotated samples are held out as the validation set and the test set, respectively.

3.1 Dataset

NRMSL-BMAL is applicable to both binary and multi-class classification tasks. We evaluate on two multi-class datasets: (1) MNIST [16], handwritten digit images of \(28\times 28\) pixels, and (2) CIFAR10 [14], real object images of \(32\times 32\times 3\) pixels; and on three binary classification datasets: (1) the Open Access Series of Imaging Studies (OASIS), a project aimed at making neuroimaging datasets of the brain freely available to the scientific community, of which we use the subset available in [11]; (2) Dog-Cat, a Kaggle competition dataset; and (3) an image attribute dataset from Tmall, a binary classification task with single-packing and multi-packing attributes. We completed this experiment during an internship at Tmall and, because of the real-world demand (single-multi packing classification), two annotation experts participated in our experimental procedure.

3.2 Full Training with AlexNet

We train AlexNet with all labeled samples for each dataset. It should be made clear that our primary goal is to reduce the cost of annotation while maintaining the performance of the target model. Therefore, as shown in Table 2, we set the best validation accuracy of full training as the target accuracy. The relevant parameters are: optimizer = Adam, pre-trained dataset = ImageNet, classifier learning rate = 1e−3, feature learning rate = 1e−4, batch size = 32.
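These settings correspond roughly to the following PyTorch setup; the exact training code is not given in the paper, so this is a sketch consistent with the reported hyper-parameters:

```python
import torch
from torchvision import models

num_classes = 2  # e.g. the binary single-multi packing task; 10 for MNIST / CIFAR10

# AlexNet pre-trained on ImageNet, with the final layer replaced for the target task.
model = models.alexnet(pretrained=True)
model.classifier[6] = torch.nn.Linear(4096, num_classes)

# Separate learning rates for the feature extractor (1e-4) and the classifier (1e-3),
# optimized with Adam, as reported above.
optimizer = torch.optim.Adam([
    {"params": model.features.parameters(),   "lr": 1e-4},
    {"params": model.classifier.parameters(), "lr": 1e-3},
])

# train_set is assumed to be the current labeled training set (a torch Dataset):
# loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
```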

Table 2. Results of full training with AlexNet. Abbreviations: Tra: training, Val: validation, Num: number, Acc: accuracy.

3.3 Annotation Cost of NRMSL-BMAL

Fig. 4. Experiments with the random strategy, pure self-learning, NRMSL, single-strategy BMAL, CAE-cluster-based BMAL, and NRMSL-BMAL. The abscissa and the ordinate indicate the number of iterations and the validation accuracy, respectively.

NRMSL: We compare our NRMSL method with the pure SL method. As shown in Fig. 4, because of noisy labels and uninformative labeled samples, SL barely improves the performance of the target model after several iterations. In contrast, owing to the memorized information and the noisy-label self-adjusting method, NRMSL improves the accuracy of the target model, especially on the MNIST and Dog-Cat datasets.

BMAL: Our purpose in this part is to verify the effectiveness of applying the CAE cluster method to BMAL. As shown in Fig. 4, by accounting for the duplicate information among samples, CAE-cluster-based BMAL outperforms single-strategy BMAL, and both BMAL methods significantly reduce annotation cost compared with the random strategy.

NRMSL-BMAL: As shown in Fig. 4, NRMSL-BMAL is significantly superior to the random strategy. Thanks to the extended training samples and the noisy-label adjustment from NRMSL, NRMSL-BMAL outperforms CAE-cluster-based BMAL. On the single-multi packing set from Tmall, NRMSL-BMAL not only significantly reduces the cost of annotation but also improves the performance of the target model (the final validation accuracy is higher than the target validation accuracy).

3.4 The Representativeness of Samples Queried by NRMSL-BMAL

To verify the representativeness of the samples labeled by experts (\(L_1\) in Algorithm 1), we evaluate the accuracy of the final model on the remaining unselected samples (\(U/L_1\)), as reported in Table 3. As the table shows, the accuracy on the remaining unselected samples is almost 100%, which means the selected samples can represent the unselected ones. In other words, the unselected samples would hardly improve the target model further, so their annotation cost can be saved. Besides, the unlabeled sample pool of OASIS is smaller than the others, which is the main reason for its much lower saved rate; NRMSL-BMAL could be more effective with a larger unlabeled sample pool.

Table 3. Experiments on the representativeness of the selected samples. "Expert" is the number of samples labeled by experts, "Saved rate" equals one minus "Expert" divided by the total number of full-training samples, and "Remaining accuracy" is the prediction accuracy on the samples not annotated by experts.

4 Conclusion

We have designed and implemented the NRMSL-BMAL framework to address a critical problem: how to take full advantage of limited resources to minimize the need for human annotation while maintaining the performance of the target model for image classification. The framework naturally combines NRMSL with a CAE-cluster-based BMAL method. We have evaluated it on five different image classification datasets, demonstrating that NRMSL-BMAL can reduce annotation cost by 44% to 95% for the datasets mentioned in Sect. 3.1. It is worth noting that NRMSL-BMAL can partly prevent the self-learning procedure from skewing the target model, thanks to the high-quality labeled samples from experts and the noisy-label self-adjusting method. Moreover, we are eager to apply it to more real-world applications, such as single-multi packing classification at Tmall. However, it cannot be neglected that the time complexity of NRMSL-BMAL still needs improvement, because of the CAE cluster procedure and the large number of training epochs. Therefore, the trade-off between time complexity and annotation cost remains a critical problem for future work.