
1 Introduction

With the explosion of digital images in the real world, it is necessary to collect, classify and organize them in a simple, fast and efficient way. To exploit these ever-growing images as labeled data, automatic image annotation [28] has been proposed, which builds statistical models to significantly reduce the labor cost of manually annotating images. However, such statistical models require a large amount of labeled training samples and are therefore not applicable when the labeled data is quite limited. How to build an accurate multi-class classification model with limited labeled samples is still an open question.

Semi-supervised learning (SSL) [1,2,3,4, 12] is a potential solution to the problem of quite limited labeled data. SSL utilizes unlabeled samples to enhance the generalization ability of supervised learning. Classical SSL algorithms include co-training [2], graph-based semi-supervised learning [3], semi-supervised support vector machines (S3VM) [4] and semi-supervised dictionary learning (SSDL) [5,6,7,8,9,10, 12]. Recently, promising performance has been achieved by jointly learning a dictionary-based classifier and the class estimation of unlabeled data. However, it has been pointed out in [11] that directly using unlabeled samples may significantly degrade classification performance when the unlabeled data contains a large amount of noisy samples and outliers.

To effectively exploit unlabeled training samples, whose noise and variations disturb semi-supervised learning methods, active learning (AL) [11, 29, 30] has attracted much attention recently. AL trains the model in an interactive way and is capable of selecting representative data based on the classification model learned in each iteration. However, the performance of AL depends heavily on the effectiveness of the initial classifier.

Semi-supervised learning and active learning are not perfect alone but are complementary to each other. The classifier obtained by SSL, which takes both labeled and unlabeled samples into account, can act as a good initial classifier; the introduction of AL can alleviate the performance degradation caused by the large number of noisy samples and outliers in the unlabeled data. Meanwhile, AL gradually acquires labeled samples from the unlabeled data set during training, so a large-scale labeled dataset does not need to be prepared at the beginning. Several methods have been developed to combine SSL and AL effectively. Song et al. [13] proposed an active learning method based on co-training for video annotation. Jiang et al. [14] developed a graph-based SSL method for video concept detection and used active learning to select data-concept pairs for human annotation. Although these combinations have improved performance, the recently developed powerful semi-supervised dictionary learning (SSDL) models are not well exploited, and how to jointly integrate SSDL and AL is still an open question.

To address the above issues, in this paper we propose a novel framework of semi-supervised dictionary active learning (SSDAL) that effectively integrates semi-supervised dictionary learning (SSDL) and active learning (AL). Initially, we use a handful of labeled samples and abundant unlabeled samples to train an SSDL model. Based on it, we introduce an AL algorithm to select informative samples to boost the training. Compared with the original SSDL model, it is not necessary to prepare all labeled samples at the beginning. Compared with a plain AL algorithm, our framework has a great advantage in learning from less labeled and more unlabeled data. Experimental results on benchmark datasets clearly show the superior performance of the proposed framework.

To summarize, the main contributions of our work are as follows:

  • A novel semi-supervised dictionary active learning (SSDAL) framework is proposed to integrate the advantages of SSDL and AL for the first time.

  • The representative unlabeled samples selected by AL and the unlabeled samples with confident class estimation are complementary to each other.

  • Experiments on the benchmark datasets are conducted, with remarkable performance reported.

The rest of the paper is organized as follows. Section 2 presents a brief review of related work. Section 3 overviews the pipeline of our framework, followed by a discussion of model formulation and optimization in Sect. 4. The experimental results are presented in Sect. 5. Section 6 concludes the paper.

2 Related Work

2.1 Semi-supervised Dictionary Learning

Owing to the impressive performance of sparse representation and dictionary learning [16, 17, 31,32,33,34], semi-supervised dictionary learning (SSDL) algorithms [5,6,7,8,9,10, 12] have been proposed recently.

Most SSDL methods aim to learn a shared dictionary. Pham et al. [5] incorporated the reconstruction error of both the labeled and unlabeled data with a sparsity constraint into a joint objective function. Zhang et al. [6] proposed an online semi-supervised dictionary learning model, in which the reconstruction error of both labeled and unlabeled data, label consistency and the classification error were integrated into a joint model. Wang et al. [9] proposed a robust dictionary learning method by exploiting the global structure of all labeled and unlabeled data. In these semi-supervised dictionary learning methods, however, the unlabeled training data is only used to learn a shared dictionary, and the discrimination hidden in the unlabeled data is left unexplored.

To utilize the class information of unlabeled data, Shrivastava et al. [7] learned a class-specific semi-supervised dictionary by estimating the class probability of the unlabeled data. Wang et al. [10] proposed an adaptively unified semi-supervised dictionary learning model which integrated the reconstruction error of both labeled and unlabeled data and classifier learning into a unified framework. Vu et al. [27] proposed a shared dictionary learning method that groups the unlabeled samples via the coefficient-based relationship between labeled and unlabeled samples. These methods try to exploit the discrimination hidden in the unlabeled data. However, the class probability of unlabeled training samples is artificially designed rather than derived from the objective function, and the powerful class-specific representation ability cannot be used in a shared dictionary learning model.

Recently, Yang et al. [12] proposed a discriminative semi-supervised dictionary learning (DSSDL) method, which achieves superior performance by introducing an entropy regularization and using an extended dictionary to explore the discrimination embedded in the unlabeled data. However, some representative samples (e.g., those near the border between different classes) cannot be correctly estimated by DSSDL, preventing its further improvement.

2.2 Active Learning

Active learning (AL) has been widely studied [11, 29, 30] for its ability to reduce human labeling effort. In terms of sampling strategy, active learning can be roughly divided into three categories [28]: (i) membership query synthesis, (ii) stream-based selective sampling, and (iii) pool-based sampling.

Membership query synthesis assumes that the system can interact with the surrounding environment, e.g., the annotator can be asked to determine the category of some samples so that unknown concepts are learned. The disadvantage of this method is that all unlabeled samples are labeled by the annotator without considering the actual distribution of the samples. To address this issue on large-scale unlabeled data, stream-based selective sampling was introduced. Although the stream-based selective strategy can alleviate the problems caused by direct query methods to some extent, it often needs a fixed threshold to measure the information content of a sample and thus lacks generality across different tasks. Moreover, because samples are compared one at a time, neither the actual distribution of the unlabeled data set nor the differences among the unlabeled samples can be captured [28].

Pool-based sampling was proposed to overcome the drawbacks above. Lewis et al. [29] introduced pool-based sampling, which compares the information of unlabeled samples and then selects the sample with the highest amount of information to query the annotator. Since the pool-based sampling strategy inherits the previous two methods and overcomes their shortcomings, it has become the most widely studied and used sampling strategy [29, 30]. It has also been pointed out by Lin et al. [30] that the sample selection criterion is another key issue in AL algorithms, and there exist many sample selection criteria, including risk reduction, uncertainty, diversity and so on [28]. The criterion is typically defined according to the classification uncertainty of samples. Specifically, the samples of low classification confidence, together with other informative criteria like diversity, are generally treated as the candidates for model retraining. The accuracy of progressively selecting uncertain unlabeled samples depends on the recognition ability of the underlying classifier, which needs to perform well even with limited labeled training data.

3 Semi-supervised Dictionary Active Learning

We propose a novel SSDL-based active learning framework composed of an SSDL model and an active learning algorithm. Figure 1 illustrates the overall framework. Initially, the training set includes a limited number of labeled samples and abundant unlabeled samples. Next, we use semi-supervised dictionary learning to train a dictionary whose class-specific parts are expected to represent their own class well, with small within-class variation, but to represent the other classes poorly. Then we select the most informative samples through the active learning technique to retrain the proposed model: for each selected sample, we ask a user to annotate it and add it to the labeled data set for the next round of dictionary training, until the model converges.

Fig. 1.

Illustration of our proposed SSDAL framework. Firstly, the SSDL model is learned with quite limited labeled samples and all of the unlabeled samples. Secondly, we use the AL algorithm to iteratively select the most informative samples from the unlabeled data set. Thirdly, we ask a user to label those informative samples and add them into the labeled data set, and the model is updated with the new labeled samples and the rest of the unlabeled data.
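To make the pipeline concrete, the following sketch runs the query loop on toy two-dimensional data. The nearest-class-mean classifier is only a hypothetical stand-in for the SSDL model, the "annotator" is simulated by looking up ground-truth labels, and all sizes as well as the margin-based selection rule (introduced formally in Sect. 3.3) are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 classes in 2-D, a few initially labeled samples, the rest unlabeled.
n_class, dim = 3, 2
X = np.vstack([rng.normal(loc=c * 3.0, scale=1.0, size=(60, dim)) for c in range(n_class)])
y_true = np.repeat(np.arange(n_class), 60)             # oracle labels (simulated annotator)
labeled = rng.choice(len(X), size=6, replace=False)    # quite limited initial labeled data
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

def fit_and_predict_proba(Xl, yl, Xq):
    """Nearest-class-mean stand-in for the SSDL model: softmax over negative distances."""
    means = np.vstack([Xl[yl == c].mean(axis=0) if np.any(yl == c) else Xl.mean(axis=0)
                       for c in range(n_class)])
    dist = ((Xq[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    p = np.exp(-dist)
    return p / p.sum(axis=1, keepdims=True)

for it in range(5):                                     # user-query iterations
    proba = fit_and_predict_proba(X[labeled], y_true[labeled], X[unlabeled])
    top2 = np.sort(proba, axis=1)[:, -2:]               # second-best and best class probabilities
    margin = top2[:, 1] - top2[:, 0]                    # small margin = informative sample
    query = unlabeled[np.argmin(margin)]                # most informative sample in this iteration
    labeled = np.append(labeled, query)                 # the "user" annotates it
    unlabeled = np.setdiff1d(unlabeled, [query])        # and it joins the labeled set
```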

3.1 Model of SSDAL

As in many prevailing semi-supervised dictionary learning models [5,6,7,8,9,10, 12], we focus on the case where the classes of the unlabeled training data are among those of the training set. To overcome the drawbacks of prevailing semi-supervised learning (e.g., its performance is worsened by unlabeled noisy samples and outliers) and of active learning (e.g., a powerful initial classifier is needed), we propose a novel model of semi-supervised dictionary active learning that fully exploits the benefits of both semi-supervised dictionary learning [12] and active learning.

Given the data set \( \varvec{A} = \left[ {\varvec{A}_{1} , \ldots ,\varvec{A}_{i} , \ldots ,\varvec{A}_{C} ,\varvec{B}} \right] \), where \( \varvec{A}_{i} \) denotes the \( i^{th} \)-class labeled training data and each column of \( \varvec{A}_{i} \) is a training sample, while \( \varvec{B} = \left[ {\varvec{b}_{1} , \ldots ,\varvec{b}_{j} , \ldots ,\varvec{b}_{N} } \right] \) collects the \( N \) unlabeled training samples from class 1 to \( C \). Let \( \varvec{D} = \left[ {\varvec{D}_{1} , \ldots ,\varvec{D}_{i} , \ldots ,\varvec{D}_{C} } \right] \) denote the supervised dictionary initialized by \( \varvec{A} \), while \( \varvec{E} = \left[ {\varvec{E}_{1} , \ldots ,\varvec{E}_{i} , \ldots ,\varvec{E}_{C} } \right] \) is an extended dictionary that mainly explores the discrimination of the unlabeled training data. Both \( \varvec{D}_{i} \) and \( \varvec{E}_{i} \) are associated with class \( i \); they are required to represent the \( i^{th} \)-class data well but to have a poor representation ability for all the other classes. \( \varvec{P}_{i,j} \) indicates the probabilistic relationship between the \( j^{th} \) unlabeled training sample and the \( i^{th} \) class. The model of our proposed SSDAL framework is:

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{{\hat{\varvec{D}},\varvec{P},\varvec{X}}} \sum\nolimits_{{\varvec{i} = {\mathbf{1}}}}^{C} {\left( {\left\| {\varvec{A}_{i} - \hat{\varvec{D}}_{i} \varvec{X}_{i}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{X}_{i}^{i} } \right\|_{1} + \lambda \left\| {\varvec{X}_{i}^{i} - \varvec{M}_{i} } \right\|_{F}^{2} } \right)} \\ & + \sum\nolimits_{j = 1}^{N - L} {\left( {\sum\nolimits_{i = 1}^{C} {\varvec{P}_{i,j} \left\| {\varvec{b}_{j} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{y}_{j}^{i} } \right\|_{1} } } \right)} \\ & - \beta \left( { - \sum\nolimits_{j = 1}^{N - L} {\sum\nolimits_{i = 1}^{C} {\varvec{P}_{i,j} \log \varvec{P}_{i,j} } } } \right) \\ & s.t.\;semi\;supervised\;learning\;for\;confident\;estimation \\ & active\;learning\;for\;unconfident\;class\;estimation \\ \end{aligned} $$
(1)

where \( \varvec{X}_{i}^{i} \) and \( \varvec{y}_{j}^{i} \) are the coding coefficient matrix of \( \varvec{A}_{i} \) and the coding vector of the unlabeled sample \( \varvec{b}_{j} \) on the class-specific dictionary \( \hat{\varvec{D}}_{i} = \left[ {\varvec{D}_{i}\;\varvec{E}_{i} } \right] \), respectively.
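For readers who prefer code, the quantities above could be laid out as follows; the sizes are purely illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, C = 64, 5                     # feature dimension and number of classes (illustrative)
n_per_class, N = 10, 200         # labeled samples per class and number of unlabeled samples

A = [rng.normal(size=(d, n_per_class)) for _ in range(C)]   # A_i: labeled data of class i (columns = samples)
B = rng.normal(size=(d, N))                                  # unlabeled samples b_1, ..., b_N

D = [Ai.copy() for Ai in A]                                  # class-specific dictionaries D_i, initialized by A_i
E = [rng.normal(size=(d, 5)) for _ in range(C)]              # extended dictionaries E_i for the unlabeled data
D_hat = [np.hstack([D[i], E[i]]) for i in range(C)]          # class-specific dictionary [D_i  E_i]

P = np.full((C, N), 1.0 / C)                                 # P[i, j]: probability that b_j belongs to class i
```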

The confidence of the estimated class probability can be measured by the entropy

$$ H\left( {\varvec{b}_{j} } \right) = - \mathop \sum \limits_{i = 1}^{C} \varvec{P}_{i,j} \log \varvec{P}_{i,j} $$
(2)

The entropy value of Eq. (2) indicates the uncertainty of the class estimation. For instance, if an unlabeled sample \( \varvec{b}_{j} \) is definitely assigned to some class (i.e., \( \varvec{P}_{i,j} = 1 \) for the assigned class \( i \) and \( \varvec{P}_{k,j} = 0 \) for \( k \ne i \)), the entropy value will be zero.

3.2 Semi-supervised Dictionary Learning

When the class estimation is confident, the proposed SSDAL model changes to

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{{\hat{\varvec{D}},\varvec{P},\varvec{X}}} \sum\nolimits_{{\varvec{i} = 1}}^{C} {\left( {\left\| {\varvec{A}_{i} - \hat{\varvec{D}}_{i} \varvec{X}_{i}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{X}_{i}^{i} } \right\|_{1} + \lambda \left\| {\varvec{X}_{\varvec{i}}^{\varvec{i}} - \varvec{M}_{\varvec{i}} } \right\|_{\varvec{F}}^{2} } \right)} \\ & + \sum\nolimits_{j = 1}^{N - L} {\left( {\sum\nolimits_{i = 1}^{C} {\varvec{P}_{i,j} \left\| {\varvec{b}_{j} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{y}_{j}^{i} } \right\|_{1} } } \right)} \\ & - \beta \left( { - \sum\nolimits_{j = 1}^{N - L} {\sum\nolimits_{i = 1}^{C} {\varvec{P}_{i,j} \log \varvec{P}_{i,j} } } } \right) \\ \end{aligned} $$
(3)
$$ s.t.\;H\left( {\varvec{b}_{j} } \right) < T $$

where T is a threshold, usually set to 0.5. In the dictionary learning, we only use the unlabeled data whose entropy is smaller than the threshold, i.e., whose class estimation is relatively confident.
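A small sketch of this confidence test, using the entropy of Eq. (2) with the natural logarithm and T = 0.5; the probability matrix below is made up for illustration.

```python
import numpy as np

def entropy(P):
    """Column-wise entropy (Eq. 2) of the C x N class-probability matrix P."""
    Q = np.clip(P, 1e-12, 1.0)               # guard against log(0)
    return -(Q * np.log(Q)).sum(axis=0)

P = np.array([[1.0, 0.4, 0.2],               # three unlabeled samples, three classes
              [0.0, 0.3, 0.5],
              [0.0, 0.3, 0.3]])
H = entropy(P)                                # first column is one-hot, so H[0] is (numerically) zero
T = 0.5
confident = np.where(H < T)[0]                # used directly in the SSDL objective of Eq. (3)
unconfident = np.where(H >= T)[0]             # candidates for the active-learning queries of Sect. 3.3
```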

3.3 Active Learning

To combine active learning, let \( \hat{\varvec{D}} = \left[ {\varvec{D}\;\varvec{E}} \right] \) denote the dictionary output by Eq. (1), and let \( L \) be the number of labeled samples acquired by active learning. In each iteration of the model, we obtain the probabilistic outputs \( \varvec{P} \) for all unlabeled samples and the class-specific dictionary \( \hat{\varvec{D}} \). If we want to boost the performance of our model by acquiring some labeled examples, the main issue is how to select the most valuable examples to query the user for labels. Considering that the SSDL model naturally provides probabilistic outputs, which are convenient for measuring the uncertainty of all unlabeled samples, we adopt an uncertainty measurement to select the most uncertain samples.

For the unlabeled data, there are C candidate classes. Therefore, the semi-supervised dictionary learning provides C classifiers. When multiple learners exist, a widely applied strategy is to select the samples that have the maximum disagreement amongst them. Here the disagreement of multiple learners can also be regarded as an uncertainty measure, and this strategy is categorized into the uncertainty criterion as well. Inspired by [15], we use the uncertainty estimation method that considers the posterior probabilities of the best and the second best predictions, that is,

$$ {\text{Uncertainty}}\left( \varvec{x} \right) = \varvec{P}\left( {\varvec{c}_{1} |\varvec{x}} \right) - \varvec{P}\left( {\varvec{c}_{2} |\varvec{x}} \right)\varvec{ } $$
(4)

where \( \varvec{c}_{1} \) and \( \varvec{c}_{2} \) are the classes with the largest and second largest posterior class probabilities, respectively. If their margin is small, the model is more confused about the sample and thus the sample has high uncertainty. We use Eq. (4) as the final sample selection strategy in the active learning.
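A minimal implementation of this selection rule might look as follows; the probability matrix is illustrative and `k` (the number of queried samples per iteration) is an assumed parameter.

```python
import numpy as np

def select_most_uncertain(P, k=1):
    """Best-versus-second-best margin of Eq. (4): smaller margin = more uncertain.
    P is the C x N class-probability matrix of the unlabeled pool."""
    top2 = np.sort(P, axis=0)[-2:, :]        # second-largest and largest probability per column
    margin = top2[1, :] - top2[0, :]         # P(c1|x) - P(c2|x)
    return np.argsort(margin)[:k]            # indices of the k most uncertain samples

P = np.array([[0.50, 0.34, 0.90],
              [0.45, 0.33, 0.05],
              [0.05, 0.33, 0.05]])
print(select_most_uncertain(P, k=2))         # -> [1 0]: the first two columns have the smallest margins
```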

3.4 Classification Model

We utilize different coding models for the testing sample: the collaborative representation of Eq. (5) is used for face recognition and large-scale image classification, while the local representation of Eq. (6) is used for digit recognition [12].

$$ Code\_Classify\left( {\varvec{b}_{j} ,\hat{\varvec{D}}} \right) = argmin_{{\varvec{y}_{j} }} \left\| {\varvec{b}_{j} - \hat{\varvec{D}}\varvec{y}_{j} } \right\|_{F}^{2} + \gamma \left\| {\varvec{y}_{j} } \right\|_{1} \varvec{ } $$
(5)
$$ Code\_Classify\left( {\varvec{b}_{j} ,\hat{\varvec{D}}} \right) = argmin_{{\varvec{y}_{j}^{i} }} \left\| {\varvec{b}_{j} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{y}_{j}^{i} } \right\|_{1} \varvec{ }\forall i\varvec{ } $$
(6)

where \( \varvec{y}_{j} = [\varvec{y}_{j}^{1} , \ldots ,\varvec{y}_{j}^{i} , \ldots ,\varvec{y}_{j}^{C} ] \) is the coding vector of the \( j^{th} \) testing sample on the whole dictionary \( \hat{\varvec{D}} = \left[ {\varvec{D}\;\varvec{E}} \right] \), and \( \varvec{y}_{j}^{i} \) is its coding vector on the learned structured dictionary \( \hat{\varvec{D}}_{i} \) associated with class \( i \). Then the final classification is conducted by

$$ identity\left( \varvec{b} \right) = arg\,\mathop {min }\limits_{\varvec{i}} \left\{ {\varvec{e}_{\varvec{i}} } \right\} $$
(7)

where \( \varvec{e}_{i} = \left\| {\varvec{b} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\|_{2}^{2} \).
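The classification rule of Eqs. (6)-(7) can be sketched as below. To keep the example short, the l1-regularized coding step is approximated by ridge regression, which is not the solver used in the paper; the dictionary sizes and the test sample are synthetic.

```python
import numpy as np

def classify(b, D_hat, gamma=1e-3):
    """Class-wise coding + minimum reconstruction error (Eqs. 6-7), with a ridge
    approximation of the l1 coding step for brevity."""
    errs = []
    for Di in D_hat:                                    # D_hat[i] = [D_i  E_i]
        G = Di.T @ Di + gamma * np.eye(Di.shape[1])
        y_i = np.linalg.solve(G, Di.T @ b)              # ridge stand-in for the sparse code y^i
        errs.append(np.sum((b - Di @ y_i) ** 2))        # e_i = ||b - D_hat_i y^i||_2^2
    return int(np.argmin(errs))                         # identity(b) = argmin_i e_i

rng = np.random.default_rng(0)
D_hat = [rng.normal(size=(64, 15)) for _ in range(5)]   # illustrative class-specific dictionaries
b = D_hat[2] @ rng.normal(size=15)                      # a sample synthesized from class 2
print(classify(b, D_hat))                               # expected to print 2
```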

4 Optimization of SSDAL

The optimization of SSDAL is an alternating procedure, which includes the selection of unlabeled data and the semi-supervised dictionary learning of Eq. (3). The semi-supervised dictionary learning can further be divided into two sub-problems, alternating between the class estimation of the unlabeled data and discriminative dictionary learning: updating P with D, E and X fixed, and updating D, E and X alternately with P fixed [12]. These alternating steps drive the model to converge.

Selection of Unlabeled Data.

With the class estimation of all unlabeled data, the ones with confident class estimation will be integrated into the model of discriminative semi-supervised dictionary learning.

For the unlabeled data with unconfident class estimation, we iteratively select the most informative samples from the rest of the unlabeled data set via Eq. (4). We then ask a user to label those informative samples and add them into the labeled data set.

Update P. By fixing the class-specific dictionaries and the corresponding coding coefficients (i.e., D, E, X and y) and letting \( \varvec{\varepsilon}_{j}^{i} = \left\| {\varvec{b}_{j} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\| \), the class probability of the \( j^{th} \) unlabeled training sample is

$$ \varvec{P}_{i,j} = { \exp }\left\{ { -\varvec{\varepsilon}_{j}^{i} /\beta } \right\}/\sum\nolimits_{k = 1}^{C} {{ \exp }\left\{ { -\varvec{\varepsilon}_{j}^{k} /\beta } \right\}} $$
(8)
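Equation (8) is a softmax over the negative coding residuals. A straightforward, numerically stabilized sketch (with made-up residuals and an illustrative β) is:

```python
import numpy as np

def update_P(residuals, beta=0.01):
    """Eq. (8): residuals[i, j] = ||b_j - D_hat_i y_j^i|| for class i and unlabeled sample j."""
    Z = -residuals / beta
    Z -= Z.max(axis=0, keepdims=True)         # subtract the column-wise max for numerical stability
    P = np.exp(Z)
    return P / P.sum(axis=0, keepdims=True)

residuals = np.array([[0.02, 0.50],           # two unlabeled samples, three candidate classes
                      [0.40, 0.51],
                      [0.45, 0.49]])
print(update_P(residuals, beta=0.1))          # the first sample is confidently assigned to class 0
```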

Update D, E and X. For the unlabeled data that are neither selected by active learning nor confidently estimated, the class probability is set to zero, i.e., \( \varvec{P}_{i,j} = 0 \). Then the proposed SSDAL model reduces to

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{{\hat{\varvec{D}},\varvec{X}}} \sum\nolimits_{i = 1}^{C} {\left( {\left\| {\varvec{A}_{i} - \hat{\varvec{D}}_{i} \varvec{X}_{i}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{X}_{i}^{i} } \right\|_{1} + \lambda \left\| {\varvec{X}_{i}^{i} - \varvec{M}_{i} } \right\|_{F}^{2} } \right)} \\ & + \sum\nolimits_{j = 1}^{N - L} {\left( {\sum\nolimits_{i = 1}^{C} {\varvec{P}_{i,j} \left\| {\varvec{b}_{j} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{y}_{j}^{i} } \right\|_{1} } } \right)} \\ \end{aligned} $$
(9)

which can be efficiently solved by using the method in Yang et al. [12].
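The coding sub-problems inside Eq. (9) are standard l1-regularized least-squares problems. The paper reuses the solver of [12]; as a generic stand-in, an ISTA iteration of the following form solves the same kind of sub-problem (the dictionary and signal below are synthetic).

```python
import numpy as np

def ista_l1(D, b, gamma, n_iter=300):
    """Minimize ||b - D y||_2^2 + gamma * ||y||_1 with ISTA; a generic stand-in for the
    coding updates in Eq. (9), not the exact algorithm of [12]."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth term
    t = 1.0 / L
    y = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ y - b)
        z = y - t * grad                           # gradient step on the quadratic term
        y = np.sign(z) * np.maximum(np.abs(z) - t * gamma, 0.0)   # soft-thresholding for the l1 term
    return y

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 30))
y_true = np.zeros(30); y_true[[3, 7]] = [1.5, -2.0]
b = D @ y_true
y = ista_l1(D, b, gamma=0.1)
print(np.flatnonzero(np.abs(y) > 0.1))             # should recover the support {3, 7}
```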

5 Experiments

In this section, extensive experiments are conducted on benchmark datasets, i.e., LFW [24], Web Vision 1.0 [25], USPS [22] and MNIST [23], to demonstrate the effectiveness of our proposed semi-supervised dictionary active learning (SSDAL). The competing methods include several representative supervised dictionary learning methods, SRC [18], FDDL [19], DKSVD [20] and LCKSVD [26], and semi-supervised dictionary learning methods, JDL [5], OSSDL [6], S2D2 [7], SSRD [9], SSP-DL [21] and the recently proposed DSSDL [12]. We do not include deep learning models because our base classifier is a dictionary learning model and the number of labeled samples is too limited to train a sufficiently good deep model. The coding of unlabeled training data and testing data in our proposed framework adopts the same coding representation.

The SSDL model used in our framework has three hyper-parameters, \( \lambda \), \( \gamma \) and \( \beta \). We set \( \lambda = 0.01 \), \( \gamma = 0.001 \) and \( \beta = 0.01 \) in all experiments, the same as in [12].
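For reference, the setting can be captured in a small configuration; the comments reflect the roles of the terms in Eq. (1).

```python
# Hyper-parameters used in all experiments, following [12].
params = {
    "lambda": 0.01,   # weight of the ||X_i^i - M_i||_F^2 term in Eq. (1)
    "gamma": 0.001,   # weight of the l1 sparsity terms
    "beta": 0.01,     # weight of the entropy term / temperature in Eq. (8)
}
```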

We evaluate the performance of our proposed SSDAL in terms of classification accuracy under the same total amount of user annotation. The classification accuracy is defined as the top-1 rate for digit recognition and face recognition, with an additional top-5 rate for the Web Vision large-scale image classification task.
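Top-k accuracy can be computed from any per-class score matrix (for the rule of Eq. (7), the scores would be negative reconstruction errors); a small sketch with made-up numbers:

```python
import numpy as np

def topk_accuracy(scores, labels, k=1):
    """scores: n_samples x n_classes; labels: ground-truth class indices."""
    topk = np.argsort(scores, axis=1)[:, -k:]        # k highest-scoring classes per sample
    return np.mean([labels[i] in topk[i] for i in range(len(labels))])

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.3, 0.2, 0.5]])
labels = np.array([1, 1, 2])
print(topk_accuracy(scores, labels, k=1))            # 0.667
print(topk_accuracy(scores, labels, k=2))            # 1.0
```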

5.1 Datasets and Results

Face Identification.

Following the same experimental setting as in [10], we evaluate our proposed framework on the LFW database [24], a large-scale database consisting of 4,174 face images of 143 individuals taken under varying pose, expression, illumination, misalignment and occlusion conditions. Each individual has no less than 11 images; we select the first 10 samples of each individual as training data and the remaining samples for testing. We randomly select 2 samples from each class as the initial labeled data and then run 5 user-query iterations, which makes the final amount of labeled data the same as for the other methods. As shown in Fig. 2, the data is divided into 3 parts: the data not used, the training data, and the testing data.

Fig. 2.

Illustration of how the data is divided. In this experiment, the data is first randomly divided into 3 parts, which remain fixed for the whole training process. Secondly, from the training data we randomly select 2 samples per class as the initial labeled data (i.e., orange frame) and treat the rest as unlabeled data. Then, we gradually add labeled data (i.e., green frame) drawn from the remaining unlabeled data (i.e., red frame) via the AL algorithm to boost our model. Finally, we use the testing data to evaluate our model. (Color figure online)

We use the same features as in [12], which reduce the feature vectors to 500 dimensions. Table 1 lists the identification results on the LFW database, which show clearly that our proposed method achieves the highest recognition rates among the competing schemes with the same amount of labeled data. Compared with DSSDL, the performance improvement stems from the integration of the active learning algorithm, which selects the most informative samples and removes the need to have all labeled data ready in advance.

Table 1. The recognition rates (%) on LFW database.

Digit Recognition.

Using the same experimental setting as in [12], we evaluate the performance on both the USPS dataset [22] and the MNIST dataset [23]. The USPS dataset contains 9,298 digit images from 10 classes. We randomly select 110 images from each class, and then randomly select 2 of them as the labeled samples for the initial dictionary training, 58 images as the unlabeled samples, and the rest as the testing samples. The MNIST dataset has 10 classes and 70,000 handwritten digit images in total, 60,000 for training and 10,000 for testing. We randomly select 200 samples from each class, and then randomly select 2 images per class as the labeled samples for the initial dictionary training, 98 images as the unlabeled samples, and 100 images as the testing samples. The feature we use is the whole image, normalized to have unit \( l_{2} \)-norm. We run 18 user-query iterations, with 10 labels added in each iteration. This makes the final amount of labeled data the same as for the other methods, which use 20 labeled images per class for training.
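The USPS split described above can be sketched as follows; the label array is a placeholder standing in for the real dataset, while the per-class counts (110 drawn, 2 labeled, 58 unlabeled, 50 for testing) and the query budget follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 930)             # placeholder for the ~9,298 USPS labels
labeled_idx, unlabeled_idx, test_idx = [], [], []
for c in range(10):
    pool = rng.permutation(np.flatnonzero(labels == c))[:110]   # 110 random images of class c
    labeled_idx.extend(pool[:2])                   # 2 labeled samples for the initial dictionary
    unlabeled_idx.extend(pool[2:60])               # 58 unlabeled samples
    test_idx.extend(pool[60:])                     # remaining 50 samples for testing

n_queries, labels_per_query = 18, 10               # 20 + 18*10 = 200 labeled samples in total (20 per class)
```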

Table 2 lists the results of ten independent runs, reporting the mean accuracy and standard deviation. It can be seen that the proposed SSDAL is able to find the informative samples from the unlabeled dataset for the next round of training and can then utilize the information of the selected unlabeled data to improve the classification accuracy. Compared with all the competing methods, our proposed SSDAL achieves the best performance.

Table 2. The recognition rates (%) on USPS and MNIST

Web Vision Database 1.0.

The Web Vision database 1.0 [25] is larger than all the other databases we evaluate. We use a subset that keeps the same number of classes (i.e., 1,000 classes), with 50 samples in each class. For each class, we randomly select 30 samples for training and 20 samples for testing. From the training set, we select the first 5 samples as the initial labeled data and then run 8 user-query iterations, which yields 13 labeled samples per class in the end.

We extract the same features as in [25] and reduce them to 300 dimensions. We report the top-1 and top-5 results of the proposed SSDAL and the two most competitive methods, the supervised LCKSVD and the semi-supervised DSSDL. The results of all methods are listed in Table 3, from which we can observe that the improvements of SSDAL over DSSDL are 1.3% in top-1 accuracy and 2.7% in top-5 accuracy. Compared with LCKSVD, the advantage of SSDAL is even larger.

Table 3. The recognition rates (%) on the Web Vision sub-database.

6 Conclusions

In this paper, we proposed a new model of semi-supervised dictionary active learning (SSDAL), which integrates state-of-the-art semi-supervised dictionary learning and active learning for the first time. Based on the proposed criterion derived from the estimated class probability, the unlabeled data with confident class estimation and the representative samples labeled by the user are both fed back into the training of SSDAL. Extensive experiments have shown the superior performance of our proposed framework.