
1 Introduction

With the explosion of digital images in the real world, it is necessary to collect, classify and organize them in a simple, fast and efficient way. To exploit these ever-growing images as labeled data, automatic image annotation [28] has been proposed, which builds statistical models to significantly reduce the labor cost of manually annotating images. However, such statistical models require a large amount of labeled training samples and are therefore not applicable when the labeled data is quite limited. How to build an accurate multi-class classification model with limited labeled samples is still an open question.

Semi-supervised learning (SSL) [1,2,3,4, 12] is a potential solution to the problem of quite limited labeled data. SSL utilizes unlabeled samples to enhance the generalization ability of supervised learning. Classical SSL algorithms include co-training [2], graph-based semi-supervised learning [3], semi-supervised support vector machines (S3VM) [4] and semi-supervised dictionary learning (SSDL) [5,6,7,8,9,10, 12]. Recently, promising performance has been achieved by jointly learning a dictionary-based classifier and the class estimation of unlabeled data. However, it has been pointed out in [11] that directly using unlabeled samples may significantly degrade classification performance when the unlabeled data contains a large amount of noisy samples and outliers.

To effectively exploit unlabeled training samples, whose noise and variations disturb semi-supervised learning methods, active learning (AL) [11, 29, 30] has attracted much attention recently. AL trains the model in an interactive way and is capable of selecting representative data based on the classification model learned in each iteration. However, the performance of AL depends heavily on the effectiveness of the initial classifier.

Semi-supervised learning and active learning are not perfect alone but are complementary to each other. The classifier obtained by SSL, which takes both labeled and unlabeled samples into account, can act as a good initial classifier; the introduction of AL can alleviate the performance degradation caused by the large number of noisy samples and outliers in the unlabeled data. Meanwhile, AL gradually acquires labeled samples from the unlabeled data set during training, so a large-scale labeled dataset does not need to be prepared at the beginning. Several methods have been developed to combine SSL and AL effectively. Song et al. [13] proposed an active learning method based on co-training for video annotation. Jiang et al. [14] developed a graph-based SSL method for video concept detection and used active learning to select data-concept pairs for human annotation. Although these combinations have improved performance, the recently developed powerful semi-supervised dictionary learning (SSDL) models are not well exploited, and how to jointly integrate SSDL and AL is still an open question.

To address the above issues, in this paper we propose a novel framework of semi-supervised dictionary active learning (SSDAL) that effectively integrates semi-supervised dictionary learning (SSDL) and active learning (AL). Initially, we use a handful of labeled samples and abundant unlabeled samples to train an SSDL model. Based on it, we introduce an AL algorithm to select informative samples to boost the training. Compared with the original SSDL model, it is not necessary to prepare all labeled samples at the beginning. Compared with a plain AL algorithm, our framework has a great advantage in learning from less labeled and more unlabeled data. Experimental results on benchmark datasets clearly show the superior performance of the proposed framework.

To summarize, the main contributions of our work are as follows:

  • A novel semi-supervised dictionary active learning (SSDAL) framework is proposed to integrate the advantages of SSDL and AL for the first time.

  • The representative unlabeled samples selected by AL and the unlabeled samples with confident class estimation are complementary to each other.

  • Experiments on the benchmark datasets are conducted, with remarkable performance reported.

The rest of the paper is organized as follows. Section 2 presents a brief review of related work. Section 3 overviews the pipeline of our framework, followed by a discussion of model formulation and optimization in Sect. 4. The experimental results are presented in Sect. 5. Section 6 concludes the paper.

2 Related Work

2.1 Semi-supervised Dictionary Learning

Owing to the impressive performance of sparse representation and dictionary learning [16, 17, 31,32,33,34], semi-supervised dictionary learning (SSDL) algorithms [5,6,7,8,9,10, 12] have been proposed recently.

Most SSDL methods aim to learn a shared dictionary. Pham et al. [5] incorporated the reconstruction error of both the labeled and unlabeled data with a sparsity constraint into a joint objective function. Zhang et al. [6] proposed an online semi-supervised dictionary learning model, in which the reconstruction error of both labeled and unlabeled data, label consistency and the classification error were integrated into a joint model. Wang et al. [9] proposed a robust dictionary learning method by exploiting the global structure of all labeled and unlabeled data. In these semi-supervised dictionary learning methods, however, the unlabeled training data is only used to learn a shared dictionary, and the discrimination hidden in the unlabeled data is left unexplored.

To utilize the class information of unlabeled data, Shrivastava et al. [7] learned a class-specific semi-supervised dictionary by estimating the class probability of the unlabeled data. Wang et al. [10] proposed an adaptively unified semi-supervised dictionary learning model which integrated the reconstruction error of both labeled and unlabeled data and classifier learning into a unified framework. Vu et al. [27] proposed a shared dictionary learning method that groups the unlabeled samples via the coefficient-based relationship between labeled and unlabeled samples. These methods try to exploit the discrimination hidden in the unlabeled data. However, the class probability of unlabeled training samples is artificially designed rather than derived from the objective function, and the powerful class-specific representation ability cannot be used in a shared dictionary learning model.

Recently, Yang et al. [12] proposed a discriminative semi-supervised dictionary learning (DSSDL) method, which achieves superior performance by introducing an entropy regularization and using an extended dictionary to explore the discrimination embedded in the unlabeled data. However, some representative samples (e.g., those near the border between different classes) cannot be correctly estimated by DSSDL, preventing its further improvement.

2.2 Active Learning

Active learning (AL) has been widely studied [11, 29, 30] for its ability to reduce human labeling effort. In terms of sampling strategy, active learning can be roughly divided into three categories [28]: (i) membership query synthesis, (ii) stream-based selective sampling, and (iii) pool-based sampling.

Membership query synthesis assumes that the system can interact with the surrounding environment, e.g., the annotator can be asked to determine the category of some samples so that unknown concepts are learned. The disadvantage of this method is that all unlabeled samples are labeled by the annotator without considering the actual distribution of the samples. To address this issue on large-scale unlabeled data, stream-based selective sampling was introduced. Although the stream-based selective strategy can alleviate the problems caused by direct query methods to some extent, it often needs a fixed threshold to measure the information content of a sample and thus lacks generality across different tasks. Moreover, because samples are compared one at a time, neither the actual distribution of the unlabeled data set nor the differences among the unlabeled samples can be captured [28].

Pool-based sampling was proposed to overcome the drawbacks above. Lewis et al. [29] introduced pool-based sampling, which compares the information of unlabeled samples and then selects the sample with the highest amount of information to query the annotator. Since the pool-based sampling strategy inherits the previous two methods and overcomes their shortcomings, it has become the most widely studied and used sampling strategy [29, 30]. It has also been pointed out by Lin et al. [30] that the sample selection criterion is another key issue in AL algorithms, and there exist many sample selection criteria, including risk reduction, uncertainty, diversity and so on [28]. The criterion is typically defined according to the classification uncertainty of samples. Specifically, the samples of low classification confidence, together with other informative criteria like diversity, are generally treated as the candidates for model retraining. The accuracy of progressively selecting uncertain unlabeled samples depends on the recognition ability of the underlying classifier, which needs to perform well even with limited labeled training data.

3 Semi-supervised Dictionary Active Learning

We propose a novel SSDL-based active learning framework composed of an SSDL model and an active learning algorithm. Figure 1 illustrates the overall framework. Initially, the training set includes a limited number of labeled samples and abundant unlabeled samples. Next, we use semi-supervised dictionary learning to train a dictionary whose class-specific parts are expected to represent their own class well, with small within-class variation, but to represent the other classes poorly. Then we select the most informative samples through the active learning technique to retrain the proposed model: for each selected sample, we ask a user to annotate it and add it to the labeled data set for the next round of dictionary training, until the model converges.

Fig. 1.

Illustration of our proposed SSDAL framework. Firstly, the SSDL model is learned with quite limited labeled samples and all of the unlabeled samples. Secondly, we use the AL algorithm to iteratively select the most informative samples from the unlabeled data set. Thirdly, we ask a user to label those informative samples and add them into the labeled data set, and the model is updated with the new labeled samples and the rest of the unlabeled data.
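To make the pipeline concrete, the following sketch runs the query loop on toy two-dimensional data. The nearest-class-mean classifier is only a hypothetical stand-in for the SSDL model, the "annotator" is simulated by looking up ground-truth labels, and all sizes as well as the margin-based selection rule (introduced formally in Sect. 3.3) are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 classes in 2-D, a few initially labeled samples, the rest unlabeled.
n_class, dim = 3, 2
X = np.vstack([rng.normal(loc=c * 3.0, scale=1.0, size=(60, dim)) for c in range(n_class)])
y_true = np.repeat(np.arange(n_class), 60)             # oracle labels (simulated annotator)
labeled = rng.choice(len(X), size=6, replace=False)    # quite limited initial labeled data
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

def fit_and_predict_proba(Xl, yl, Xq):
    """Nearest-class-mean stand-in for the SSDL model: softmax over negative distances."""
    means = np.vstack([Xl[yl == c].mean(axis=0) if np.any(yl == c) else Xl.mean(axis=0)
                       for c in range(n_class)])
    dist = ((Xq[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    p = np.exp(-dist)
    return p / p.sum(axis=1, keepdims=True)

for it in range(5):                                     # user-query iterations
    proba = fit_and_predict_proba(X[labeled], y_true[labeled], X[unlabeled])
    top2 = np.sort(proba, axis=1)[:, -2:]               # second-best and best class probabilities
    margin = top2[:, 1] - top2[:, 0]                    # small margin = informative sample
    query = unlabeled[np.argmin(margin)]                # most informative sample in this iteration
    labeled = np.append(labeled, query)                 # the "user" annotates it
    unlabeled = np.setdiff1d(unlabeled, [query])        # and it joins the labeled set
```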

3.1 Model of SSDAL

As in many prevailing semi-supervised dictionary learning models [5,6,7,8,9,10, 12], we focus on the case where the classes of the unlabeled training data are among those of the training set. To overcome the drawbacks of prevailing semi-supervised learning (e.g., its performance is worsened by unlabeled noisy samples and outliers) and of active learning (e.g., a powerful initial classifier is needed), we propose a novel model of semi-supervised dictionary active learning that fully exploits the benefits of both semi-supervised dictionary learning [12] and active learning.

Given the data set \( \varvec{A} = \left[ {\varvec{A}_{1} , \ldots ,\varvec{A}_{i} , \ldots ,\varvec{A}_{C} ,\varvec{B}} \right] \), where \( \varvec{A}_{i} \) denotes the \( i^{th} \)-class labeled training data and each column of \( \varvec{A}_{i} \) is a training sample, while \( \varvec{B} = \left[ {\varvec{b}_{1} , \ldots ,\varvec{b}_{j} , \ldots ,\varvec{b}_{N} } \right] \) collects the \( N \) unlabeled training samples from class 1 to \( C \). Let \( \varvec{D} = \left[ {\varvec{D}_{1} , \ldots ,\varvec{D}_{i} , \ldots ,\varvec{D}_{C} } \right] \) denote the supervised dictionary initialized by \( \varvec{A} \), while \( \varvec{E} = \left[ {\varvec{E}_{1} , \ldots ,\varvec{E}_{i} , \ldots ,\varvec{E}_{C} } \right] \) is an extended dictionary that mainly explores the discrimination of the unlabeled training data. Both \( \varvec{D}_{i} \) and \( \varvec{E}_{i} \) are associated with class \( i \); they are required to represent the \( i^{th} \)-class data well but to have a poor representation ability for all the other classes. \( \varvec{P}_{i,j} \) indicates the probabilistic relationship between the \( j^{th} \) unlabeled training sample and the \( i^{th} \) class. The model of our proposed SSDAL framework is:

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{{\hat{\varvec{D}},\varvec{P},\varvec{X}}} \sum\nolimits_{{\varvec{i} = {\mathbf{1}}}}^{C} {\left( {\left\| {\varvec{A}_{i} - \hat{\varvec{D}}_{i} \varvec{X}_{i}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{X}_{i}^{i} } \right\|_{1} + \lambda \left\| {\varvec{X}_{i}^{i} - \varvec{M}_{i} } \right\|_{F}^{2} } \right)} \\ & + \sum\nolimits_{j = 1}^{N - L} {\left( {\sum\nolimits_{i = 1}^{C} {\varvec{P}_{i,j} \left\| {\varvec{b}_{j} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{y}_{j}^{i} } \right\|_{1} } } \right)} \\ & - \beta \left( { - \sum\nolimits_{j = 1}^{N - L} {\sum\nolimits_{i = 1}^{C} {\varvec{P}_{i,j} \log \varvec{P}_{i,j} } } } \right) \\ & s.t.\;semi\;supervised\;learning\;for\;confident\;estimation \\ & active\;learning\;for\;unconfident\;class\;estimation \\ \end{aligned} $$
(1)

where \( \varvec{X}_{i}^{i} \) and \( \varvec{y}_{j}^{i} \) are the coding coefficient matrix of \( \varvec{A}_{i} \) and the coding vector of the unlabeled sample \( \varvec{b}_{j} \) on the class-specific dictionary \( \hat{\varvec{D}}_{i} = \left[ {\varvec{D}_{i}\;\varvec{E}_{i} } \right] \), respectively.
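For readers who prefer code, the quantities above could be laid out as follows; the sizes are purely illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, C = 64, 5                     # feature dimension and number of classes (illustrative)
n_per_class, N = 10, 200         # labeled samples per class and number of unlabeled samples

A = [rng.normal(size=(d, n_per_class)) for _ in range(C)]   # A_i: labeled data of class i (columns = samples)
B = rng.normal(size=(d, N))                                  # unlabeled samples b_1, ..., b_N

D = [Ai.copy() for Ai in A]                                  # class-specific dictionaries D_i, initialized by A_i
E = [rng.normal(size=(d, 5)) for _ in range(C)]              # extended dictionaries E_i for the unlabeled data
D_hat = [np.hstack([D[i], E[i]]) for i in range(C)]          # class-specific dictionary [D_i  E_i]

P = np.full((C, N), 1.0 / C)                                 # P[i, j]: probability that b_j belongs to class i
```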

The confidence of the estimated class probability can be measured by the entropy

$$ H\left( {\varvec{b}_{j} } \right) = - \mathop \sum \limits_{i = 1}^{C} \varvec{P}_{i,j} \log \varvec{P}_{i,j} $$
(2)

The entropy value of Eq. (2) indicates the uncertainty of the class estimation. For instance, if an unlabeled sample \( \varvec{b}_{j} \) is definitely assigned to some class (i.e., \( \varvec{P}_{i,j} = 1 \) for the assigned class \( i \) and \( \varvec{P}_{k,j} = 0 \) for \( k \ne i \)), the entropy value will be zero.

3.2 Semi-supervised Dictionary Learning

When the class estimation is confident, the proposed SSDAL model changes to

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{{\hat{\varvec{D}},\varvec{P},\varvec{X}}} \sum\nolimits_{{\varvec{i} = 1}}^{C} {\left( {\left\| {\varvec{A}_{i} - \hat{\varvec{D}}_{i} \varvec{X}_{i}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{X}_{i}^{i} } \right\|_{1} + \lambda \left\| {\varvec{X}_{\varvec{i}}^{\varvec{i}} - \varvec{M}_{\varvec{i}} } \right\|_{\varvec{F}}^{2} } \right)} \\ & + \sum\nolimits_{j = 1}^{N - L} {\left( {\sum\nolimits_{i = 1}^{C} {\varvec{P}_{i,j} \left\| {\varvec{b}_{j} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{y}_{j}^{i} } \right\|_{1} } } \right)} \\ & - \beta \left( { - \sum\nolimits_{j = 1}^{N - L} {\sum\nolimits_{i = 1}^{C} {\varvec{P}_{i,j} \log \varvec{P}_{i,j} } } } \right) \\ \end{aligned} $$
(3)
$$ s.t.\;H\left( {\varvec{b}_{j} } \right) < T $$

where T is a threshold, usually set to 0.5. In the dictionary learning, we only use the unlabeled data whose entropy is smaller than the threshold, i.e., whose class estimation is relatively confident.
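A small sketch of this confidence test, using the entropy of Eq. (2) with the natural logarithm and T = 0.5; the probability matrix below is made up for illustration.

```python
import numpy as np

def entropy(P):
    """Column-wise entropy (Eq. 2) of the C x N class-probability matrix P."""
    Q = np.clip(P, 1e-12, 1.0)               # guard against log(0)
    return -(Q * np.log(Q)).sum(axis=0)

P = np.array([[1.0, 0.4, 0.2],               # three unlabeled samples, three classes
              [0.0, 0.3, 0.5],
              [0.0, 0.3, 0.3]])
H = entropy(P)                                # first column is one-hot, so H[0] is (numerically) zero
T = 0.5
confident = np.where(H < T)[0]                # used directly in the SSDL objective of Eq. (3)
unconfident = np.where(H >= T)[0]             # candidates for the active-learning queries of Sect. 3.3
```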

3.3 Active Learning

To combine active learning, let \( \hat{\varvec{D}} = \left[ {\varvec{D}\;\varvec{E}} \right] \) denote the dictionary output by Eq. (1), and let \( L \) be the number of labeled samples acquired by active learning. In each iteration of the model, we obtain the probabilistic outputs \( \varvec{P} \) for all unlabeled samples and the class-specific dictionary \( \hat{\varvec{D}} \). If we want to boost the performance of our model by acquiring some labeled examples, the main issue is how to select the most valuable examples to query the user for labels. Considering that the SSDL model naturally provides probabilistic outputs, which are convenient for measuring the uncertainty of all unlabeled samples, we adopt an uncertainty measurement to select the most uncertain samples.

For the unlabeled data, there are C candidate classes. Therefore, the semi-supervised dictionary learning provides C classifiers. When multiple learners exist, a widely applied strategy is to select the samples that have the maximum disagreement amongst them. Here the disagreement of multiple learners can also be regarded as an uncertainty measure, and this strategy is categorized into the uncertainty criterion as well. Inspired by [15], we use the uncertainty estimation method that considers the posterior probabilities of the best and the second best predictions, that is,

$$ {\text{Uncertainty}}\left( \varvec{x} \right) = \varvec{P}\left( {\varvec{c}_{1} |\varvec{x}} \right) - \varvec{P}\left( {\varvec{c}_{2} |\varvec{x}} \right)\varvec{ } $$
(4)

where \( \varvec{c}_{1} \) and \( \varvec{c}_{2} \) are the classes with the largest and second largest posterior class probabilities, respectively. If their margin is small, the model is more confused about the sample and thus the sample has high uncertainty. We use Eq. (4) as the final sample selection strategy in the active learning.
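A minimal implementation of this selection rule might look as follows; the probability matrix is illustrative and `k` (the number of queried samples per iteration) is an assumed parameter.

```python
import numpy as np

def select_most_uncertain(P, k=1):
    """Best-versus-second-best margin of Eq. (4): smaller margin = more uncertain.
    P is the C x N class-probability matrix of the unlabeled pool."""
    top2 = np.sort(P, axis=0)[-2:, :]        # second-largest and largest probability per column
    margin = top2[1, :] - top2[0, :]         # P(c1|x) - P(c2|x)
    return np.argsort(margin)[:k]            # indices of the k most uncertain samples

P = np.array([[0.50, 0.34, 0.90],
              [0.45, 0.33, 0.05],
              [0.05, 0.33, 0.05]])
print(select_most_uncertain(P, k=2))         # -> [1 0]: the first two columns have the smallest margins
```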

3.4 Classification Model

We utilize different coding models for the testing sample: the collaborative representation of Eq. (5) is used for face recognition and large-scale image classification, while the local representation of Eq. (6) is used for digit recognition [12].

$$ Code\_Classify\left( {\varvec{b}_{j} ,\hat{\varvec{D}}} \right) = argmin_{{\varvec{y}_{j} }} \left\| {\varvec{b}_{j} - \hat{\varvec{D}}\varvec{y}_{j} } \right\|_{F}^{2} + \gamma \left\| {\varvec{y}_{j} } \right\|_{1} \varvec{ } $$
(5)
$$ Code\_Classify\left( {\varvec{b}_{j} ,\hat{\varvec{D}}} \right) = argmin_{{\varvec{y}_{j}^{i} }} \left\| {\varvec{b}_{j} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{y}_{j}^{i} } \right\|_{1} \varvec{ }\forall i\varvec{ } $$
(6)

where \( \varvec{y}_{j} = [\varvec{y}_{j}^{1} , \ldots ,\varvec{y}_{j}^{i} , \ldots ,\varvec{y}_{j}^{C} ] \) is the coding vector of the \( j^{th} \) testing sample on the whole dictionary \( \hat{\varvec{D}} = \left[ {\varvec{D}\;\varvec{E}} \right] \), and \( \varvec{y}_{j}^{i} \) is its coding vector on the learned structured dictionary \( \hat{\varvec{D}}_{i} \) associated with class \( i \). Then the final classification is conducted by

$$ identity\left( \varvec{b} \right) = arg\,\mathop {min }\limits_{\varvec{i}} \left\{ {\varvec{e}_{\varvec{i}} } \right\} $$
(7)

where \( \varvec{e}_{i} = \left\| {\varvec{b} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\|_{2}^{2} \).
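The classification rule of Eqs. (6)-(7) can be sketched as below. To keep the example short, the l1-regularized coding step is approximated by ridge regression, which is not the solver used in the paper; the dictionary sizes and the test sample are synthetic.

```python
import numpy as np

def classify(b, D_hat, gamma=1e-3):
    """Class-wise coding + minimum reconstruction error (Eqs. 6-7), with a ridge
    approximation of the l1 coding step for brevity."""
    errs = []
    for Di in D_hat:                                    # D_hat[i] = [D_i  E_i]
        G = Di.T @ Di + gamma * np.eye(Di.shape[1])
        y_i = np.linalg.solve(G, Di.T @ b)              # ridge stand-in for the sparse code y^i
        errs.append(np.sum((b - Di @ y_i) ** 2))        # e_i = ||b - D_hat_i y^i||_2^2
    return int(np.argmin(errs))                         # identity(b) = argmin_i e_i

rng = np.random.default_rng(0)
D_hat = [rng.normal(size=(64, 15)) for _ in range(5)]   # illustrative class-specific dictionaries
b = D_hat[2] @ rng.normal(size=15)                      # a sample synthesized from class 2
print(classify(b, D_hat))                               # expected to print 2
```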

4 Optimization of SSDAL

The optimization of SSDAL is an alternating procedure, which includes the selection of unlabeled data and the semi-supervised dictionary learning of Eq. (3). The semi-supervised dictionary learning can further be divided into two sub-problems, alternating between the class estimation of the unlabeled data and discriminative dictionary learning: updating P with D, E and X fixed, and updating D, E and X alternately with P fixed [12]. These alternating steps drive the model to converge.

Selection of Unlabeled Data.

With the class estimation of all unlabeled data, the ones with confident class estimation will be integrated into the model of discriminative semi-supervised dictionary learning.

For the unlabeled data with unconfident class estimation, we iteratively select the most informative samples from the rest of the unlabeled data set via Eq. (4). We then ask a user to label those informative samples and add them into the labeled data set.

Update P. By fixing the class-specific dictionaries and the corresponding coding coefficients (i.e., D, E, X and y) and letting \( \varvec{\varepsilon}_{j}^{i} = \left\| {\varvec{b}_{j} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\| \), the class probability of the \( j^{th} \) unlabeled training sample is

$$ \varvec{P}_{i,j} = { \exp }\left\{ { -\varvec{\varepsilon}_{j}^{i} /\beta } \right\}/\sum\nolimits_{k = 1}^{C} {{ \exp }\left\{ { -\varvec{\varepsilon}_{j}^{k} /\beta } \right\}} $$
(8)
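Equation (8) is a softmax over the negative coding residuals. A straightforward, numerically stabilized sketch (with made-up residuals and an illustrative β) is:

```python
import numpy as np

def update_P(residuals, beta=0.01):
    """Eq. (8): residuals[i, j] = ||b_j - D_hat_i y_j^i|| for class i and unlabeled sample j."""
    Z = -residuals / beta
    Z -= Z.max(axis=0, keepdims=True)         # subtract the column-wise max for numerical stability
    P = np.exp(Z)
    return P / P.sum(axis=0, keepdims=True)

residuals = np.array([[0.02, 0.50],           # two unlabeled samples, three candidate classes
                      [0.40, 0.51],
                      [0.45, 0.49]])
print(update_P(residuals, beta=0.1))          # the first sample is confidently assigned to class 0
```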

Update D, E and X. For the unlabeled data that are neither selected by active learning nor confidently estimated, the class probability is set to zero, i.e., \( \varvec{P}_{i,j} = 0 \). Then the proposed SSDAL model reduces to

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{{\hat{\varvec{D}},\varvec{X}}} \sum\nolimits_{i = 1}^{C} {\left( {\left\| {\varvec{A}_{i} - \hat{\varvec{D}}_{i} \varvec{X}_{i}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{X}_{i}^{i} } \right\|_{1} + \lambda \left\| {\varvec{X}_{i}^{i} - \varvec{M}_{i} } \right\|_{F}^{2} } \right)} \\ & + \sum\nolimits_{j = 1}^{N - L} {\left( {\sum\nolimits_{i = 1}^{C} {\varvec{P}_{i,j} \left\| {\varvec{b}_{j} - \hat{\varvec{D}}_{i} \varvec{y}_{j}^{i} } \right\|_{F}^{2} + \gamma \left\| {\varvec{y}_{j}^{i} } \right\|_{1} } } \right)} \\ \end{aligned} $$
(9)

which can be efficiently solved by using the method in Yang et al. [12].
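The coding sub-problems inside Eq. (9) are standard l1-regularized least-squares problems. The paper reuses the solver of [12]; as a generic stand-in, an ISTA iteration of the following form solves the same kind of sub-problem (the dictionary and signal below are synthetic).

```python
import numpy as np

def ista_l1(D, b, gamma, n_iter=300):
    """Minimize ||b - D y||_2^2 + gamma * ||y||_1 with ISTA; a generic stand-in for the
    coding updates in Eq. (9), not the exact algorithm of [12]."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth term
    t = 1.0 / L
    y = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ y - b)
        z = y - t * grad                           # gradient step on the quadratic term
        y = np.sign(z) * np.maximum(np.abs(z) - t * gamma, 0.0)   # soft-thresholding for the l1 term
    return y

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 30))
y_true = np.zeros(30); y_true[[3, 7]] = [1.5, -2.0]
b = D @ y_true
y = ista_l1(D, b, gamma=0.1)
print(np.flatnonzero(np.abs(y) > 0.1))             # should recover the support {3, 7}
```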

5 Experiments

In this section, extensive experiments are conducted on benchmark datasets, i.e., LFW [24], Web Vision 1.0 [25], USPS [22] and MNIST [23], to demonstrate the effectiveness of our proposed semi-supervised dictionary active learning (SSDAL). The competing methods include several representative supervised dictionary learning methods, SRC [18], FDDL [19], DKSVD [20] and LCKSVD [26], and semi-supervised dictionary learning methods, JDL [5], OSSDL [6], S2D2 [7], SSRD [9], SSP-DL [21] and the recently proposed DSSDL [12]. We do not include deep learning models because our base classifier is a dictionary learning model and the number of labeled samples is too limited to train a sufficiently good deep model. The coding of unlabeled training data and testing data in our proposed framework adopts the same coding representation.

The SSDL model used in our framework has three hyper-parameters, \( \lambda \), \( \gamma \) and \( \beta \). We set \( \lambda = 0.01 \), \( \gamma = 0.001 \) and \( \beta = 0.01 \) in all experiments, the same as in [12].
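For reference, the setting can be captured in a small configuration; the comments reflect the roles of the terms in Eq. (1).

```python
# Hyper-parameters used in all experiments, following [12].
params = {
    "lambda": 0.01,   # weight of the ||X_i^i - M_i||_F^2 term in Eq. (1)
    "gamma": 0.001,   # weight of the l1 sparsity terms
    "beta": 0.01,     # weight of the entropy term / temperature in Eq. (8)
}
```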

We evaluate the performance of our proposed SSDAL in terms of classification accuracy under the same total amount of user annotation. The classification accuracy is defined as the top-1 rate for digit recognition and face recognition, with an additional top-5 rate for the Web Vision large-scale image classification task.
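Top-k accuracy can be computed from any per-class score matrix (for the rule of Eq. (7), the scores would be negative reconstruction errors); a small sketch with made-up numbers:

```python
import numpy as np

def topk_accuracy(scores, labels, k=1):
    """scores: n_samples x n_classes; labels: ground-truth class indices."""
    topk = np.argsort(scores, axis=1)[:, -k:]        # k highest-scoring classes per sample
    return np.mean([labels[i] in topk[i] for i in range(len(labels))])

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.3, 0.2, 0.5]])
labels = np.array([1, 1, 2])
print(topk_accuracy(scores, labels, k=1))            # 0.667
print(topk_accuracy(scores, labels, k=2))            # 1.0
```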

5.1 Datasets and Results

Face Identification.

Following the same experimental setting as in [10], we evaluate our proposed framework on the LFW database [24], a large-scale database consisting of 4,174 face images of 143 individuals taken under varying pose, expression, illumination, misalignment and occlusion conditions. Each individual has no less than 11 images; we select the first 10 samples of each individual as training data and the remaining samples for testing. We randomly select 2 samples from each class as the initial labeled data and then run 5 user-query iterations, which makes the final amount of labeled data the same as for the other methods. As shown in Fig. 2, the data is divided into 3 parts: the data not used, the training data, and the testing data.

Fig. 2.

Illustration of how the data is divided. In this experiment, the data is first randomly divided into 3 parts, which remain fixed for the whole training process. Secondly, from the training data we randomly select 2 samples per class as the initial labeled data (i.e., orange frame) and treat the rest as unlabeled data. Then, we gradually add labeled data (i.e., green frame) drawn from the remaining unlabeled data (i.e., red frame) via the AL algorithm to boost our model. Finally, we use the testing data to evaluate our model. (Color figure online)

We use the same features as in [12], which reduce the feature vectors to 500 dimensions. Table 1 lists the identification results on the LFW database, which show clearly that our proposed method achieves the highest recognition rates among the competing schemes with the same amount of labeled data. Compared with DSSDL, the performance improvement stems from the integration of the active learning algorithm, which selects the most informative samples and removes the need to have all labeled data ready in advance.

Table 1. The recognition rates (%) on LFW database.

Digit Recognition.

Using the same experimental setting as in [12], we evaluate the performance on both the USPS dataset [22] and the MNIST dataset [23]. The USPS dataset contains 9,298 digit images from 10 classes. We randomly select 110 images from each class, and then randomly select 2 of them as the labeled samples for the initial dictionary training, 58 images as the unlabeled samples, and the rest as the testing samples. The MNIST dataset has 10 classes and 70,000 handwritten digit images in total, 60,000 for training and 10,000 for testing. We randomly select 200 samples from each class, and then randomly select 2 images per class as the labeled samples for the initial dictionary training, 98 images as the unlabeled samples, and 100 images as the testing samples. The feature we use is the whole image, normalized to have unit \( l_{2} \)-norm. We run 18 user-query iterations, with 10 labels added in each iteration. This makes the final amount of labeled data the same as for the other methods, which use 20 labeled images per class for training.
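The USPS split described above can be sketched as follows; the label array is a placeholder standing in for the real dataset, while the per-class counts (110 drawn, 2 labeled, 58 unlabeled, 50 for testing) and the query budget follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 930)             # placeholder for the ~9,298 USPS labels
labeled_idx, unlabeled_idx, test_idx = [], [], []
for c in range(10):
    pool = rng.permutation(np.flatnonzero(labels == c))[:110]   # 110 random images of class c
    labeled_idx.extend(pool[:2])                   # 2 labeled samples for the initial dictionary
    unlabeled_idx.extend(pool[2:60])               # 58 unlabeled samples
    test_idx.extend(pool[60:])                     # remaining 50 samples for testing

n_queries, labels_per_query = 18, 10               # 20 + 18*10 = 200 labeled samples in total (20 per class)
```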

Table 2 lists the results of ten independent runs, reporting the mean accuracy and standard deviation. It can be seen that the proposed SSDAL is able to find the informative samples from the unlabeled dataset for the next round of training and can then utilize the information of the selected unlabeled data to improve the classification accuracy. Compared with all the competing methods, our proposed SSDAL achieves the best performance.

Table 2. The recognition rates (%) on USPS and MNIST

Web Vision Database 1.0.

The Web Vision database 1.0 [25] is larger than all the other databases we evaluate. We use a subset that keeps the same number of classes (i.e., 1,000 classes), with 50 samples in each class. For each class, we randomly select 30 samples for training and 20 samples for testing. From the training set, we select the first 5 samples as the initial labeled data and then run 8 user-query iterations, which yields 13 labeled samples per class in the end.

We extract the same features as in [25] and reduce them to 300 dimensions. We report the top-1 and top-5 results of the proposed SSDAL and the two most competitive methods, the supervised LCKSVD and the semi-supervised DSSDL. The results of all methods are listed in Table 3, from which we can observe that the improvements of SSDAL over DSSDL are 1.3% in top-1 accuracy and 2.7% in top-5 accuracy. Compared with LCKSVD, the advantage of SSDAL is even larger.

Table 3. The recognition rates (%) on the Web Vision sub-database.

6 Conclusions

In this paper, we proposed a new model of semi-supervised dictionary active learning (SSDAL), which integrates state-of-the-art semi-supervised dictionary learning and active learning for the first time. Based on the proposed criterion derived from the estimated class probability, the unlabeled data with confident class estimation and the representative samples labeled by the user are both fed back into the training of SSDAL. Extensive experiments have shown the superior performance of our proposed framework.