Pattern Recognition Letters

Volume 68, Part 1, 15 December 2015, Pages 27-33

Supervised transfer kernel sparse coding for image classification

https://doi.org/10.1016/j.patrec.2015.08.011

Highlights

  • We use kernel sparse coding in the context of transfer learning.

  • We maximize the correlation between the sparse codes of source and target data of the same class.

  • A unified framework is presented to learn the dictionary and transfer sparse coding.

Abstract

When only a few labeled images are available, a trained classifier performs poorly even if sparse coding is used to process the image features. We therefore exploit data from related domains as source data to aid the classification task. In this paper, we propose a Supervised Transfer Kernel Sparse Coding (STKSC) algorithm to construct discriminative sparse representations for cross-domain image classification. Specifically, we map source and target data into a high dimensional feature space via the kernel trick, thereby capturing nonlinear image features. To make the sparse representations robust to the domain mismatch, we incorporate the Maximum Mean Discrepancy (MMD) criterion into the objective function of kernel sparse coding. We also use label information to learn more discriminative sparse representations. Furthermore, we provide a unified framework that jointly learns the dictionary and the discriminative sparse representations, which can then be used for classification. Experimental results validate that our method outperforms many state-of-the-art methods.

Introduction

Sparse coding has been successfully applied to image classification problems [1], [12], [20], [21]. Sparse coding yields sparse representations in which a sample is expressed as a linear combination of a few dictionary elements; represented this way, images can be well interpreted and classified. Recently, sparse coding has been combined with kernel methods to handle the nonlinear structure of the data [6], [18]. These works extended sparse coding to nonlinear sparse coding via the kernel trick, yielding more discriminative sparse representations. In view of these merits, we also map source and target data into a high dimensional feature space by an implicit mapping function and then learn the sparse representations of the data in the kernel space.

However, labeled images are often scarce, and labeling new images is time consuming. We can therefore use a related dataset, the source domain, to help image classification tasks in our target domain [1], [13], [15]. When the feature distribution of the source domain differs greatly from that of the target domain, directly applying classifiers or object models trained on the source domain to the target domain often fails. Transfer learning, which produces excellent results, has been widely studied to solve this problem; a survey is presented in [14].

Many researchers have focused on the use of sparse coding in transfer learning. In [1], [12], source and target images are combined to learn a shared dictionary and sparse representations for all the images. Both works adopt a nonparametric distance measure to reduce the distribution divergence between source and target data, and the resulting representations prove robust for classification tasks. However, in [1], the source data are also used as test data, whereas in real applications the source data are fully labeled and the task is to classify the target data. In [12], the limited labeled target data are not exploited by the algorithm, and using only the labeled source data is not sufficient for the target classification task. Shekhar et al. proposed a method to learn a shared discriminative dictionary for source and target data in a low dimensional space [17].

However, these methods seek the sparse representations of all samples in the original feature space or in a low dimensional space, which is inadequate for representing image features. Image features are generally not linearly separable in the original feature space, and their similarities are measured with highly nonlinear functions: the pyramid match kernel for spatial pyramid descriptors, for instance, and the geodesic distance for region covariance descriptors. Sparse coding, however, represents the data as a linear combination of a few dictionary elements, and directly applying this linear model to nonlinear image features leads to poor performance. We overcome the problem by mapping the samples into a high dimensional feature space using the kernel trick: data that are linearly inseparable in the original space become linearly separable there, so sparse coding theory applies, and more separable sparse representations can be learned with a coupled dictionary. Because computing explicitly in the high dimensional feature space is prohibitively expensive, we use a kernel function to transform the objective into a feasible problem.
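To make this concrete, here is a minimal sketch (not from the paper; the Gaussian kernel and the toy dimensions are illustrative assumptions) of the computation the kernel trick enables: every feature-space quantity used later reduces to Gram-matrix entries $\phi(y)'\phi(y')$, so the mapping $\phi$ is never formed explicitly.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)) for rows of A, B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

# Toy source/target features (sizes are illustrative).
rng = np.random.default_rng(0)
Ys = rng.normal(size=(2, 50))            # d x n_s
Yt = rng.normal(size=(2, 60))            # d x n_t
Y = np.hstack([Ys, Yt])                  # d x n with n = n_s + n_t

# phi(Y)'phi(Y) without ever computing phi(Y): an n x n kernel matrix.
K = gaussian_kernel(Y.T, Y.T, sigma=1.0)
```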

Another novelty of our method is that we add label information to the objective through the sparse codes of the source and target data. Specifically, we maximize the correlation between the sparse codes of source and target data of the same class, as sketched below. Owing to this supervision information, we learn more discriminative sparse representations that are helpful for our classification tasks. Several experiments demonstrate that our method yields very good performance.
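As a rough illustration of this supervision term (the exact weighting the paper uses inside its objective is not reproduced here; the indicator-based form below is our assumption), the quantity being maximized can be written as a sum of inner products between the sparse codes of same-class source and labeled target samples:

```python
import numpy as np

def label_correlation(Xs, Xtl, zs, ztl):
    """Sum of x_si' x_tlj over all pairs (i, j) with matching class labels.

    Xs:  m x n_s source codes,   zs:  length-n_s source labels
    Xtl: m x n_tl target codes,  ztl: length-n_tl target labels
    """
    same_class = (zs[:, None] == ztl[None, :]).astype(float)  # n_s x n_tl indicator
    return np.sum((Xs.T @ Xtl) * same_class)
```

Maximizing this term (equivalently, subtracting it in the minimization objective) pulls same-class codes from the two domains toward similar directions in code space.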

This paper is organized as follows. Section 2 provides a brief review of related work. In Section 3, we introduce our novel Supervised Transfer Kernel Sparse Coding (STKSC) algorithm. In Section 4, the optimization of STKSC is presented. Extensive experiments are conducted in Section 5. The conclusions and future work are presented in Section 6.

Section snippets

Related work

Sparse coding has received growing attention because of its advantages and promising performance in many computer vision applications [1], [12], [22]. Researchers have developed many algorithms to obtain sparse and efficient representations of images [18], [23]. Given input samples $Y=[y_1,\ldots,y_n]\in\mathbb{R}^{d\times n}$, the sparse coding matrix $X\in\mathbb{R}^{m\times n}$ and the dictionary $D=[d_1,\ldots,d_m]\in\mathbb{R}^{d\times m}$ can be trained by solving the following problem:
$$(D,X)=\arg\min_{D,X}\|Y-DX\|_F^2+\gamma\sum_{i=1}^{n}\|x_i\|_1,$$
where $x_i$ is a column of $X\in\mathbb{R}^{m\times n}$ and $\|\cdot\|_F$ denotes the Frobenius norm, $\|A\|_F=\sqrt{\sum_{i,j}A_{ij}^2}$.
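For concreteness, here is a minimal sketch of this linear problem, assuming an ISTA step for the $\ell_1$-regularized code update and a least-squares dictionary update; neither solver choice is prescribed by the text:

```python
import numpy as np

def soft_threshold(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def sparse_coding(Y, m=64, gamma=0.1, n_iter=100, seed=0):
    """Alternate ISTA on X with a least-squares update of D for
    min ||Y - DX||_F^2 + gamma * sum_i ||x_i||_1."""
    d, n = Y.shape
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(d, m))
    D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
    X = np.zeros((m, n))
    for _ in range(n_iter):
        # ISTA step on X: gradient of the squared loss, then soft-threshold.
        L = 2 * np.linalg.norm(D.T @ D, 2)         # Lipschitz constant of the gradient
        X = soft_threshold(X - 2 * D.T @ (D @ X - Y) / L, gamma / L)
        # Dictionary update: least squares on the reconstruction term.
        D = Y @ np.linalg.pinv(X)
        D /= np.linalg.norm(D, axis=0) + 1e-12     # renormalize atoms
    return D, X
```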

Supervised transfer kernel sparse coding

Define the source data $\mathcal{D}_s=\{Y_s,Z_s\}$ and the target data $\mathcal{D}_t=\{Y_t\}$, in which $Y_s=[y_{s1},\ldots,y_{sn_s}]\in\mathbb{R}^{d\times n_s}$ has labels $Z_s=[z_{s1},\ldots,z_{sn_s}]\in\mathbb{R}^{n_s}$, and $n_s$ is the number of source samples. We define the labeled target samples $Y_{tl}=[y_{tl1},\ldots,y_{tln_{tl}}]\in\mathbb{R}^{d\times n_{tl}}$ with labels $Z_{tl}=[z_{tl1},\ldots,z_{tln_{tl}}]\in\mathbb{R}^{n_{tl}}$, and the unlabeled target samples $Y_{tu}=[y_{tu1},\ldots,y_{tun_{tu}}]\in\mathbb{R}^{d\times n_{tu}}$, respectively. All target samples are $Y_t=[Y_{tl}\;Y_{tu}]\in\mathbb{R}^{d\times n_t}$, $n_t=n_{tl}+n_{tu}$, where $n_t$ is the number of target samples. $Q'$ denotes the transpose of a matrix $Q$ throughout this paper.

Suppose $Y=[Y_s\;Y_t]\in\mathbb{R}^{d\times n}$, $n=n_s+n_t$, source and
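Since the objective below builds in the MMD term mentioned in the abstract, a brief sketch of the standard MMD matrix used in transfer sparse coding may help (the paper's exact definition of its matrix $M$ is an assumption here). With codes $X\in\mathbb{R}^{m\times n}$ ordered source-first, $\mathrm{tr}(XMX')$ is exactly the squared distance between the mean source code and the mean target code:

```python
import numpy as np

def mmd_matrix(ns, nt):
    """M = e e' with e_i = 1/ns on source samples and -1/nt on target samples,
    so that tr(X M X') == || mean source code - mean target code ||^2."""
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    return np.outer(e, e)                  # (ns + nt) x (ns + nt)

# Sanity check of the identity on random codes.
ns, nt = 50, 60
X = np.random.default_rng(0).normal(size=(8, ns + nt))
lhs = np.trace(X @ mmd_matrix(ns, nt) @ X.T)
rhs = np.sum((X[:, :ns].mean(1) - X[:, ns:].mean(1))**2)
assert np.isclose(lhs, rhs)
```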

Optimization of STKSC

Given the dictionary $D$, the objective function can be written as
$$\min_{X}\|\Phi(Y)-\Phi(D)X\|_F^2+\mathrm{tr}(XPX')+\gamma\sum_{i=1}^{n}\|x_i\|_1,\quad\text{where } P=\alpha M+\beta L-\lambda L_a.$$
Since $x_i$ is independent of the other columns of $X$, we update $x_i$ while the other vectors are fixed:
$$\min_{x_i}\|\Phi(y_i)-\Phi(D)x_i\|^2+P_{ii}x_i'x_i+\sum_{j=1,j\neq i}^{n}P_{ij}x_i'x_j+\gamma\|x_i\|_1$$
$$=\phi(y_i)'\phi(y_i)-2x_i'\phi(D)'\phi(y_i)+x_i'\phi(D)'\phi(D)x_i+P_{ii}x_i'x_i+\sum_{j=1,j\neq i}^{n}P_{ij}x_i'x_j+\gamma\|x_i\|_1.$$

We can use any kernel function that satisfies Mercer's condition [2], such as the Gaussian kernel or the polynomial kernel, to compute $\phi(y)'\phi(y)$. Putting these kernels
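A minimal sketch of this per-sample update follows (the ISTA solver and variable names are our assumptions, not necessarily the paper's optimizer): the smooth part of the objective depends on $\phi$ only through $K_{DD}=\phi(D)'\phi(D)$ and $k_i=\phi(D)'\phi(y_i)$, so each step needs kernel evaluations only.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def update_code(K_DD, k_i, X, P, i, gamma, n_iter=50):
    """ISTA for min_{x_i}  -2 x_i'k_i + x_i'K_DD x_i + P_ii x_i'x_i
                           + sum_{j != i} P_ij x_i'x_j + gamma ||x_i||_1,
    with K_DD (m x m) and k_i (m,) supplied as kernel evaluations."""
    cross = X @ P[i] - P[i, i] * X[:, i]              # sum_{j != i} P_ij x_j
    L = 2 * (np.linalg.norm(K_DD, 2) + abs(P[i, i]))  # Lipschitz bound on the gradient
    x = X[:, i].copy()
    for _ in range(n_iter):
        grad = -2 * k_i + 2 * K_DD @ x + 2 * P[i, i] * x + cross
        x = soft_threshold(x - grad / L, gamma / L)
    return x
```

Sweeping $i=1,\ldots,n$ with this update, and then refitting the dictionary, gives an alternating scheme of the kind the section describes.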

Experiments and analysis

In this section, we perform experiments on publicly available benchmark datasets for cross-domain image classification tasks (see Table 2). It can be observed that our method outperforms state-of-the-art methods in most cases.

Conclusions and future work

Learning discriminative sparse representations is of great importance for the cross domain image classification problems. In this paper, we use source and target data to learn a shared dictionary and sparse representations of all the data by minimizing the reconstruction error of kernel sparse coding. Moreover, the supervision information is used by maximizing the correlation between the sparse representations of source and target data of the same class. We also incorporate the MMD and graph

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61472305, 61070143, 61303034), the Science and Technology Project of Shaanxi Province, China (Grant No. 2015GY027), and the Fundamental Research Funds for the Central Universities (Grant No. SMC1405).

References (23)

  • M. Fang et al.

    Multi-source transfer learning based on label shared subspace

    Patt. Recognit. Lett.

    (2015)
  • S. Wold et al.

    Principal component analysis

    Chemom. Intell. Lab. Syst.

    (1987)
  • S. Yang et al.

    Semi-supervised action recognition in video via labeled kernel sparse coding and sparse l1 graph

    Patt. Recognit. Lett.

    (2012)
  • M. Al-Shedivat et al.

    Supervised transfer sparse coding

    Twenty-eighth AAAI Conference on Artificial Intelligence

    (2014)
  • C. Cortes et al.

    Support-vector networks

    Mach. Learn.

    (1995)
  • M. Everingham et al.

    The PASCAL Visual Object Classes (VOC) challenge

    Int. J. Comput. Vis.

    (2010)
  • R.E. Fan et al.

    LIBLINEAR: A library for large linear classification

    J. Mach. Learn. Res.

    (2008)
  • S. Gao et al.

    Sparse representation with kernels

    IEEE Trans. Image Process.

    (2013)
  • G. Griffin, A. Holub, P. Perona, Caltech-256 object category dataset, Technical Report 7694, California Institute of...
  • J.J. Hull

    A database for handwritten text recognition research

    IEEE Trans. Patt. Anal. Mach. Intell.

    (1994)
  • S. Lazebnik et al.

    Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories

    2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition

    (2006)

    This paper has been recommended for acceptance by Dr. S. Wang.
