Pattern Recognition Letters

Volume 68, Part 1, 15 December 2015, Pages 27-33

Supervised transfer kernel sparse coding for image classification

https://doi.org/10.1016/j.patrec.2015.08.011

Highlights

  • We use kernel sparse coding in the context of transfer learning.

  • We maximize the correlation between the sparse codes of source and target data of the same class.

  • A unified framework is presented to learn the dictionary and transfer sparse coding.

Abstract

When only a few labeled images are available, a trained classifier performs poorly even if sparse coding is used to process the image features. We therefore exploit data from related domains as source data to aid the classification task. In this paper, we propose a Supervised Transfer Kernel Sparse Coding (STKSC) algorithm to construct discriminative sparse representations for cross-domain image classification. Specifically, we map source and target data into a high dimensional feature space via the kernel trick, thereby capturing nonlinear image features. To make the sparse representations robust to the domain mismatch, we incorporate the Maximum Mean Discrepancy (MMD) criterion into the objective function of kernel sparse coding. We also use label information to learn more discriminative sparse representations. Furthermore, we provide a unified framework that jointly learns the dictionary and the discriminative sparse representations, which can then be used for classification. Experimental results validate that our method outperforms many state-of-the-art methods.

Introduction

Sparse coding has been successfully applied to image classification problems [1], [12], [20], [21]. Sparse coding yields sparse representations in which a sample is expressed as a linear combination of a few dictionary elements; represented this way, images can be well interpreted and classified. Recently, sparse coding has been combined with kernel methods to handle the nonlinear structure of the data [6], [18]. These works extended sparse coding to nonlinear sparse coding via the kernel trick, yielding more discriminative sparse representations. In view of these merits, we also map source and target data into a high dimensional feature space by an implicit mapping function and then learn the sparse representations of the data in the kernel space.

However, labeled images are often scarce, and labeling new images is time consuming. We can therefore use a related dataset, the source domain, to help image classification tasks in our target domain [1], [13], [15]. When the feature distribution of the source domain differs greatly from that of the target domain, directly applying classifiers or object models trained on the source domain to the target domain often fails. Transfer learning, which produces excellent results, has been widely studied to solve this problem; a survey is presented in [14].

Many researchers have focused on the use of sparse coding in transfer learning. In [1], [12], source and target images are combined to learn a shared dictionary and sparse representations for all the images. Both works adopt a nonparametric distance measure to reduce the distribution divergence between source and target data, and the resulting representations prove robust for classification tasks. However, in [1], the source data are also used as test data, whereas in real applications the source data are fully labeled and the task is to classify the target data. In [12], the limited labeled target data are not exploited by the algorithm, and using only the labeled source data is not sufficient for the target classification task. Shekhar et al. proposed a method to learn a shared discriminative dictionary for source and target data in a low dimensional space [17].

However, these methods seek the sparse representations of all samples in the original feature space or in a low dimensional space, which is inadequate for representing image features. Image features are generally not linearly separable in the original feature space, and their similarities are measured with highly nonlinear functions: the pyramid match kernel for spatial pyramid descriptors, for instance, and the geodesic distance for region covariance descriptors. Sparse coding, however, represents the data as a linear combination of a few dictionary elements, and directly applying this linear model to nonlinear image features leads to poor performance. We overcome the problem by mapping the samples into a high dimensional feature space using the kernel trick: data that are linearly inseparable in the original space become linearly separable there, so sparse coding theory applies, and more separable sparse representations can be learned with a coupled dictionary. Because computing explicitly in the high dimensional feature space is prohibitively expensive, we use a kernel function to transform the objective into a feasible problem.
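To make this concrete, here is a minimal sketch (not from the paper; the Gaussian kernel and the toy dimensions are illustrative assumptions) of the computation the kernel trick enables: every feature-space quantity used later reduces to Gram-matrix entries $\phi(y)'\phi(y')$, so the mapping $\phi$ is never formed explicitly.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)) for rows of A, B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

# Toy source/target features (sizes are illustrative).
rng = np.random.default_rng(0)
Ys = rng.normal(size=(2, 50))            # d x n_s
Yt = rng.normal(size=(2, 60))            # d x n_t
Y = np.hstack([Ys, Yt])                  # d x n with n = n_s + n_t

# phi(Y)'phi(Y) without ever computing phi(Y): an n x n kernel matrix.
K = gaussian_kernel(Y.T, Y.T, sigma=1.0)
```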

Another novelty of our method is that we add label information to the objective through the sparse codes of the source and target data. Specifically, we maximize the correlation between the sparse codes of source and target data of the same class, as sketched below. Owing to this supervision information, we learn more discriminative sparse representations that are helpful for our classification tasks. Several experiments demonstrate that our method yields very good performance.
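As a rough illustration of this supervision term (the exact weighting the paper uses inside its objective is not reproduced here; the indicator-based form below is our assumption), the quantity being maximized can be written as a sum of inner products between the sparse codes of same-class source and labeled target samples:

```python
import numpy as np

def label_correlation(Xs, Xtl, zs, ztl):
    """Sum of x_si' x_tlj over all pairs (i, j) with matching class labels.

    Xs:  m x n_s source codes,   zs:  length-n_s source labels
    Xtl: m x n_tl target codes,  ztl: length-n_tl target labels
    """
    same_class = (zs[:, None] == ztl[None, :]).astype(float)  # n_s x n_tl indicator
    return np.sum((Xs.T @ Xtl) * same_class)
```

Maximizing this term (equivalently, subtracting it in the minimization objective) pulls same-class codes from the two domains toward similar directions in code space.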

This paper is organized as follows. Section 2 provides a brief review of related work. In Section 3, we introduce our novel Supervised Transfer Kernel Sparse Coding (STKSC) algorithm. In Section 4, the optimization of STKSC is presented. Extensive experiments are conducted in Section 5. The conclusions and future work are presented in Section 6.

Section snippets

Related work

Sparse coding has received growing attention because of its advantages and promising performance in many computer vision applications [1], [12], [22]. Researchers have developed many algorithms to obtain sparse and efficient representations of images [18], [23]. Given input samples $Y=[y_1,\ldots,y_n]\in\mathbb{R}^{d\times n}$, the sparse coding matrix $X\in\mathbb{R}^{m\times n}$ and the dictionary $D=[d_1,\ldots,d_m]\in\mathbb{R}^{d\times m}$ can be trained by solving the following problem:
$$(D,X)=\arg\min_{D,X}\|Y-DX\|_F^2+\gamma\sum_{i=1}^{n}\|x_i\|_1,$$
where $x_i$ is a column of $X\in\mathbb{R}^{m\times n}$ and $\|\cdot\|_F$ denotes the Frobenius norm, $\|A\|_F=\sqrt{\sum_{i,j}A_{ij}^2}$.
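For concreteness, here is a minimal sketch of this linear problem, assuming an ISTA step for the $\ell_1$-regularized code update and a least-squares dictionary update; neither solver choice is prescribed by the text:

```python
import numpy as np

def soft_threshold(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def sparse_coding(Y, m=64, gamma=0.1, n_iter=100, seed=0):
    """Alternate ISTA on X with a least-squares update of D for
    min ||Y - DX||_F^2 + gamma * sum_i ||x_i||_1."""
    d, n = Y.shape
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(d, m))
    D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
    X = np.zeros((m, n))
    for _ in range(n_iter):
        # ISTA step on X: gradient of the squared loss, then soft-threshold.
        L = 2 * np.linalg.norm(D.T @ D, 2)         # Lipschitz constant of the gradient
        X = soft_threshold(X - 2 * D.T @ (D @ X - Y) / L, gamma / L)
        # Dictionary update: least squares on the reconstruction term.
        D = Y @ np.linalg.pinv(X)
        D /= np.linalg.norm(D, axis=0) + 1e-12     # renormalize atoms
    return D, X
```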

Supervised transfer kernel sparse coding

Define the source data $\mathcal{D}_s=\{Y_s,Z_s\}$ and the target data $\mathcal{D}_t=\{Y_t\}$, in which $Y_s=[y_{s1},\ldots,y_{sn_s}]\in\mathbb{R}^{d\times n_s}$ has labels $Z_s=[z_{s1},\ldots,z_{sn_s}]\in\mathbb{R}^{n_s}$, and $n_s$ is the number of source samples. We define the labeled target samples $Y_{tl}=[y_{tl1},\ldots,y_{tln_{tl}}]\in\mathbb{R}^{d\times n_{tl}}$ with labels $Z_{tl}=[z_{tl1},\ldots,z_{tln_{tl}}]\in\mathbb{R}^{n_{tl}}$, and the unlabeled target samples $Y_{tu}=[y_{tu1},\ldots,y_{tun_{tu}}]\in\mathbb{R}^{d\times n_{tu}}$, respectively. All target samples are $Y_t=[Y_{tl}\;Y_{tu}]\in\mathbb{R}^{d\times n_t}$, $n_t=n_{tl}+n_{tu}$, where $n_t$ is the number of target samples. $Q'$ denotes the transpose of a matrix $Q$ throughout this paper.

Suppose $Y=[Y_s\;Y_t]\in\mathbb{R}^{d\times n}$, $n=n_s+n_t$, source and
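Since the objective below builds in the MMD term mentioned in the abstract, a brief sketch of the standard MMD matrix used in transfer sparse coding may help (the paper's exact definition of its matrix $M$ is an assumption here). With codes $X\in\mathbb{R}^{m\times n}$ ordered source-first, $\mathrm{tr}(XMX')$ is exactly the squared distance between the mean source code and the mean target code:

```python
import numpy as np

def mmd_matrix(ns, nt):
    """M = e e' with e_i = 1/ns on source samples and -1/nt on target samples,
    so that tr(X M X') == || mean source code - mean target code ||^2."""
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    return np.outer(e, e)                  # (ns + nt) x (ns + nt)

# Sanity check of the identity on random codes.
ns, nt = 50, 60
X = np.random.default_rng(0).normal(size=(8, ns + nt))
lhs = np.trace(X @ mmd_matrix(ns, nt) @ X.T)
rhs = np.sum((X[:, :ns].mean(1) - X[:, ns:].mean(1))**2)
assert np.isclose(lhs, rhs)
```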

Optimization of STKSC

Given the dictionary $D$, the objective function can be written as
$$\min_{X}\|\Phi(Y)-\Phi(D)X\|_F^2+\mathrm{tr}(XPX')+\gamma\sum_{i=1}^{n}\|x_i\|_1,\quad\text{where } P=\alpha M+\beta L-\lambda L_a.$$
Since $x_i$ is independent of the other columns of $X$, we update $x_i$ while the other vectors are fixed:
$$\min_{x_i}\|\Phi(y_i)-\Phi(D)x_i\|^2+P_{ii}x_i'x_i+\sum_{j=1,j\neq i}^{n}P_{ij}x_i'x_j+\gamma\|x_i\|_1$$
$$=\phi(y_i)'\phi(y_i)-2x_i'\phi(D)'\phi(y_i)+x_i'\phi(D)'\phi(D)x_i+P_{ii}x_i'x_i+\sum_{j=1,j\neq i}^{n}P_{ij}x_i'x_j+\gamma\|x_i\|_1.$$

We can use any kernel function that satisfies Mercer's condition [2], such as the Gaussian kernel or the polynomial kernel, to compute $\phi(y)'\phi(y)$. Putting these kernels
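A minimal sketch of this per-sample update follows (the ISTA solver and variable names are our assumptions, not necessarily the paper's optimizer): the smooth part of the objective depends on $\phi$ only through $K_{DD}=\phi(D)'\phi(D)$ and $k_i=\phi(D)'\phi(y_i)$, so each step needs kernel evaluations only.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def update_code(K_DD, k_i, X, P, i, gamma, n_iter=50):
    """ISTA for min_{x_i}  -2 x_i'k_i + x_i'K_DD x_i + P_ii x_i'x_i
                           + sum_{j != i} P_ij x_i'x_j + gamma ||x_i||_1,
    with K_DD (m x m) and k_i (m,) supplied as kernel evaluations."""
    cross = X @ P[i] - P[i, i] * X[:, i]              # sum_{j != i} P_ij x_j
    L = 2 * (np.linalg.norm(K_DD, 2) + abs(P[i, i]))  # Lipschitz bound on the gradient
    x = X[:, i].copy()
    for _ in range(n_iter):
        grad = -2 * k_i + 2 * K_DD @ x + 2 * P[i, i] * x + cross
        x = soft_threshold(x - grad / L, gamma / L)
    return x
```

Sweeping $i=1,\ldots,n$ with this update, and then refitting the dictionary, gives an alternating scheme of the kind the section describes.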

Experiments and analysis

In this section, we perform experiments on publicly available benchmark datasets for cross-domain image classification tasks (see Table 2). It can be observed that our method outperforms state-of-the-art methods in most cases.

Conclusions and future work

Learning discriminative sparse representations is of great importance for the cross domain image classification problems. In this paper, we use source and target data to learn a shared dictionary and sparse representations of all the data by minimizing the reconstruction error of kernel sparse coding. Moreover, the supervision information is used by maximizing the correlation between the sparse representations of source and target data of the same class. We also incorporate the MMD and graph

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61472305, 61070143, 61303034), the Science and Technology Project of Shaanxi Province, China (Grant No. 2015GY027), and the Fundamental Research Funds for the Central Universities (Grant No. SMC1405).

References (23)

  • M. Fang et al.

    Multi-source transfer learning based on label shared subspace

    Patt. Recognit. Lett.

    (2015)
  • S. Wold et al.

    Principal component analysis

    Chemom. Intell. Lab. Syst.

    (1987)
  • S. Yang et al.

    Semi-supervised action recognition in video via labeled kernel sparse coding and sparse l1 graph

    Patt. Recognit. Lett.

    (2012)
  • M. Al-Shedivat et al.

    Supervised transfer sparse coding

    Twenty-eighth AAAI Conference on Artificial Intelligence

    (2014)
  • C. Cortes et al.

    Support-vector networks

    Mach. Learn.

    (1995)
  • M. Everingham et al.

    The PASCAL Visual Object Classes (VOC) challenge

    Int. J. Comput. Vis.

    (2010)
  • R.E. Fan et al.

    LIBLINEAR: A library for large linear classification

    J. Mach. Learn. Res.

    (2008)
  • S. Gao et al.

    Sparse representation with kernels

    IEEE Trans. Image Process.

    (2013)
  • G. Griffin, A. Holub, P. Perona, Caltech-256 object category dataset, Technical Report 7694, California Institute of...
  • J.J. Hull

    A database for handwritten text recognition research

    IEEE Trans. Patt. Anal. Mach. Intell.

    (1994)
  • S. Lazebnik et al.

    Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories

    2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition

    (2006)

    This paper has been recommended for acceptance by Dr. S. Wang.
