Knowledge-Based Systems

Volume 163, 1 January 2019, Pages 174-185

Transfer subspace learning via low-rank and discriminative reconstruction matrix

https://doi.org/10.1016/j.knosys.2018.08.026

Highlights

  • A new approach to unsupervised domain transfer learning is proposed.

  • Low-rank and sparse constraints are imposed on the reconstruction matrix.

  • The discriminative ability of the target and source samples is captured.

  • The information content of the reconstruction coefficient matrix is utilized.

Abstract

In this paper, we investigate unsupervised domain transfer learning, in which the target samples are unlabeled while the source samples are all labeled. We use a transformation matrix to map both target and source samples to a common subspace where they have the same distribution, and each target sample in the transformed space is reconstructed as a linear combination of the source samples. To preserve the local and global structure of the samples in the transferred domain, low-rank and sparse constraints are imposed on the reconstruction coefficient matrix. To account for the discriminative ability of the target and source samples, the information content of the reconstruction coefficient matrix is exploited. To capture the discriminative ability of the target samples, we assume that the source samples that are linearly combined to construct a target sample should share the same class label. Under this assumption, the target samples are well distributed over the transferred domain. To enforce it, we use the linear entropy to measure the discriminant power of the target domain; this term captures the discriminative ability of the target samples without using their hidden labels. To assess the discriminative ability of the source samples, we use a max-margin classifier whose kernel matrix is defined via the reconstruction coefficient matrix. We evaluate the proposed approach on the MSRC, VOC 2007, CMU PIE, Office, Caltech-256, and Extended Yale B datasets, as well as two imbalanced datasets. The experimental results show that our approach outperforms its competitors.

Introduction

Most computer vision tasks suffer from insufficient labeled data, because providing training samples is expensive and time consuming. To overcome this challenge, transfer learning was introduced: auxiliary relevant source samples from different datasets can facilitate learning the target task. It has been shown in many cases that knowledge transfer, if applied correctly, improves the performance of learning. In transfer learning, the common assumption that training and test data have similar distributions or share a subspace does not hold. Suppose the target data $(x_t, y_t)$ are distributed according to $p_t(x, y)$ and the source data $(x_s, y_s)$ are sampled from $p_s(x, y)$. In transfer learning, it is assumed that the two datasets are sampled from two different distributions, $p_t(x, y) \neq p_s(x, y)$. Therefore, the source data cannot be used directly in learning the target task, because a negative transfer may occur. Many approaches have been proposed to address the distribution mismatch between the target and source data, and they fall into two categories. In the first category, methods attempt to find a new common subspace in which the target and source data have similar distributions [1], [2], [3], [4]. In this category, many methods align the source and target data in the learned subspace by minimizing the reconstruction error. The basic assumption in reconstruction is that each datum in a specific neighborhood in the target domain can be reconstructed by the same neighborhood in the source domain [5]. A naïve solution is reconstruction based on the least-squares criterion; however, it does not always yield the optimal solution and consequently tends to overfit. In [6], the reconstruction is done such that the locality structure of the data is also preserved. Jhuo et al. [7] transform the source samples into an intermediate representation that is linearly reconstructed by the target samples; in addition, a low-rank constraint is utilized to capture the structure information of the source domain, and a reconstruction error term handles undesirable noise and outliers. Moreover, they extend their work to multiple source domains. Shao et al. [8] transform the source and target samples into a common space, enforce a low-rank constraint on the reconstruction coefficient matrix $Z$, and use a sparse matrix $E$ to compensate for noisy data in the common space. Their work consists of two alternating steps: first, the reconstruction parameters $Z$ and $E$ are fixed and the basis matrix $P$ of the new representation is updated; second, $Z$ and $E$ are updated while $P$ is kept fixed. Xu et al. [9] improve the approach of [8] by adding a discriminant subspace learning function; they use a regression method for classification with a non-negative label relaxation matrix that relaxes a strict binary label matrix into a slack variable matrix. In the second category, model parameters are adjusted so that they are adapted to the target data [10], [11].
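For concreteness, the generic objective shared by this family of reconstruction-based methods can be sketched as follows. This is an illustrative form assembled from the description above, not the exact formulation of [8] or [9]; the particular norms, constraints, and weights $\alpha$, $\beta$ are assumptions:

```latex
% Illustrative sketch (not the exact formulation of [8] or [9]):
% P transforms both domains, Z reconstructs the transformed target
% samples from the transformed source samples, and E absorbs noise.
\min_{P, Z, E} \; \operatorname{rank}(Z) + \alpha \|Z\|_{1} + \beta \|E\|_{2,1}
\quad \text{s.t.} \quad P X_t = P X_s Z + E
```

In practice the non-convex $\operatorname{rank}(Z)$ is relaxed to the nuclear norm $\|Z\|_*$, and the problem is solved by alternating updates of $P$ and $(Z, E)$, as in the two-step scheme of [8].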

Recent studies show that low-rank reconstruction is a robust form of transductive learning. However, critical challenges remain. Evidently, the reconstruction matrix plays a critical role in obtaining the transformation matrix, yet the majority of existing methods ignore the discriminative information contained in the reconstruction matrix $Z$, which may degrade the quality of the transformation matrix $P$. The main contribution of this paper is to propose new ways to effectively capture the discriminative information of both target and source samples simultaneously. To incorporate the discriminative information of the target domain, the label information is not explicitly used; rather, the information content of the reconstruction matrix is utilized. To exploit the discriminant capability of the source domain, in contrast, the label information and the information content of the reconstruction coefficient matrix $Z$ are jointly considered. Here it is assumed that two similar samples in the source domain should make the same contribution to reconstructing the target samples; in other words, two source samples are similar provided that they are incorporated in reconstructing the same subset of target samples. Hence, $Z^T Z$ can be considered a similarity matrix, and we prove that it is a kernel matrix as well. Therefore, we propose a new max-margin based term that utilizes the information contained in the kernel matrix $Z^T Z$.
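To illustrate why $Z^T Z$ qualifies as a kernel, the following minimal NumPy sketch (not the authors' code; the matrix $Z$ here is random and its dimensions are purely illustrative) verifies that the induced Gram matrix is symmetric positive semi-definite:

```python
import numpy as np

# Minimal sketch: any Gram matrix built from a real matrix Z is
# symmetric positive semi-definite and therefore a valid kernel.
# Whether Z^T Z or Z Z^T compares the coefficient profiles of the
# source samples depends on the orientation convention for Z; both
# are PSD by the same argument.
rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 30))    # hypothetical reconstruction matrix

K = Z.T @ Z                          # Gram matrix: K_ij = <z_i, z_j>
eigvals = np.linalg.eigvalsh(K)      # symmetric -> real eigenvalues
assert np.all(eigvals >= -1e-10)     # PSD up to numerical tolerance
print("smallest eigenvalue of K:", eigvals.min())
```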

In our approach, the target samples have no label information; therefore, there is no way to explicitly incorporate the discriminative information of the target samples. Some methods assign a pseudo-label to target samples in order to exploit this hidden discriminative information. In this paper, we propose a new approach in which the discriminative information of the target samples is considered without explicitly using their latent labels. To do this, the labels of the source samples that are linearly combined to reconstruct a target sample are encouraged to be the same. Since, in our approach, each target datum is reconstructed by the same neighbors in the source domain, the source and target samples in the transferred domain have similar geometrical properties. Moreover, the consistency assumption in manifold learning [12] states that the class labels of nearby samples on the same manifold are likely to be the same. Putting these reasons together, it can be concluded that the source samples that contribute to constructing a target sample are likely to have the same label. To integrate this assumption into the proposed approach, we add a new term that uses the linear entropy to measure the mixedness of the labels of the source samples incorporated in reconstructing each target sample.
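The following short sketch illustrates one plausible reading of this term; the pooling of absolute weights and the $1 - \sum_c p_c^2$ form of the linear entropy are illustrative assumptions, not necessarily the paper's exact definition:

```python
import numpy as np

# Hedged sketch of the label-mixedness idea: for each target sample j,
# pool the absolute reconstruction weights Z[i, j] by the class label
# of source sample i, normalize into a distribution p, and score its
# linear entropy 1 - sum_c p_c^2. The score is 0 when a single class
# does all of the reconstruction and grows as classes mix.
def linear_entropy_per_target(Z, source_labels, n_classes):
    W = np.abs(Z)                    # contribution magnitudes
    mass = np.zeros((n_classes, Z.shape[1]))
    for c in range(n_classes):
        mass[c] = W[source_labels == c].sum(axis=0)
    p = mass / np.maximum(mass.sum(axis=0, keepdims=True), 1e-12)
    return 1.0 - (p ** 2).sum(axis=0)  # one score per target sample

# Toy usage with a hypothetical Z (rows: source samples, cols: targets).
rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 30))
labels = rng.integers(0, 4, size=50)
print(linear_entropy_per_target(Z, labels, n_classes=4).round(3))
```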

This paper is organized as follows: in Section 2 we present the literature review. The proposed approach is given in Section 3. Experimental results are presented in Section 4. Finally, the concluding remarks and future work are given in Section 5.

Section snippets

Related works

Transfer learning is a research area in machine learning which states that knowledge gained from a source task can be transferred to a related target task. In many cases it has been shown that knowledge transfer, if applied correctly, improves the performance of learning. Transfer learning has been applied successfully to a variety of applications, including computer vision [13], banking digital ecosystems [14], and collaborative filtering [15]. Knowledge transferring can be…

The proposed approach

In this section, the proposed approach is presented. The main contribution of this paper is to propose new ways to effectively capture the discriminative information of both target and source samples. To do this, we add two new terms to the objective function (a hedged sketch of the resulting objective is given below). Fig. 1 illustrates the motivations behind the contributions of the proposed approach. As mentioned, the main goal in transfer learning is to learn a target task given the source samples, with the assumption that source and target…
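Putting the pieces from the abstract and introduction together, the overall objective plausibly takes the following shape, where $\Omega_{\mathrm{ent}}$ denotes the linear-entropy term over the target reconstructions and $\Omega_{\mathrm{mm}}$ the max-margin term built on the kernel $Z^T Z$. This is an assumed sketch of how the two new terms attach to the baseline, not the paper's exact formulation:

```latex
% Assumed illustrative shape of the full objective; the paper's precise
% weights, norms, and constraints may differ.
\min_{P, Z, E} \; \|Z\|_{*} + \alpha \|Z\|_{1} + \beta \|E\|_{2,1}
  + \gamma \, \Omega_{\mathrm{ent}}(Z)
  + \delta \, \Omega_{\mathrm{mm}}(Z^{\top} Z)
\quad \text{s.t.} \quad P X_t = P X_s Z + E
```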

Experiments

In this section, we evaluate the proposed approach by carrying out extensive experiments on different datasets. To show the effectiveness of the proposed method, we compare it with Joint Geometrical and Statistical Alignment (JGSA) [29], Joint Distribution Adaptation (JDA) [28], Transfer Joint Matching (TJM) [49], Discriminative Transfer Subspace Learning via Low-Rank and Sparse Representation (TSL-LRSR) [9], Geodesic Flow Kernel (GFK) [1], Transfer Component…

Conclusion and further study

In this paper, a new approach to unsupervised domain transfer learning is given. A common subspace is found such that the target and source samples have the same distribution and each target datum is linearly reconstructed from the source data. To preserve the locality structure, a low-rank constraint is enforced on the reconstruction coefficient matrix. The main contribution is to incorporate the discriminative ability of the source and target samples by using the information content…

Acknowledgment

This research was in part supported by a grant from Institute for Research in Fundamental Sciences (IPM), Iran (Grant number CS1395-4-68).

References (54)

  • Fu, Y., Low-Rank and Sparse Modeling for Visual Analysis (2014)
  • Roweis, S.T., et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
  • Jhuo, I.H., et al., Robust visual domain adaptation with low-rank reconstruction
  • Shao, M., et al., Generalized transfer subspace learning through low-rank constraint, Int. J. Comput. Vis. (2014)
  • Xu, Y., et al., Discriminative transfer subspace learning via low-rank and sparse representation, IEEE Trans. Image Process. (2016)
  • Duan, L., et al., Visual event recognition in videos by learning from Web data, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • Duan, L., et al., Exploiting web images for event recognition in consumer videos: a multiple source domain adaptation approach
  • Zhou, D., et al., Learning with local and global consistency
  • Shin, H.-C., et al., Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Trans. Med. Imaging (2016)
  • Behbood, V., et al., Fuzzy refinement domain adaptation for long term prediction in banking ecosystem, IEEE Trans. Ind. Inform. (2014)
  • Pan, S.J., et al., A survey on transfer learning, IEEE Trans. Knowl. Data Eng. (2010)
  • Razzaghi, P., Self-taught support vector machines, Knowl. Inform. Syst. (2017)
  • Gammerman, A., et al., Learning by transduction, Uncertain. Artif. Intell. (1998)
  • Bickel, S., Bruckner, M., Scheffer, T., Discriminative learning for differing training and test distributions, in: ...
  • Pan, S.J., et al., Domain adaptation via transfer component analysis, IEEE Trans. Neural Netw. (2011)
  • Long, M., et al., Transfer sparse coding for robust image representation
  • Long, M., et al., Transfer joint matching for unsupervised domain adaptation