Knowledge-Based Systems

Volume 152, 15 July 2018, Pages 100-106

Transfer learning with stacked reconstruction independent component analysis

https://doi.org/10.1016/j.knosys.2018.04.010

Abstract

Significant improvements in transfer learning have emerged in recent years, because deep learning has been shown to learn higher-level and more robust features. However, most existing deep learning approaches are based on the framework of the auto-encoder or sparse auto-encoder, which poses challenges for independent component analysis and fails to measure similarities between data spaces. Therefore, in this paper, we propose a new strategy to achieve better feature representation for transfer learning. Our method has the following advantages: 1) the Stacked Reconstruction Independent Component Analysis (SRICA) model is used to pursue an optimal feature representation; 2) the label information is exploited by a logistic regression model to optimize the representation features, and the distance between the distributions of the domains is minimized by KL-divergence. Extensive experiments conducted on several image datasets demonstrate the superiority of our proposed method over competing state-of-the-art methods.

Introduction

Transfer learning emphasizes the transfer of knowledge from a source domain to a target domain, adapting information to develop an effective hypothesis for the target prediction model. In the past years, many transfer learning approaches have been studied [1]. The crucial problem in transfer learning is how to reduce both the difference between the source and target domains and the classification error of regression models. Among all transfer learning approaches, feature-representation-transfer approaches have been applied in scenarios with few or no assumptions, especially where the data content or the priors and parameters of the model have changed substantially. The common idea of feature-representation-transfer approaches is to learn a transformation from all available domains for the target domain. As the data are encoded into a latent feature representation, the divergence between domains can be reduced [2], [3], [4], [5].

Deep learning algorithms learn intermediate concepts between the raw input and the target. These intermediate concepts have been proven to be high-level features that yield better transfer across domains [6]. In recent years, deep learning has been applied to transfer learning with good performance. Glorot et al. proposed a deep learning framework based on Stacked Denoising Auto-encoders (SDA) [7], along with Restricted Boltzmann Machines (RBMs) [8]. In this work, a high-level feature extraction is first learned in an unsupervised manner from text reviews in all available domains to form the code layer. Then, a linear classifier such as a Support Vector Machine (SVM) is trained on the labeled source-domain data with the high-level feature representation and finally tested on the target domain. In follow-up work, Chen et al. proposed marginalized SDA (mSDA) [9], which addresses two crucial limitations of SDA: high computational cost and lack of scalability to high-dimensional features. Its representation learning is as effective as SDA's but shown to be more efficient. Zhuang et al. introduced a transfer learning framework named Transfer Learning with Deep Auto-encoders (TLDA) [10], which consists of two encoding and decoding layers. In this method, the KL-divergence is imposed to minimize the distance between the distributions of instances in the source and target domains, and softmax regression is used to encode the label information of the source domain.
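
To make a TLDA-style domain penalty concrete, the following minimal sketch computes a symmetric KL-divergence between summary distributions of encoded source and target instances. The synthetic activations and the summarization by mean activation are illustrative assumptions, not the cited authors' implementation:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as 1-D arrays."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical sigmoid activations of encoded instances from each domain
# (rows = instances, columns = hidden units); real encodings would come
# from the learned network.
rng = np.random.default_rng(0)
h_src = rng.uniform(0.1, 0.9, size=(100, 32))
h_tgt = rng.uniform(0.1, 0.9, size=(80, 32))

# Summarize each domain by its mean activation pattern, normalized into a
# distribution over hidden units, and penalize the symmetric divergence.
p_src = h_src.mean(axis=0)
p_tgt = h_tgt.mean(axis=0)
domain_penalty = kl_divergence(p_src, p_tgt) + kl_divergence(p_tgt, p_src)
print(domain_penalty)
```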

Although the goal of the aforementioned deep learning methods is to learn high-level features between the source and target domains, most of them use the auto-encoder model, which may miss the sparsity of the features obtained from the domain data and the independence of their components. Therefore, the high-level features are not guaranteed to be sparse and robust. Furthermore, the lack of independent component analysis may make it impossible to measure similarities between domains.

To address the above issues, we propose a transfer learning method based on Stacked Reconstruction Independent Component Analysis (SRICA), and optimize the transferred features with logistic regression. More specifically, we first use the Reconstruction Independent Component Analysis (RICA) model to learn features from the unlabeled data of all domains, which generates sparse and linearly independent representations. Secondly, we use the stacked RICA to learn higher-level features that minimize the divergence between the source and target domains. Finally, we minimize the KL-divergence between the source and target domains, and use a logistic regression model to encode the label information of the source domain. The basic idea of SRICA is illustrated in Fig. 1. Experimental results demonstrate the higher accuracy of our proposed method compared with several state-of-the-art methods. Our main contributions in this paper are summarized as follows; a code sketch of the overall pipeline is given after the list:

  • The RICA method is introduced for feature reconstruction. Combining the source and target data in the training stage allows the learned basis to span the entire feature space better. The sparsity implies that only a few basis features are active for each data point in the latent feature subspace, which effectively alleviates over-fitting.

  • As deep learning has been shown to obtain higher-level and more robust features for transfer learning, the deep method of stacked reconstruction independent component analysis is used in our model. The sigmoid function is used as the activation function, which introduces non-linearity between layers.
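
The sketch below illustrates the pipeline described above under stated assumptions: a simple subgradient solver for the RICA objective, greedy sigmoid stacking, and a source-trained logistic regression applied to the target. All function names, hyperparameters, and toy data are hypothetical, and the KL-divergence term and any fine-tuning are omitted for brevity:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rica_layer(X, n_hidden, lam=0.1, lr=1e-4, n_iter=300, seed=0):
    """Fit one RICA layer by subgradient descent on
    J(W) = lam * ||W x||_1 + 0.5 * ||W'W x - x||_2^2, summed over the
    columns of X. Returns the filter matrix W of shape (n_hidden, m)."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((n_hidden, X.shape[0]))
    for _ in range(n_iter):
        WX = W @ X
        R = W.T @ WX - X                               # reconstruction residual
        grad = lam * np.sign(WX) @ X.T + W @ (X @ R.T + R @ X.T)
        W -= lr * grad
    return W

def stacked_rica(X, layer_sizes, lam=0.1):
    """Greedy layer-wise stacking: each layer's sigmoid outputs feed the next."""
    feats, weights = X, []
    for h in layer_sizes:
        W = rica_layer(feats, h, lam=lam)
        weights.append(W)
        feats = sigmoid(W @ feats)
    return weights

def encode(X, weights):
    for W in weights:
        X = sigmoid(W @ X)
    return X

# Toy stand-ins for the domains: columns are instances, rows are raw features.
rng = np.random.default_rng(1)
Xs, ys = rng.standard_normal((20, 200)), rng.integers(0, 2, 200)  # labeled source
Xt = rng.standard_normal((20, 150)) + 0.5                         # unlabeled target

# 1) Learn stacked features on the union of both domains (unsupervised).
weights = stacked_rica(np.hstack([Xs, Xt]), layer_sizes=[64, 32])

# 2) Encode the source label information with logistic regression, then
#    predict on the target domain.
clf = LogisticRegression(max_iter=1000).fit(encode(Xs, weights).T, ys)
target_pred = clf.predict(encode(Xt, weights).T)
```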

Section snippets

Reconstruction independent component analysis

Reconstruction Independent Component Analysis (RICA) aims to generate sparse representations of whitened or non-whitened unlabeled data. With this method, we find a set of linearly independent basis features to represent the input data efficiently. Given an input $x$, in order to learn its reconstruction $x'$, represented through the columns of a matrix $W$, we need to minimize the objective function shown in Eq. (1):

$$J(W) = \lambda \left\| W x \right\|_1 + \frac{1}{2} \left\| W^{\top} W x - x \right\|_2^2 \tag{1}$$

The optimization problem is represented as:

$$\min_{W} \; \lambda \left\| W x \right\|_1 + \frac{1}{2} \left\| W^{\top} W x - x \right\|_2^2$$
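
As a concreteness check, Eq. (1) can be evaluated directly. The sketch below sums the objective over a batch of inputs stored column-wise; the data layout and dimensions are illustrative assumptions:

```python
import numpy as np

def rica_objective(W, X, lam=0.1):
    """Eq. (1) summed over a batch: columns of X are input examples,
    rows of W are the learned filters."""
    WX = W @ X
    sparsity = lam * np.abs(WX).sum()
    reconstruction = 0.5 * np.sum((W.T @ WX - X) ** 2)
    return sparsity + reconstruction

# Illustrative dimensions: 64 filters over 20-dimensional whitened inputs.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((64, 20))
X = rng.standard_normal((20, 500))
print(rica_objective(W, X))
```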

Problem formalization

Given the source domain $D_s$ with labeled data, namely $D_s = \{x_i^{(s)}, y_i^{(s)}\}_{i=1}^{n_s}$ with $x_i^{(s)} \in \mathbb{R}^{m \times 1}$ and $y_i^{(s)} \in \{1, 2, \ldots, c\}$, where $n_s$ is the number of instances in the source domain, and the target domain $D_t$ with unlabeled data, namely $D_t = \{x_i^{(t)}\}_{i=1}^{n_t}$ with $x_i^{(t)} \in \mathbb{R}^{m \times 1}$, where $n_t$ is the number of instances in the target domain. The goal of our framework is to train a classifier on the source domain that makes precise predictions on the target domain. Actually, our framework can effectively employ the labeled data, and use the
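
To fix the notation, this toy sketch maps $D_s$ and $D_t$ to arrays and runs the basic protocol of training on labeled source data and predicting on the unlabeled target; the data are synthetic placeholders and no feature learning is applied here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Notation mapped to arrays: m raw features, n_s labeled source instances,
# n_t unlabeled target instances, c classes. All values are synthetic.
m, n_s, n_t, c = 30, 500, 400, 3
rng = np.random.default_rng(0)
X_s = rng.standard_normal((n_s, m))          # {x_i^(s)}, i = 1..n_s
y_s = rng.integers(1, c + 1, n_s)            # {y_i^(s)} in {1, ..., c}
X_t = rng.standard_normal((n_t, m)) + 0.3    # {x_i^(t)}: no labels at train time

# Train only on the labeled source domain, then predict on the target domain.
clf = LogisticRegression(max_iter=1000).fit(X_s, y_s)
y_t_pred = clf.predict(X_t)
```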

Experiments

In this section, we conduct extensive experiments to evaluate the effectiveness of our proposed framework. Three real-world image datasets are used in our experiments: the first is used for multi-class classification, and the remaining two are used for binary classification.

Related work

Poultney et al. [17] proposed an unsupervised algorithm for learning sparse, overcomplete feature representations using an energy-based model. In this algorithm, the decoder component produces accurate reconstructions of the patches, while the encoder component provides a fast prediction of the code without requiring any particular preprocessing of the input. Vincent et al. proposed Denoising Autoencoders [18] to learn a more robust representation from corrupted data

Conclusion

In this paper, we proposed a transfer learning framework that learns feature representations with a deep learning architecture. In our framework, the RICA algorithm is introduced as the deep learning building block. More specifically, there are two components in our method: the first is the stacked RICA, which learns a good feature representation from the input data; the other is the framework of the auto-encoder and logistic regression, which incorporates the labeled data information of the

Acknowledgments

This work is supported in part by the National Key Research and Development Program of China under grant 2016YFC0801406, the Natural Science Foundation of China under grants 61503112, 61673152 and 91746209, and the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education under grant IRT17R32.

References (27)

  • J. Zhao et al.

    Structural knowledge transfer for learning sum-product networks

    Knowl. Based Syst.

    (2017)
  • J. Lu et al.

    Transfer learning using computational intelligence: a survey

    Knowl. Based Syst.

    (2015)
  • S.J. Pan et al.

    A survey on transfer learning

    IEEE Trans. Knowl. Data Eng.

    (2010)
  • A. Argyriou et al.

    A spectral regularization framework for multi-task structure learning

    International Conference on Neural Information Processing Systems

    (2007)
  • J. Blitzer et al.

    Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification

    ACL 2007, Proceedings of the Meeting of the Association for Computational Linguistics, June 23–30, 2007, Prague, Czech Republic

    (2007)
  • T. Jebara

    Multi-task feature and kernel selection for SVMs

    International Conference on Machine Learning

    (2004)
  • R. Raina et al.

    Self-taught learning: transfer learning from unlabeled data

    International Conference on Machine Learning

    (2007)
  • X. Glorot et al.

    Domain adaptation for large-scale sentiment classification: a deep learning approach

    International Conference on International Conference on Machine Learning

    (2011)
  • P. Vincent et al.

    Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion

    J. Mach. Learn. Res.

    (2010)
  • G.E. Hinton et al.

    A fast learning algorithm for deep belief nets

    Neural Comput.

    (2006)
  • M. Chen et al.

    Marginalized denoising autoencoders for domain adaptation

    Proceedings of the 29th ICML

    (2012)
  • F. Zhuang et al.

    Supervised representation learning: transfer learning with deep autoencoders

    International Conference on Artificial Intelligence

    (2015)
  • D.W. Hosmer Jr. et al.

    Applied logistic regression

    Technometrics

    (2013)