Transfer learning with stacked reconstruction independent component analysis
Introduction
Transfer learning emphasizes the transfer of knowledge from a source domain to a target domain, adapting information to build an effective hypothesis for the target prediction model. Many transfer learning approaches have been studied in the past years [1]. The crucial problem in transfer learning is how to reduce the difference between the source and target domains while keeping the error of the classification or regression model low. Among transfer learning approaches, feature-representation-transfer methods can be applied in almost all scenarios with few or no assumptions, especially when the data content or the priors and parameters of the model have changed substantially. The common idea of feature-representation-transfer approaches is to learn a transformation over all available domains that serves the target domain: once the data are encoded into a latent feature representation, the divergence between domains can be reduced [2], [3], [4], [5].
Deep learning algorithms learn intermediate concepts between the raw input and the target. These intermediate concepts have been shown to be high-level features that transfer better across domains [6]. In recent years, deep learning has been applied to transfer learning with good performance. Glorot et al. proposed a deep learning framework built on Stacked Denoising Auto-encoders (SDA) [7], along with Restricted Boltzmann Machines (RBMs) [8]. In that work, a high-level feature representation (the code layer) is first learned in an unsupervised manner from text reviews in all available domains. A linear classifier such as a Support Vector Machine (SVM) is then trained on the labeled source-domain data in this high-level representation and finally tested on the target domain. In follow-up work, Chen et al. proposed marginalized SDA (mSDA) [9], which addresses two crucial limitations of SDA: high computational cost and lack of scalability to high-dimensional features. Its representation learning is as effective as SDA's but markedly more efficient. Zhuang et al. introduced Transfer Learning with Deep Auto-encoders (TLDA) [10], a framework with two encoding and two decoding layers, in which a KL-divergence term minimizes the distance between the distributions of instances in the source and target domains, and softmax regression encodes the label information of the source domain.
Although the goal of the aforementioned deep learning methods is to learn high-level features shared between source and target domains, most of them use an auto-encoder model, which may miss the sparsity of the features obtained from the domain data and the independence of the components. The learned high-level features are therefore not guaranteed to be sparse or robust. Furthermore, the lack of independence analysis may fail to capture similarities between domains.
To address the above issues, we propose a transfer learning method based on Stacked Reconstruction Independent Component Analysis (SRICA) and optimize the features for transfer with logistic regression. More specifically, we first use Reconstruction Independent Component Analysis (RICA) to learn features from the unlabeled data of all domains, which yields sparse and linearly independent representations. Secondly, we stack RICA layers to learn higher-level features that reduce the divergence between source and target domains. Finally, we minimize the KL-divergence between the source and target domains and use a logistic regression model to encode the label information of the source domain. The basic idea of SRICA is illustrated in Fig. 1. Experimental results demonstrate the higher accuracy of our proposed method compared with several state-of-the-art methods. Our main contributions in this paper are summarized as follows:
- RICA is introduced for feature reconstruction. Combining the source and target data in the training stage spans the feature space better, and the sparsity means that only a few components are active in the latent feature subspace, which effectively alleviates over-fitting.
- Since deep learning has been shown to obtain higher-level and more robust features for transfer learning, we adopt stacked reconstruction independent component analysis as the deep architecture in our model, with the sigmoid function as the activation, so the model generalizes to a non-linear mapping.
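The stacked construction described above can be sketched as a simple forward pass: each layer's RICA filters transform the previous layer's output, with the sigmoid as the inter-layer non-linearity mentioned in the text. This is a minimal illustrative sketch; the function name and calling convention are our own, not the paper's.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic function, the activation used between layers."""
    return 1.0 / (1.0 + np.exp(-a))

def stacked_rica_features(X, weights):
    """Forward pass through stacked RICA layers (illustrative sketch).

    X       : (n, m) data matrix, one example per column.
    weights : list of per-layer filter matrices W_l, each assumed to have
              been trained by RICA on the previous layer's output.
    Returns the top-layer feature representation.
    """
    H = X
    for W in weights:
        H = sigmoid(W @ H)   # layer-l activations feed the next layer
    return H
```

With zero weights every activation is sigmoid(0) = 0.5, which gives a quick sanity check of the wiring.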
Reconstruction independent component analysis
Reconstruction Independent Component Analysis (RICA) aims to generate sparse representations of whitened or non-whitened unlabeled data. With this method, we find a set of linearly independent basis features to represent the input data efficiently. Given inputs x^(i), in order to learn the features W x^(i), whose basis filters form the matrix W, we minimize the objective function shown in Eq. (1):

min_W (λ/m) Σ_{i=1..m} ‖W x^(i)‖_1 + (1/(2m)) Σ_{i=1..m} ‖Wᵀ W x^(i) − x^(i)‖_2²    (1)

The optimization problem is unconstrained: the soft reconstruction penalty replaces the hard orthonormality constraint W Wᵀ = I of standard ICA, while the L1 term enforces sparsity of the learned features.
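The RICA objective just described, an L1 sparsity term on the feature activations plus a reconstruction penalty, can be written directly in numpy. This is a minimal sketch of the loss only (no gradient or optimizer); the function name and argument layout are ours.

```python
import numpy as np

def rica_objective(W, X, lam=0.1):
    """RICA loss: L1 sparsity of features plus reconstruction penalty.

    W   : (k, n) filter matrix, one basis filter per row.
    X   : (n, m) data matrix, one example per column.
    lam : sparsity weight (lambda in Eq. (1)).
    """
    m = X.shape[1]
    Z = W @ X                              # feature activations W x^(i)
    residual = W.T @ Z - X                 # reconstruction error W^T W x - x
    sparsity = lam * np.abs(Z).sum() / m   # (lambda/m) sum ||W x||_1
    recon = 0.5 * (residual ** 2).sum() / m
    return sparsity + recon
```

When W is orthonormal (e.g. the identity), the reconstruction term vanishes and the loss reduces to the scaled L1 norm of the data, which makes the two terms easy to verify separately.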
Problem formalization
Given a source domain Ds with labeled data, namely Ds = {(x_i^s, y_i^s)}, i = 1, …, ns, where ns is the number of instances in the source domain, and a target domain Dt with unlabeled data, namely Dt = {x_j^t}, j = 1, …, nt, where nt is the number of instances in the target domain, the goal of our framework is to train a classifier on the source domain that makes accurate predictions on the target domain. Actually, our framework can effectively employ the labeled data, and use the …
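The two objective terms described in the introduction, a KL-divergence penalty aligning the domains and a logistic-regression loss on the labeled source data, can be sketched as follows. The construction of the domain distributions from mean hidden activations is a common choice we assume here; the paper's exact definition may differ, and all names are ours.

```python
import numpy as np

def kl_term(Hs, Ht, eps=1e-8):
    """Symmetric KL divergence between the two domains' mean activations.

    Hs : (k, ns) hidden activations of source instances, values in (0, 1).
    Ht : (k, nt) hidden activations of target instances.
    The per-unit means are normalized into distributions first.
    """
    p = Hs.mean(axis=1); p = p / p.sum()
    q = Ht.mean(axis=1); q = q / q.sum()
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)))
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)))
    return kl_pq + kl_qp

def logistic_loss(w, H, y):
    """Cross-entropy of a logistic-regression head on source features.

    w : (k,) weight vector, H : (k, ns) features, y : (ns,) 0/1 labels.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ H)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

When source and target activations coincide the KL term is zero, so minimizing it pushes the stacked representation toward domain-invariant features while the logistic loss keeps it discriminative.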
Experiments
In this section, we conduct extensive experiments to evaluate the effectiveness of our proposed framework. Three real-world image datasets are used: the first for multi-class classification and the other two for binary classification.
Related work
Poultney et al. [17] proposed an unsupervised, energy-based algorithm for learning sparse, overcomplete feature representations in a subspace. In this algorithm, the decoder component produces accurate reconstructions of the patches, while the encoder component provides a fast prediction of the code without any particular preprocessing of the input data. Vincent et al. proposed Denoising Autoencoders [18] to learn a more robust representation from corrupted data …
Conclusion
In this paper, we proposed a transfer learning framework that learns feature representations with a deep learning architecture. In our framework, the RICA algorithm serves as the building block of the deep architecture. More specifically, our method has two components: the first is the Stacked RICA, which learns a good feature representation from the input data; the other is the auto-encoder and logistic regression framework, which incorporates the labeled data information of the …
Acknowledgments
This work is supported in part by the National Key Research and Development Program of China under grant 2016YFC0801406, the Natural Science Foundation of China under grants (61503112, 61673152, 91746209), and the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education under grant IRT17R32.
References (27)
- et al., Structural knowledge transfer for learning sum-product networks, Knowl. Based Syst. (2017)
- et al., Transfer learning using computational intelligence: a survey, Knowl. Based Syst. (2015)
- et al., A survey on transfer learning, IEEE Trans. Knowl. Data Eng. (2010)
- et al., A spectral regularization framework for multi-task structure learning, International Conference on Neural Information Processing Systems (2007)
- et al., Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification, Proceedings of ACL 2007, June 23–30, 2007, Prague, Czech Republic (2007)
- Multi-task feature and kernel selection for SVMs, International Conference on Machine Learning (2004)
- et al., Self-taught learning: transfer learning from unlabeled data, International Conference on Machine Learning (2007)
- et al., Domain adaptation for large-scale sentiment classification: a deep learning approach, International Conference on Machine Learning (2011)
- et al., Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. (2010)
- et al., A fast learning algorithm for deep belief nets, Neural Comput. (2006)
- Marginalized denoising autoencoders for domain adaptation, Proceedings of the 29th ICML
- Supervised representation learning: transfer learning with deep autoencoders, International Conference on Artificial Intelligence
- Applied logistic regression, Technometrics