Structure preservation and distribution alignment in discriminative transfer subspace learning
Introduction
Traditional supervised learning is based on the common assumption that the training data and the testing data are drawn from the same feature space and follow the same data distribution [1]. It requires massive amounts of labeled training data for each gallery or corpus. In real-world applications, however, the training samples cannot be guaranteed to follow the same distribution as the test samples owing to various factors, such as differences in visual resolution and illumination. Changes in the distribution degrade the performance of the original learning system, so many models must be rebuilt from scratch with an extremely large number of training samples. However, recollecting the training data is prohibitive owing to the considerable manual effort involved [2], [3], especially for multi-label learning [66]. Moreover, retraining the models without applying the knowledge learned from previous domains or tasks is inefficient [4]. Transfer learning [1] has emerged to address these issues: its objective is to borrow well-learned knowledge from an auxiliary source domain and apply it to a related target domain. In addition, it can alleviate the problem of learning with limited data.
Depending on whether the source domain and target domain data labels are available, transfer learning can be categorized into multi-task learning, self-taught learning, domain adaptation, and unsupervised transfer learning [1]. Domain adaptation (DA) has shown promising performance in many areas, e.g., image classification [5], object recognition [6], text categorization [7], and video event detection [8]. DA assumes that the source domain data and the target domain data have the same feature space and label space but different data distributions. It employs information from both the source domain and the target domain during the learning process to achieve automatic adaptation. Based on the availability of labeled target data, DA can be generally categorized into semi-supervised and unsupervised domain adaptation. When a small set of labeled data is available in the target domain, the problem is semi-supervised DA [9], [10], [11]; when no labeled data are available in the target domain, the problem is unsupervised DA [12], [13], [14], [15]. This paper focuses on the more challenging problem of unsupervised DA because, in real-world applications, unlabeled target data are often much more abundant and costly to annotate.
Some common latent factors are shared by two domains, even though distribution shifts may occur between these domains. Therefore, how to find, express, and use these latent factors to reduce the distribution shifts between the two domains is the major issue in DA, which can be addressed using instance re-weighting methods [16], [17], [18], [19], [20], [21] or feature representation methods [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34]. Most instance re-weighting approaches aim to adjust model parameters according to the importance of instances in order to construct an adaptive source classifier that can adapt to the target domain. In these methods, the samples are fixed but the decision boundaries are allowed to change. By contrast, feature representation approaches aim to learn a good feature representation across domains, which can reduce the distribution difference between domains as much as possible. Using the new feature representation, standard supervised learning algorithms can be trained on the source domain and reused on the target domain. Thus, the performance of the target task is expected to improve significantly.
To match the distributions of the source and target domains, most feature-based DA methods focus on learning a common subspace for the two domains by exploiting either statistical properties or geometric structure, but rarely both. Methods that exploit statistical properties reduce the domain distribution shift using sample means (transfer component analysis (TCA) [24]), second-order statistics (CORAL [29]), data scatter (scatter component analysis (SCA) [22]), or class means (linear discriminant analysis (LDA)-inspired domain adaptation (LDADA) [45]). Although these methods have produced significant results, sample means, second-order statistics, data scatter, and class means are simple statistical measures that cannot completely describe the properties of the data. Moreover, data contain inherent geometric structure: for example, samples from the same class cluster closely, while samples from different classes lie relatively far apart and are drawn from different subspaces. Methods that exploit geometric structure either combine the space relationship with other representation learning to learn a new representation of the data (transfer sparse coding (TSC) [33], Tuia et al. [69], transfer latent representation (TSL) [32]), or exploit the global subspace structure to learn a common subspace in which the target domain data can be recovered from the source domain data under a low-rank constraint (low-rank transfer subspace learning (LTSL) [38]). Although these methods fully exploit the space relationship and the global subspace structure of the data, they leave the local geometric structure and the statistical properties of the data unexploited.
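As a concrete illustration of statistics-based alignment, the CORAL-style second-order matching mentioned above can be sketched in a few lines: the source features are whitened with their own covariance and then re-colored with the target covariance. This is a minimal sketch under our own assumptions (the function name and the `eps` regularizer are illustrative), not the implementation of any of the cited papers:

```python
import numpy as np

def coral_align(Xs, Xt, eps=1e-3):
    """Align the second-order statistics of source features Xs (n_s x d)
    to target features Xt (n_t x d), in the spirit of CORAL [29]."""
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])

    def sqrtm(C, inv=False):
        # matrix (inverse) square root via eigendecomposition;
        # covariances are symmetric PSD, so eigh applies
        w, V = np.linalg.eigh(C)
        w = np.maximum(w, 1e-12)
        s = w ** (-0.5 if inv else 0.5)
        return (V * s) @ V.T

    # whiten with the source covariance, re-color with the target covariance
    return Xs @ sqrtm(Cs, inv=True) @ sqrtm(Ct)
```

After alignment, the covariance of the transformed source features approximately equals the empirical target covariance, up to the `eps` regularization.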
In fact, the statistical properties and the geometric structure of the data are observed from different viewpoints, each of which describes the data only partially. These viewpoints are not mutually exclusive but complementary, so jointly exploiting them to match the domain difference should confer the advantages of both. In addition, theoretical studies [35], [36], [37] on DA have shown that the target domain classification error can be reduced by reducing the source domain classification error and the distribution difference between the two domains. Inspired by this theoretical result, we propose to embed the reduction of the source domain classification error and the alignment of the domain distributions into a single optimization framework, and to align the distributions by jointly exploiting the geometric structure and the statistical properties of the data.
In this paper, we consequently propose a new DA method, called structure preservation and distribution alignment (SPDA), in discriminative transfer subspace learning. The core idea underlying the proposed method is to learn a projection matrix by which (1) the source domain classification error can be reduced; and (2) the source domain and the target domain data are projected into a common subspace, in which (2a) the geometric structure of the data is preserved and (2b) the distribution differences between the two domains are reduced. Specifically, to reduce the classification error of the source domain, an ε-dragging technique [52] that relaxes the strict binary label matrix into a slack variable matrix is introduced into the source domain regression classifier model; this not only provides more freedom to learn a proper projection matrix, but also enlarges the distance between data points from different classes. To fully exploit the structural information residing in the data, we preserve geometric structure at both the feature level and the sample level. At the feature level, in the common subspace spanned by the projection matrix, the source and target domains are well aligned and each target datum can be linearly reconstructed from the source domain data. We impose low-rank and sparse constraints on the reconstruction coefficient matrix: the low-rank constraint ensures that the coefficient matrix has a block-wise structure, preserving the global subspace structure of the data, while the sparse constraint ensures that each target sample is reconstructed by only a few neighbors in the source domain, preserving the local structure of the data. At the sample level, a graph regularization method characterizes the sample relationships so that two points that are close in the original space remain close in the projected common subspace.
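The ε-dragging relaxation of the label matrix admits a simple closed-form slack update when the projection is held fixed, as in the alternating optimization of [52]. The sketch below is illustrative only: the function name is ours, and we assume a 0/1 one-hot label matrix. The sign matrix B fixes the dragging direction (true-class targets may only grow, wrong-class targets may only shrink), which is what enlarges inter-class distances:

```python
import numpy as np

def epsilon_dragging_targets(Y, scores):
    """One slack update of the epsilon-dragging technique [52].
    Y      : n x c strict one-hot label matrix (0/1 entries).
    scores : n x c current regression outputs X @ W (W held fixed).
    Returns the relaxed target matrix Y + B * M with M >= 0."""
    B = np.where(Y > 0, 1.0, -1.0)   # dragging directions per entry
    R = scores - Y                   # residual w.r.t. the strict targets
    M = np.maximum(B * R, 0.0)       # closed-form nonnegative slack
    return Y + B * M                 # relaxed, margin-enlarging targets
```

Targets move only when a score already exceeds the strict target in the permitted direction (true-class scores above 1, wrong-class scores below 0), so the regression is never penalized for being "too confident".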
Thus, the space relationship of the samples is preserved. Further, to reduce the statistical shifts, both marginal and conditional distributions between the two domains are aligned. The source domain classification error reduction, statistical property, and geometric structure are synthesized in this transfer subspace learning framework, and an effective optimization method is derived in detail. Further, the functions and roles of each term in this framework are analyzed. The proposed method is illustrated in Fig. 1.
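With a linear kernel, the marginal and conditional alignment terms reduce to distances between domain sample means and between class-wise means, where the target class assignments come from pseudo-labels predicted by a source-trained classifier, as is standard in JDA-style methods. A minimal sketch with illustrative function names (not the paper's exact objective):

```python
import numpy as np

def mmd_marginal(Zs, Zt):
    """Squared MMD (linear kernel) between projected source features Zs
    and target features Zt: distance between the domain sample means."""
    return np.sum((Zs.mean(axis=0) - Zt.mean(axis=0)) ** 2)

def mmd_conditional(Zs, ys, Zt, yt_pseudo):
    """Class-wise mean distance; target labels are pseudo-labels
    predicted by a classifier trained on the source domain."""
    total = 0.0
    for c in np.unique(ys):
        s, t = Zs[ys == c], Zt[yt_pseudo == c]
        if len(s) and len(t):  # skip classes absent from either domain
            total += np.sum((s.mean(axis=0) - t.mean(axis=0)) ** 2)
    return total
```

Minimizing the sum of both terms over the projection aligns the marginal and conditional distributions simultaneously; the pseudo-labels are typically refined over iterations.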
The main contributions of this paper can be summarized as follows:
- (1)
A unified framework for source domain classification error reduction, geometric structure preservation, and distribution alignment is proposed for unsupervised DA. When reducing the source classification error, an ε-dragging technique that relaxes the strict binary label matrix into a slack variable matrix is introduced into the regression classifier model. This framework is formulated as a low-rankness, sparsity, and structural risk minimization problem, which can be solved by an effective optimization algorithm based on the alternating direction method of multipliers (ADMM).
- (2)
From the aspect of geometric structure preservation, the global subspace structure and local geometric structure are preserved from the feature level by imposing low-rank and sparse constraints, respectively, on the reconstruction coefficient matrix. From the sample level, the space relationship of samples is preserved by a nearest neighbor graph.
- (3)
The results of extensive experiments conducted on five benchmark datasets show that our approach consistently outperforms several representative methods and exhibits classification performance comparable with that of modern deep DA methods.
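The sample-level graph regularizer of contribution (2) can be sketched as follows: build a symmetric k-nearest-neighbor affinity graph, form its Laplacian L, and penalize trace(Zᵀ L Z), which keeps points that are neighbors in the original space close after projection. The code below is an illustrative sketch (naive O(n²) distances, binary edge weights); the paper's actual graph construction may differ:

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Symmetric k-nearest-neighbor graph on rows of X and its
    unnormalized Laplacian L = D - W. Minimizing trace(Z.T @ L @ Z)
    equals 0.5 * sum_ij W_ij * ||z_i - z_j||^2, preserving the
    sample-level neighborhood structure in the projected space."""
    n = X.shape[0]
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise sq. dists
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D2[i])[1:k + 1]   # k nearest neighbors, skipping self
        W[i, idx] = 1.0
    W = np.maximum(W, W.T)                 # symmetrize the affinity matrix
    L = np.diag(W.sum(axis=1)) - W         # unnormalized graph Laplacian
    return L, W
```

The Laplacian is symmetric with zero row sums, so trace(Zᵀ L Z) is a nonnegative smoothness penalty over the graph edges.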
Section snippets
Related work
DA has been introduced to address the problem of the source domain and the target domain sharing the same task but having different data distributions. Existing DA methods can be roughly categorized into two types: instance re-weighting methods [16], [17], [18], [19], [20], [21] and feature representation methods [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34]. Our work is mainly related to feature representation methods. Feature representation methods aim to learn
Structure preservation and distribution alignment (SPDA)
This section describes the SPDA method in detail. First, we define the notations used in this paper. Then, we describe the source domain classifier design, geometric structure preservation, and distribution alignment in detail.
Optimizing the SPDA framework
In this section, we describe how to solve the SPDA model. Then, complexity analysis of the proposed method is conducted.
Experiments
In this section, we evaluate the performance of the proposed method on image recognition problems. First, we introduce the datasets and basic experimental settings. Then, we compare the proposed SPDA method with several related and state-of-the-art baseline methods. Finally, we evaluate and discuss the effectiveness and the parameter sensitivity. The SPDA code is available online.
Conclusion
In this paper, we proposed a novel approach, called SPDA, for unsupervised DA. Source domain classifier design, geometric structure preservation, and statistical distribution alignment can affect the performance of DA. Only by comprehensively formulating these three factors into the same framework can we achieve better DA performance. SPDA seeks a common subspace where the source discrimination power can be improved by introducing an ε-dragging technique into the linear regression classifier,
Acknowledgement
This work was funded by the Lab of Space Optoelectronic Measurement & Perception (No. LabSOMP-2018-01) and partly supported by the National Natural Science Foundation of China (Grant Nos. 61671175, 61370162, and 61672190).
Ting Xiao is a Ph.D. candidate at the School of Computer Science and Technology, Harbin Institute of Technology (HIT). She received her master's degree in computer application technology from Harbin Institute of Technology in 2016. Her research interests cover image processing, computer vision and machine learning.
References (69)
- Subspace distribution alignment for unsupervised domain adaptation
- Detecting change in data streams
- Transfer robust sparse coding based on graph and joint distribution adaption for image representation, Knowl. Based Syst. (2018)
- Unsupervised transfer learning for target detection from hyperspectral images, Neurocomputing (2013)
- A survey on transfer learning, IEEE Trans. Knowl. Data Eng. (2010)
- A survey of transfer learning, J. Big Data (2016)
- Transfer learning for visual categorization: a survey, IEEE Trans. Neural Netw. Learn. Syst. (2015)
- Visual domain adaptation: a survey of recent advances, IEEE Signal Process. Mag. (2015)
- Heterogeneous transfer learning for image classification
- Domain adaptation for face recognition: targetize source domain bridged by common subspace, Int. J. Comput. Vision (2014)
- Mining distinction and commonality across multiple domains using generative model for text classification, IEEE Trans. Knowl. Data Eng.
- Video summarization via transferrable structured learning
- Co-regularization based semi-supervised domain adaptation
- Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach
- Dictionary based sparse representation for domain adaptation
- Unsupervised visual domain adaptation using subspace alignment
- Unsupervised domain adaptation by domain invariant projection
- Unsupervised adaptation across domain shifts by generating intermediate data representations, IEEE Trans. Pattern Anal. Mach. Intell.
- Cross-domain recognition by identifying joint subspaces of source domain and target domain, IEEE Trans. Cybern.
- Boosting for transfer learning
- Boosting for transfer learning with multiple sources
- A two-stage weighting framework for multi-source domain adaptation
- Connecting the dots with landmarks: discriminatively learning domain-invariant features for unsupervised domain adaptation
- Landmarks-based kernelized subspace alignment for unsupervised domain adaptation
- Iterative landmark selection and subspace alignment for unsupervised domain adaptation, J. Electron. Imaging
- Scatter component analysis: a unified framework for domain adaptation and domain generalization, IEEE Trans. Pattern Anal. Mach. Intell.
- Distribution-matching embedding for visual domain adaptation, J. Mach. Learn. Res.
- Domain adaptation via transfer component analysis, IEEE Trans. Neural Netw.
- Transfer feature learning with joint distribution adaptation
- Transfer joint matching for unsupervised domain adaptation
- Geodesic flow kernel for unsupervised domain adaptation
- Semisupervised manifold alignment of multimodal remote sensing images, IEEE Trans. Geosci. Remote Sens.
- Joint geometrical and statistical alignment for visual domain adaptation
Cited by (37)
- Domain adaptive learning based on equilibrium distribution and dynamic subspace approximation, Expert Systems with Applications (2024)
- The multi-task transfer learning for multiple data streams with uncertain data, Information Sciences (2024)
- Selective transfer subspace learning for small-footprint end-to-end cross-domain keyword spotting, Speech Communication (2024)
- Dynamic classifier approximation for unsupervised domain adaptation, Signal Processing (2023). Citation excerpt: "JGSA [15]: Joint geometrical and statistical alignment (JGSA) aims to align the distribution statistically and geometrically. SPDA [34]: Structure preservation and distribution alignment (SPDA) is a DA method that combines domain distribution alignment and the classification error of source domain. RDA [19]: Reliable domain adaptation (RDA) obtains a reliable target sample by using dual domain-specific projections and double task-classifiers."
- Unsupervised domain adaptation via re-weighted transfer subspace learning with inter-class sparsity, Knowledge-Based Systems (2023)
Peng Liu is an associate professor at the School of Computer Science and Technology, HIT. He received his doctoral degree in microelectronics and solid-state electronics from HIT in 2007. His research interests cover image processing, computer vision, transfer learning, reinforcement learning and pattern recognition.
Wei Zhao is an associate professor at the School of Computer Science and Technology, HIT. She received her doctoral degree in computer application technology from HIT in 2006. Her research interests cover pattern recognition, image processing, and deep-space target visual analysis.
Hongwei Liu is a professor at the School of Computer Science and Technology, HIT. He received Ph.D. degree in computer science and technology from Harbin Institute of Technology, in 2004. His research interests include, resource allocation and optimization in cloud computing system, evaluation theory and technology in cloud computing system, mobile computing and software reliability modeling.
Xianglong Tang is a professor at the School of Computer Science and Technology, HIT. He received his doctoral degree in computer application technology from HIT in 1995. His research interests cover pattern recognition, aerospace image processing, medical image processing, and machine learning.