Structure preservation and distribution alignment in discriminative transfer subspace learning
Introduction
Traditional supervised learning is based on the common assumption that the training data and the testing data are drawn from the same feature space and follow the same data distribution [1]. It requires massive amounts of labeled training data for each gallery or corpus. In real-world applications, however, the training samples cannot be guaranteed to follow the same distribution as the test samples owing to various factors, such as differences in visual resolution and illumination. Changes in the distribution degrade the performance of the original learning system, so many models must be rebuilt from scratch with an extremely large number of training samples. However, recollecting the training data is prohibitive owing to the considerable manual effort involved [2], [3], especially for multi-label learning [66]. Moreover, retraining the models without applying the knowledge learned from previous domains or tasks is inefficient [4]. Transfer learning [1] has emerged to address these issues: its objective is to borrow well-learned knowledge from an auxiliary source domain and apply it to a related target domain. In addition, it can alleviate the problem of learning with limited data.
Depending on whether the source domain and target domain data labels are available, transfer learning can be categorized into multi-task learning, self-taught learning, domain adaptation, and unsupervised transfer learning [1]. Domain adaptation (DA) has shown promising performance in many areas, e.g., image classification [5], object recognition [6], text categorization [7], and video event detection [8]. DA assumes that the source domain data and the target domain data have the same feature space and label space but different data distributions. It employs information from both the source domain and the target domain during the learning process to achieve automatic adaptation. Based on the availability of labeled target data, DA can be generally categorized into semi-supervised and unsupervised domain adaptation. When a small set of labeled data is available in the target domain, the problem is semi-supervised DA [9], [10], [11]; when no labeled data are available in the target domain, the problem is unsupervised DA [12], [13], [14], [15]. This paper focuses on the more challenging problem of unsupervised DA because, in real-world applications, unlabeled target data are often much more abundant and costly to annotate.
Some common latent factors are shared by two domains, even though distribution shifts may occur between these domains. Therefore, how to find, express, and use these latent factors to reduce the distribution shifts between the two domains is the major issue in DA, which can be addressed using instance re-weighting methods [16], [17], [18], [19], [20], [21] or feature representation methods [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34]. Most instance re-weighting approaches aim to adjust model parameters according to the importance of instances in order to construct an adaptive source classifier that can adapt to the target domain. In these methods, the samples are fixed but the decision boundaries are allowed to change. By contrast, feature representation approaches aim to learn a good feature representation across domains, which can reduce the distribution difference between domains as much as possible. Using the new feature representation, standard supervised learning algorithms can be trained on the source domain and reused on the target domain. Thus, the performance of the target task is expected to improve significantly.
To match the distributions of the source and target domains, most feature-based DA methods focus on learning a common subspace for the two domains by exploiting either statistical properties or geometric structure, but rarely both. Methods that exploit statistical properties reduce the domain distribution shift using sample means (transfer component analysis (TCA) [24]), second-order statistics (CORAL [29]), data scatter (scatter component analysis (SCA) [22]), or class means (linear discriminant analysis (LDA)-inspired domain adaptation (LDADA) [45]). Although these methods have produced significant results, sample means, second-order statistics, data scatter, and class means are simple statistical measures that cannot completely describe the properties of the data. Moreover, data contain inherent geometric structure: for example, samples from the same class cluster closely, while samples from different classes lie relatively far apart and are drawn from different subspaces. Methods that exploit geometric structure either combine the space relationship with other representation learning to learn a new representation of the data (transfer sparse coding (TSC) [33], Tuia et al. [69], transfer latent representation (TSL) [32]), or exploit the global subspace structure to learn a common subspace in which the target domain data can be recovered from the source domain data under a low-rank constraint (low-rank transfer subspace learning (LTSL) [38]). Although these methods fully exploit the space relationship and the global subspace structure of the data, they leave the local geometric structure and the statistical properties of the data unexploited.
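As a concrete illustration of statistics-based alignment, the CORAL-style second-order matching mentioned above can be sketched in a few lines: the source features are whitened with their own covariance and then re-colored with the target covariance. This is a minimal sketch under our own assumptions (the function name and the `eps` regularizer are illustrative), not the implementation of any of the cited papers:

```python
import numpy as np

def coral_align(Xs, Xt, eps=1e-3):
    """Align the second-order statistics of source features Xs (n_s x d)
    to target features Xt (n_t x d), in the spirit of CORAL [29]."""
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])

    def sqrtm(C, inv=False):
        # matrix (inverse) square root via eigendecomposition;
        # covariances are symmetric PSD, so eigh applies
        w, V = np.linalg.eigh(C)
        w = np.maximum(w, 1e-12)
        s = w ** (-0.5 if inv else 0.5)
        return (V * s) @ V.T

    # whiten with the source covariance, re-color with the target covariance
    return Xs @ sqrtm(Cs, inv=True) @ sqrtm(Ct)
```

After alignment, the covariance of the transformed source features approximately equals the empirical target covariance, up to the `eps` regularization.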
In fact, the statistical properties and the geometric structure of the data are observed from different viewpoints, each of which describes the data only partially. These viewpoints are not mutually exclusive but complementary, so jointly exploiting them to match the domain difference should confer the advantages of both. In addition, theoretical studies [35], [36], [37] on DA have shown that the target domain classification error can be reduced by reducing the source domain classification error and the distribution difference between the two domains. Inspired by this theoretical result, we propose to embed the reduction of the source domain classification error and the alignment of the domain distributions into a single optimization framework, and to align the distributions by jointly exploiting the geometric structure and the statistical properties of the data.
In this paper, we consequently propose a new DA method, called structure preservation and distribution alignment (SPDA), in discriminative transfer subspace learning. The core idea underlying the proposed method is to learn a projection matrix by which (1) the source domain classification error can be reduced; and (2) the source domain and the target domain data are projected into a common subspace, in which (2a) the geometric structure of the data is preserved and (2b) the distribution differences between the two domains are reduced. Specifically, to reduce the classification error of the source domain, an ε-dragging technique [52] that relaxes the strict binary label matrix into a slack variable matrix is introduced into the source domain regression classifier model; this not only provides more freedom to learn a proper projection matrix, but also enlarges the distance between data points from different classes. To fully exploit the structural information residing in the data, we preserve geometric structure at both the feature level and the sample level. At the feature level, in the common subspace spanned by the projection matrix, the source and target domains are well aligned and each target datum can be linearly reconstructed from the source domain data. We impose low-rank and sparse constraints on the reconstruction coefficient matrix: the low-rank constraint ensures that the coefficient matrix has a block-wise structure, preserving the global subspace structure of the data, while the sparse constraint ensures that each target sample is reconstructed by only a few neighbors in the source domain, preserving the local structure of the data. At the sample level, a graph regularization method characterizes the sample relationships so that two points that are close in the original space remain close in the projected common subspace.
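The ε-dragging relaxation of the label matrix admits a simple closed-form slack update when the projection is held fixed, as in the alternating optimization of [52]. The sketch below is illustrative only: the function name is ours, and we assume a 0/1 one-hot label matrix. The sign matrix B fixes the dragging direction (true-class targets may only grow, wrong-class targets may only shrink), which is what enlarges inter-class distances:

```python
import numpy as np

def epsilon_dragging_targets(Y, scores):
    """One slack update of the epsilon-dragging technique [52].
    Y      : n x c strict one-hot label matrix (0/1 entries).
    scores : n x c current regression outputs X @ W (W held fixed).
    Returns the relaxed target matrix Y + B * M with M >= 0."""
    B = np.where(Y > 0, 1.0, -1.0)   # dragging directions per entry
    R = scores - Y                   # residual w.r.t. the strict targets
    M = np.maximum(B * R, 0.0)       # closed-form nonnegative slack
    return Y + B * M                 # relaxed, margin-enlarging targets
```

Targets move only when a score already exceeds the strict target in the permitted direction (true-class scores above 1, wrong-class scores below 0), so the regression is never penalized for being "too confident".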
Thus, the space relationship of the samples is preserved. Further, to reduce the statistical shifts, both marginal and conditional distributions between the two domains are aligned. The source domain classification error reduction, statistical property, and geometric structure are synthesized in this transfer subspace learning framework, and an effective optimization method is derived in detail. Further, the functions and roles of each term in this framework are analyzed. The proposed method is illustrated in Fig. 1.
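With a linear kernel, the marginal and conditional alignment terms reduce to distances between domain sample means and between class-wise means, where the target class assignments come from pseudo-labels predicted by a source-trained classifier, as is standard in JDA-style methods. A minimal sketch with illustrative function names (not the paper's exact objective):

```python
import numpy as np

def mmd_marginal(Zs, Zt):
    """Squared MMD (linear kernel) between projected source features Zs
    and target features Zt: distance between the domain sample means."""
    return np.sum((Zs.mean(axis=0) - Zt.mean(axis=0)) ** 2)

def mmd_conditional(Zs, ys, Zt, yt_pseudo):
    """Class-wise mean distance; target labels are pseudo-labels
    predicted by a classifier trained on the source domain."""
    total = 0.0
    for c in np.unique(ys):
        s, t = Zs[ys == c], Zt[yt_pseudo == c]
        if len(s) and len(t):  # skip classes absent from either domain
            total += np.sum((s.mean(axis=0) - t.mean(axis=0)) ** 2)
    return total
```

Minimizing the sum of both terms over the projection aligns the marginal and conditional distributions simultaneously; the pseudo-labels are typically refined over iterations.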
The main contributions of this paper can be summarized as follows:
- (1)
A unified framework for source domain classification error reduction, geometric structure preservation, and distribution alignment is proposed for unsupervised DA. When reducing the source classification error, an ε-dragging technique that relaxes the strict binary label matrix into a slack variable matrix is introduced into the regression classifier model. This framework is formulated as a low-rankness, sparsity, and structural risk minimization problem, which can be solved by an effective optimization algorithm based on the alternating direction method of multipliers (ADMM).
- (2)
From the aspect of geometric structure preservation, the global subspace structure and local geometric structure are preserved from the feature level by imposing low-rank and sparse constraints, respectively, on the reconstruction coefficient matrix. From the sample level, the space relationship of samples is preserved by a nearest neighbor graph.
- (3)
The results of extensive experiments conducted on five benchmark datasets show that our approach consistently outperforms several representative methods and exhibits classification performance comparable with that of modern deep DA methods.
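The sample-level graph regularizer of contribution (2) can be sketched as follows: build a symmetric k-nearest-neighbor affinity graph, form its Laplacian L, and penalize trace(Zᵀ L Z), which keeps points that are neighbors in the original space close after projection. The code below is an illustrative sketch (naive O(n²) distances, binary edge weights); the paper's actual graph construction may differ:

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Symmetric k-nearest-neighbor graph on rows of X and its
    unnormalized Laplacian L = D - W. Minimizing trace(Z.T @ L @ Z)
    equals 0.5 * sum_ij W_ij * ||z_i - z_j||^2, preserving the
    sample-level neighborhood structure in the projected space."""
    n = X.shape[0]
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise sq. dists
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D2[i])[1:k + 1]   # k nearest neighbors, skipping self
        W[i, idx] = 1.0
    W = np.maximum(W, W.T)                 # symmetrize the affinity matrix
    L = np.diag(W.sum(axis=1)) - W         # unnormalized graph Laplacian
    return L, W
```

The Laplacian is symmetric with zero row sums, so trace(Zᵀ L Z) is a nonnegative smoothness penalty over the graph edges.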
Section snippets
Related work
DA has been introduced to address the problem of the source domain and the target domain sharing the same task but having different data distributions. Existing DA methods can be roughly categorized into two types: instance re-weighting methods [16], [17], [18], [19], [20], [21] and feature representation methods [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34]. Our work is mainly related to feature representation methods. Feature representation methods aim to learn
Structure preservation and distribution alignment (SPDA)
This section describes the SPDA method in detail. First, we define the notations used in this paper. Then, we describe the source domain classifier design, geometric structure preservation, and distribution alignment in detail.
Optimizing the SPDA framework
In this section, we describe how to solve the SPDA model. Then, complexity analysis of the proposed method is conducted.
Experiments
In this section, we evaluate the performance of the proposed method on image recognition problems. First, we introduce the datasets and basic experimental settings. Then, we compare the proposed SPDA method with several related and state-of-the-art baseline methods. Finally, we evaluate and discuss the effectiveness and the parameter sensitivity. The SPDA code is available online.
Conclusion
In this paper, we proposed a novel approach, called SPDA, for unsupervised DA. Source domain classifier design, geometric structure preservation, and statistical distribution alignment can affect the performance of DA. Only by comprehensively formulating these three factors into the same framework can we achieve better DA performance. SPDA seeks a common subspace where the source discrimination power can be improved by introducing an ε-dragging technique into the linear regression classifier,
Acknowledgement
This work was funded by the Lab of Space Optoelectronic Measurement & Perception (No. LabSOMP-2018-01) and partly supported by the National Natural Science Foundation of China (Grant Nos. 61671175, 61370162, and 61672190).
Ting Xiao is a Ph.D. candidate at the School of Computer Science and Technology, Harbin Institute of Technology (HIT). She received her master's degree in computer application technology from Harbin Institute of Technology in 2016. Her research interests cover image processing, computer vision and machine learning.
References (69)
- Subspace distribution alignment for unsupervised domain adaptation
- Detecting change in data streams
- Transfer robust sparse coding based on graph and joint distribution adaption for image representation, Knowl. Based Syst. (2018)
- Unsupervised transfer learning for target detection from hyperspectral images, Neurocomputing (2013)
- A survey on transfer learning, IEEE Trans. Knowl. Data Eng. (2010)
- A survey of transfer learning, J. Big Data (2016)
- Transfer learning for visual categorization: a survey, IEEE Trans. Neural Netw. Learn. Syst. (2015)
- Visual domain adaptation: a survey of recent advances, IEEE Signal Process. Mag. (2015)
- Heterogeneous transfer learning for image classification
- Domain adaptation for face recognition: targetize source domain bridged by common subspace, Int. J. Comput. Vision (2014)
- Mining distinction and commonality across multiple domains using generative model for text classification, IEEE Trans. Knowl. Data Eng.
- Video summarization via transferrable structured learning
- Co-regularization based semi-supervised domain adaptation
- Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach
- Dictionary based sparse representation for domain adaptation
- Unsupervised visual domain adaptation using subspace alignment
- Unsupervised domain adaptation by domain invariant projection
- Unsupervised adaptation across domain shifts by generating intermediate data representations, IEEE Trans. Pattern Anal. Mach. Intell.
- Cross-domain recognition by identifying joint subspaces of source domain and target domain, IEEE Trans. Cybern.
- Boosting for transfer learning
- Boosting for transfer learning with multiple sources
- A two-stage weighting framework for multi-source domain adaptation
- Connecting the dots with landmarks: discriminatively learning domain-invariant features for unsupervised domain adaptation
- Landmarks-based kernelized subspace alignment for unsupervised domain adaptation
- Iterative landmark selection and subspace alignment for unsupervised domain adaptation, J. Electron. Imaging
- Scatter component analysis: a unified framework for domain adaptation and domain generalization, IEEE Trans. Pattern Anal. Mach. Intell.
- Distribution-matching embedding for visual domain adaptation, J. Mach. Learn. Res.
- Domain adaptation via transfer component analysis, IEEE Trans. Neural Netw.
- Transfer feature learning with joint distribution adaptation
- Transfer joint matching for unsupervised domain adaptation
- Geodesic flow kernel for unsupervised domain adaptation
- Semisupervised manifold alignment of multimodal remote sensing images, IEEE Trans. Geosci. Remote Sens.
- Joint geometrical and statistical alignment for visual domain adaptation
Cited by (37)
- Domain adaptive learning based on equilibrium distribution and dynamic subspace approximation, Expert Systems with Applications (2024)
- The multi-task transfer learning for multiple data streams with uncertain data, Information Sciences (2024)
- Selective transfer subspace learning for small-footprint end-to-end cross-domain keyword spotting, Speech Communication (2024)
- Dynamic classifier approximation for unsupervised domain adaptation, Signal Processing (2023). Citation excerpt: "JGSA [15]: Joint geometrical and statistical alignment (JGSA) aims to align the distribution statistically and geometrically. SPDA [34]: Structure preservation and distribution alignment (SPDA) is a DA method that combines domain distribution alignment and the classification error of source domain. RDA [19]: Reliable domain adaptation (RDA) obtains a reliable target sample by using dual domain-specific projections and double task-classifiers."
- Unsupervised domain adaptation via re-weighted transfer subspace learning with inter-class sparsity, Knowledge-Based Systems (2023)
Peng Liu is an associate professor at the School of Computer Science and Technology, HIT. He received his doctoral degree in microelectronics and solid-state electronics from HIT in 2007. His research interests cover image processing, computer vision, transfer learning, reinforcement learning and pattern recognition.
Wei Zhao is an associate professor at the School of Computer Science and Technology, HIT. She received her doctoral degree in computer application technology from HIT in 2006. Her research interests cover pattern recognition, image processing, and deep-space target visual analysis.
Hongwei Liu is a professor at the School of Computer Science and Technology, HIT. He received Ph.D. degree in computer science and technology from Harbin Institute of Technology, in 2004. His research interests include, resource allocation and optimization in cloud computing system, evaluation theory and technology in cloud computing system, mobile computing and software reliability modeling.
Xianglong Tang is a professor at the School of Computer Science and Technology, HIT. He received his doctoral degree in computer application technology from HIT in 1995. His research interests cover pattern recognition, aerospace image processing, medical image processing, and machine learning.