Neurocomputing
Volume 337, 14 April 2019, Pages 218-234

Structure preservation and distribution alignment in discriminative transfer subspace learning

https://doi.org/10.1016/j.neucom.2019.01.069

Abstract

Domain adaptation (DA) is one of the most promising techniques for leveraging existing knowledge from a source domain and applying it to a related target domain. Most DA methods mainly focus on learning a common subspace for the two domains by exploiting either the statistical property or the geometric structure independently to reduce the domain distribution difference. However, these two properties are complementary to each other, and jointly exploring them could yield better results. Inspired by theoretical results on DA, in this paper, we propose structure preservation and distribution alignment (SPDA) in discriminative transfer subspace learning, which embeds source domain classification error reduction and domain distribution alignment into a single framework for optimization. SPDA learns an appropriate projection matrix, by which (1) the source domain classification error can be reduced; and (2) the source and target domain data are projected into a common subspace, where the domain distributions are well aligned and each target datum can be linearly reconstructed using the data from the source domain. To reduce the source domain classification error, an ε-dragging technique that relaxes the strict binary label matrix is introduced to enlarge the distance between two data points from different classes. Further, the global subspace structure and the local geometric structure are preserved by imposing a low-rank constraint and a sparse constraint, respectively, on the reconstruction coefficient matrix. Moreover, the space relationship of the samples is preserved using a graph regularization method. In addition, the differences between the marginal and conditional distributions of the two domains are minimized to further reduce the domain shift statistically. We formulate source domain classifier design, geometric structure preservation, and distribution alignment as a rank-minimization problem, and we design an effective optimization algorithm based on the alternating direction method of multipliers (ADMM) to solve this problem. The functions and roles of each term in this framework are analyzed. The results of extensive experiments conducted on five datasets show that SPDA outperforms several state-of-the-art approaches and exhibits classification performance comparable with that of modern deep DA methods.

Introduction

Traditional supervised learning is based on the common assumption that the training data and testing data are obtained from the same feature space and follow the same data distribution [1]. It requires massive amounts of labeled training data for each gallery or corpus. However, in real-world applications, it is not possible to guarantee that the training samples always have the same distribution as the test samples owing to various factors, such as differences in visual resolution and illumination. Changes in the distribution will result in performance degradation of the original learning system. Therefore, many models need to be rebuilt from scratch with an extremely large number of training samples. However, recollecting the training data is prohibitive owing to the considerable manual effort involved [2], [3], especially for multi-label learning [66]. Moreover, retraining the models without applying the knowledge learned from previous domains or tasks is inefficient [4]. Transfer learning [1] has emerged as a method for addressing the above-mentioned issues, with the objective of borrowing well-learned knowledge from an auxiliary source domain and applying it to a related target domain. In addition, it has the potential to avoid the problem of learning with limited data.

Depending on whether the source domain and target domain data labels are available, transfer learning can be categorized into multi-task learning, self-taught learning, domain adaptation, and unsupervised transfer learning [1]. Domain adaptation (DA) has shown promising performance in many areas, e.g., image classification [5], object recognition [6], text categorization [7], and video event detection [8]. DA assumes that the source domain data and the target domain data have the same feature space and label space but different data distributions. It employs information from both the source domain and the target domain during the learning process to achieve automatic adaptation. Based on the availability of labeled target data, DA can be generally categorized into semi-supervised and unsupervised domain adaptation. When a small set of labeled data is available in the target domain, the problem is semi-supervised DA [9], [10], [11]. When no labeled data are available in the target domain, the problem is unsupervised DA [12], [13], [14], [15]. This paper focuses on the more challenging problem of unsupervised DA because, in real-world applications, unlabeled target data are often abundant but difficult to annotate.

Some common latent factors are shared by the two domains, even though distribution shifts may occur between them. Therefore, how to find, express, and use these latent factors to reduce the distribution shift between the two domains is the major issue in DA, which can be addressed using instance re-weighting methods [16], [17], [18], [19], [20], [21] or feature representation methods [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34]. Most instance re-weighting approaches adjust the weights of instances according to their importance in order to construct a source classifier that can adapt to the target domain. In these methods, the samples are fixed but the decision boundaries are allowed to change. By contrast, feature representation approaches aim to learn a good feature representation across domains, which reduces the distribution difference between domains as much as possible. Using the new feature representation, standard supervised learning algorithms can be trained on the source domain and reused on the target domain. Thus, the performance of the target task is expected to improve significantly.

To match the distribution differences between the source and target domains, most feature-based DA methods primarily focus on learning a common subspace for the two domains by exploiting either the statistical property or the geometric structure independently. Methods that exploit statistical properties reduce the domain distribution shift by utilizing sample means (transfer component analysis (TCA) [24]), second-order statistics (CORAL [29]), data scatter (scatter component analysis (SCA) [22]), or class means (linear discriminant analysis (LDA)-inspired domain adaptation (LDADA) [45]). Although these methods have produced significant results, sample means, second-order statistics, data scatter, and class means are still simple statistical measures that cannot completely describe the properties of the data. Moreover, data contain inherent geometric structure information; for example, samples from the same class cluster closely together, while samples from different classes lie relatively far apart and are drawn from different subspaces. Methods that exploit geometric structure information either exploit the space relationship of samples together with other representation learning to learn a new representation of the data (transfer sparse coding (TSC) [33], Tuia et al. [69], transfer latent representation (TSL) [32]), or exploit the global subspace structure to learn a common subspace in which the target domain data can be recovered from the source domain data under a low-rank constraint (low-rank transfer subspace learning (LTSL) [38]). Although these methods fully exploit the space relationship and global subspace structure of the data, they ignore the local geometric structure and the statistical properties of the data.
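To make the distinction concrete, the following sketch (our illustration only; the function names and the NumPy implementation are ours, not the authors') shows the two simplest statistical measures mentioned above: a first-order mean discrepancy of the kind TCA-style methods minimize, and a CORAL-style second-order alignment that matches source and target covariances. Both capture only low-order statistics, which is precisely the limitation noted here.

import numpy as np

def mean_discrepancy(Xs, Xt):
    # Squared distance between the empirical domain means: the kind of
    # first-order statistic that MMD/TCA-style methods minimize.
    return float(np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2))

def coral_align(Xs, Xt, eps=1e-6):
    # CORAL-style second-order alignment: whiten the source features with the
    # source covariance, then re-color them with the target covariance.
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(d)
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(d)

    def mat_pow(C, p):
        # Matrix power of a symmetric positive (semi)definite matrix.
        w, V = np.linalg.eigh(C)
        return (V * np.power(np.maximum(w, eps), p)) @ V.T

    return Xs @ mat_pow(Cs, -0.5) @ mat_pow(Ct, 0.5)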

In fact, the statistical property and the geometric structure of the data describe the data from different viewpoints, and each viewpoint can only partially characterize the data. These viewpoints are not mutually exclusive but complement each other; thus, jointly exploring them to match the domain difference should draw on the advantages of both. In addition, theoretical studies [35], [36], [37] on DA have shown that the target domain classification error can be reduced by reducing the source domain classification error and the distribution difference between the two domains. Inspired by this theoretical result, we propose to embed the reduction of the source domain classification error and the alignment of the domain distribution differences into a single framework for optimization. Furthermore, to align the domain distribution differences, we jointly exploit the geometric structure information and statistical property of the data.
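The theoretical results referred to here are commonly stated in the following form (our paraphrase of the standard generalization bound; the notation is introduced here and is not taken from the paper):

\epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}\!\left(\mathcal{D}_S, \mathcal{D}_T\right) \;+\; \lambda ,

where \epsilon_S(h) and \epsilon_T(h) are the source and target risks of a hypothesis h, d_{\mathcal{H}\Delta\mathcal{H}} measures the divergence between the source distribution \mathcal{D}_S and the target distribution \mathcal{D}_T, and \lambda is the risk of the ideal joint hypothesis on both domains. The bound motivates minimizing the source error and the distribution divergence jointly, which is the strategy adopted below.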

In this paper, we consequently propose a new DA method, called structure preservation and distribution alignment (SPDA), in discriminative transfer subspace learning. The core idea underlying the proposed method is learning a projection matrix, by which (1) the source domain classification error can be reduced; and (2) the source domain and the target domain data are projected into a common subspace, in which (2a) the geometric structure information of the data can be preserved, and (2b) the distribution differences of the two domains can be reduced. Specifically, to reduce the classification error of the source domain, an ε-dragging technique [52] that relaxes the strict binary label matrix into a slack variable matrix is introduced into the source domain regression classifier model, which not only provides more freedom to learn a proper projection matrix, but also enlarges the distance between two data points from different classes. To fully exploit the structural information residing in the data, we explore the preservation of geometric structures at both the feature level and the sample level. At the feature level, in the common subspace spanned by the projection matrix, the source and target domains are well aligned and each target datum can be linearly reconstructed from the source domain data. We impose low-rank and sparse constraints on the reconstruction coefficient matrix: the low-rank constraint ensures that the reconstruction coefficient matrix has a block-wise structure, so that the global subspace structure of the data is preserved, while the sparse constraint ensures that each target sample is reconstructed by only a few neighbors in the source domain, thereby preserving the local structure of the data. At the sample level, a graph regularization method is used to characterize the sample relationship so that two points that are close in the original space remain close in the projected common subspace; thus, the space relationship of the samples is preserved. Further, to reduce the statistical shift, both the marginal and conditional distributions of the two domains are aligned. Source domain classification error reduction, the statistical property, and the geometric structure are synthesized in this transfer subspace learning framework, and an effective optimization method is derived in detail. Further, the functions and roles of each term in this framework are analyzed. The proposed method is illustrated in Fig. 1.
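The ingredients of this paragraph can be summarized schematically as an objective of the following form (our own shorthand with notation introduced here; the paper's exact formulation, constraints, and weights may differ):

\min_{P,\,W,\,Z,\,M \ge 0}\; \left\lVert W^{\top} P^{\top} X_s - (Y + B \odot M) \right\rVert_F^2 \;+\; \lambda_1 \lVert Z \rVert_* \;+\; \lambda_2 \lVert Z \rVert_1 \;+\; \lambda_3 \operatorname{tr}\!\left(P^{\top} X L X^{\top} P\right) \;+\; \lambda_4 \operatorname{MMD}\!\left(P^{\top} X_s,\, P^{\top} X_t\right) \quad \text{s.t.}\;\; P^{\top} X_t = P^{\top} X_s Z ,

where X_s and X_t stack the source and target samples column-wise and X = [X_s, X_t], P is the projection matrix, W a regression matrix, Y the binary source label matrix, B and M the sign and non-negative slack matrices of the ε-dragging relaxation, Z the reconstruction coefficient matrix, L the Laplacian of the nearest-neighbor graph, and MMD(·, ·) an empirical marginal-plus-conditional distribution distance. The five terms correspond, in order, to the relaxed source classification error, the global (low-rank) structure, the local (sparse) structure, the sample-level graph regularization, and the statistical alignment.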

The main contributions of this paper can be summarized as follows:

  • (1)

    A unified framework for source domain classification error reduction, geometric structure preservation, and distribution alignment is proposed for unsupervised DA. When reducing the source classification error, an ε-dragging technique that relaxes the strict binary label matrix into a slack variable matrix is introduced into the regression classifier model. This framework is formulated as a low-rankness, sparsity, and structural risk minimization problem, which can be solved by an effective optimization algorithm based on the alternating direction method of multipliers (ADMM).

  • (2)

    From the aspect of geometric structure preservation, the global subspace structure and the local geometric structure are preserved at the feature level by imposing low-rank and sparse constraints, respectively, on the reconstruction coefficient matrix. At the sample level, the space relationship of the samples is preserved by a nearest-neighbor graph (a minimal graph-construction sketch is given after this list).

  • (3)

    The results of extensive experiments conducted on five benchmark datasets show that our approach consistently outperforms several representative methods and exhibits classification performance comparable with that of modern deep DA methods.
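As a concrete example of the sample-level graph mentioned in contribution (2), the sketch below builds a k-nearest-neighbor affinity matrix with heat-kernel weights and its unnormalized Laplacian, the standard ingredient of graph regularization. This is our illustration under common default choices (the number of neighbors k and the heat-kernel bandwidth), not the authors' code.

import numpy as np

def knn_graph_laplacian(X, k=5, sigma=1.0):
    # X: (n, d) array of samples. Returns the unnormalized graph Laplacian
    # L = D - S of a symmetric k-nearest-neighbor graph with heat-kernel weights.
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    S = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbors of sample i, excluding the sample itself
        idx = np.argsort(dist2[i])[1:k + 1]
        S[i, idx] = np.exp(-dist2[i, idx] / (2.0 * sigma ** 2))
    S = np.maximum(S, S.T)          # symmetrize the affinity matrix
    L = np.diag(S.sum(axis=1)) - S  # unnormalized Laplacian L = D - S
    return L

Minimizing tr(P^T X L X^T P) with such an L keeps the projections of neighboring samples close, which is how the space relationship of the samples is preserved.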

Section snippets

Related work

DA has been introduced to address the problem of the source domain and the target domain sharing the same task but having different data distributions. Existing DA methods can be roughly categorized into two types: instance re-weighting methods [16], [17], [18], [19], [20], [21] and feature representation methods [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34]. Our work is mainly related to feature representation methods. Feature representation methods aim to learn

Structure preservation and distribution alignment (SPDA)

This section describes the SPDA method in detail. First, we define the notations used in this paper. Then, we describe the source domain classifier design, geometric structure preservation, and distribution alignment in detail.

Optimizing the SPDA framework

In this section, we describe how to solve the SPDA model. Then, a complexity analysis of the proposed method is conducted.
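Although this snippet does not reproduce the update rules, ADMM solvers for objectives that combine a nuclear-norm term and an l1 term, as described above, typically reduce to two standard proximal steps; a minimal sketch of these operators (our illustration of the standard operators, not the authors' exact updates) is:

import numpy as np

def soft_threshold(A, tau):
    # Proximal operator of tau * ||A||_1: element-wise soft thresholding,
    # the usual ADMM sub-step for the sparse term.
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(A, tau):
    # Singular value thresholding: proximal operator of tau * ||A||_*,
    # the usual ADMM sub-step for the low-rank (nuclear-norm) term.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

In this family of methods, the remaining variables (the projection, the regression matrix, and the slack matrix of the ε-dragging relaxation) typically admit closed-form or non-negativity-clipped least-squares updates.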

Experiments

In this section, we evaluate the performance of the proposed method on image recognition problems. First, we introduce the datasets and basic experimental settings. Then, we compare the proposed SPDA method with several related and state-of-the-art baseline methods. Finally, we evaluate and discuss the effectiveness and the parameter sensitivity. The SPDA code is available online.

Conclusion

In this paper, we proposed a novel approach, called SPDA, for unsupervised DA. Source domain classifier design, geometric structure preservation, and statistical distribution alignment can affect the performance of DA. Only by comprehensively formulating these three factors into the same framework can we achieve better DA performance. SPDA seeks a common subspace where the source discrimination power can be improved by introducing an ε-dragging technique into the linear regression classifier,

Acknowledgement

This work was funded by the Lab of Space Optoelectronic Measurement & Perception (No. LabSOMP-2018-01) and partly supported by the National Natural Science Foundation of China (Grant Nos. 61671175, 61370162, and 61672190).

References (69)

  • F. Zhuang et al., Mining distinction and commonality across multiple domains using generative model for text classification, IEEE Trans. Knowl. Data Eng. (2012)
  • L. Li et al., Video summarization via transferrable structured learning
  • A. Kumar et al., Co-regularization based semi-supervised domain adaptation
  • A. Bergamo et al., Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach
  • R. Mehrotra et al., Dictionary based sparse representation for domain adaptation
  • B. Fernando et al., Unsupervised visual domain adaptation using subspace alignment
  • M. Baktashmotlagh et al., Unsupervised domain adaptation by domain invariant projection
  • R. Gopalan et al., Unsupervised adaptation across domain shifts by generating intermediate data representations, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
  • Y. Lin et al., Cross-domain recognition by identifying joint subspaces of source domain and target domain, IEEE Trans. Cybern. (2017)
  • W. Dai et al., Boosting for transfer learning
  • Y. Yao et al., Boosting for transfer learning with multiple sources
  • Q. Sun et al., A two-stage weighting framework for multi-source domain adaptation
  • B. Gong et al., Connecting the dots with landmarks: discriminatively learning domain-invariant features for unsupervised domain adaptation
  • R. Aljundi et al., Landmarks-based kernelized subspace alignment for unsupervised domain adaptation
  • T. Xiao et al., Iterative landmark selection and subspace alignment for unsupervised domain adaptation, J. Electron. Imaging (2018)
  • M. Ghifary et al., Scatter component analysis: a unified framework for domain adaptation and domain generalization, IEEE Trans. Pattern Anal. Mach. Intell. (2017)
  • M. Baktashmotlagh et al., Distribution-matching embedding for visual domain adaptation, J. Mach. Learn. Res. (2016)
  • S.J. Pan et al., Domain adaptation via transfer component analysis, IEEE Trans. Neural Netw. (2011)
  • M. Long et al., Transfer feature learning with joint distribution adaptation
  • M. Long et al., Transfer joint matching for unsupervised domain adaptation
  • B. Gong et al., Geodesic flow kernel for unsupervised domain adaptation
  • B. Sun, J. Feng, K. Saenko, Return of frustratingly easy domain adaptation, in: Proceedings of the AAAI (2016)
  • D. Tuia, Semisupervised manifold alignment of multimodal remote sensing images, IEEE Trans. Geosci. Remote Sens. (2014)
  • J. Zhang et al., Joint geometrical and statistical alignment for visual domain adaptation

Ting Xiao is a Ph.D. candidate at the School of Computer Science and Technology, Harbin Institute of Technology (HIT). She received her master's degree in computer application technology from Harbin Institute of Technology in 2016. Her research interests cover image processing, computer vision and machine learning.

Peng Liu is an associate professor at the School of Computer Science and Technology, HIT. He received his doctoral degree in microelectronics and solid-state electronics from HIT in 2007. His research interests cover image processing, computer vision, transfer learning, reinforcement learning and pattern recognition.

Wei Zhao is an associate professor at the School of Computer Science and Technology, HIT. She received her doctoral degree in computer application technology from HIT in 2006. Her research interests cover pattern recognition, image processing, and deep-space target visual analysis.

Hongwei Liu is a professor at the School of Computer Science and Technology, HIT. He received his Ph.D. degree in computer science and technology from Harbin Institute of Technology in 2004. His research interests include resource allocation and optimization in cloud computing systems, evaluation theory and technology for cloud computing systems, mobile computing, and software reliability modeling.

Xianglong Tang is a professor at the School of Computer Science and Technology, HIT. He received his doctoral degree in computer application technology from HIT in 1995. His research interests cover pattern recognition, aerospace image processing, medical image processing, and machine learning.
