Knowledge-Based Systems

Volume 260, 25 January 2023, 110113

Transferable feature filtration network for multi-source domain adaptation

https://doi.org/10.1016/j.knosys.2022.110113

Abstract

Compared to conventional unsupervised domain adaptation, multi-source domain adaptation (MSDA) suits more practical scenarios but is more challenging because the knowledge transferred to the target domain is learned from multiple source domains. Existing adversarial-based domain adaptation methods confuse the domain distributions and align representations across domains by deceiving a discriminator. However, these methods neglect the differences in the transferability of individual features, so untransferable domain-varying features, such as those extracted from the background, are also forced into alignment between domains. To this end, we propose a feature filtration mechanism and design a corresponding neural network, termed a transferable feature filtration network (TFFN), to achieve selective feature alignment based on the transferability of features. We construct a transferable feature learning framework containing two subprocesses: filtering, in which the attention-based filtration network automatically extracts transferable features under additional supervision, and repairing, in which the adaptability of the filtration network is enhanced by repairing it with the help of an additional classifier focused on target knowledge. Furthermore, to facilitate matching the source and target distributions at the class level, we propose a filtration consistency loss that enhances the cross-domain consistency of the filtering weights. Extensive experiments conducted on MSDA benchmark datasets show that the proposed method has significant advantages over existing methods.

Introduction

Due to the powerful fitting ability of deep neural networks and the availability of abundant annotated datasets, deep learning methods have achieved great success in tasks such as image classification, semantic segmentation, and natural language processing [1], [2]. However, it is well known that, owing to dataset bias and domain shift, a model trained to perform well on a source domain often fails to preserve good performance when extended directly to a disparate target domain [3], [4]. Moreover, the high cost of collecting and annotating large amounts of target data, especially data that are difficult to access or can only be annotated by senior experts, hinders attempts to fine-tune source models with labeled target data [5], [6], [7], [8], [9]. These obstacles have strongly motivated the development of domain adaptation, which seeks effective learning on the target domain by leveraging knowledge transferred from the source domain in an unsupervised or semi-supervised manner. In particular, for a model trained on labeled source datasets, unsupervised domain adaptation (UDA) is dedicated to improving generalization performance on completely unlabeled target datasets.

In recent years, leveraging deep neural networks, which have been shown to learn more transferable knowledge, to extract domain-invariant representations has become one of the mainstream approaches to unsupervised domain adaptation and has driven key technical advances [1], [2], [10], [11], [12], [13]. With the effective application of GANs [14] to domain adaptation, adversarial networks that achieve representation alignment through adversarial training have become one of the most commonly adopted tools for extracting domain-invariant features. Other approaches focus on directly reducing the domain discrepancy: a widely used strategy is to project the source and target samples into a shared latent space and then minimize the discrepancy under a defined metric such as the maximum mean discrepancy (MMD) [10], [15] or the Kullback–Leibler (KL) divergence [16]. However, when the interdomain distribution disparity is large, UDA, which in general transfers knowledge across a single source–target pair, faces additional interfering factors that often lead to a suboptimal solution [17], [18]. Consequently, using multiple source domains for adaptation to the target domain, called multi-source domain adaptation (MSDA) [19], has been proposed to exploit the practical scenario in which a wide range of data sources is available. Unfortunately, an increase in the annotated samples collected from the source domains does not directly imply more beneficial transfer: it brings the new challenge of handling domain shifts among the multiple source domains, which may even lead to performance inferior to that of using a single source domain [5], [20], [21].
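
To make the first of these metrics concrete, the squared MMD admits a simple kernel two-sample estimate [10], [15]. The following is a minimal PyTorch sketch, not the implementation of any cited method; the function name rbf_mmd2 and the fixed bandwidth sigma are our own illustrative choices:

    import torch

    def rbf_mmd2(x, y, sigma=1.0):
        # Biased estimator of the squared MMD with an RBF kernel.
        # x: (n, d) source features; y: (m, d) target features.
        # sigma is an illustrative bandwidth; in practice the median
        # heuristic over pairwise distances is a common choice.
        def rbf(a, b):
            d2 = torch.cdist(a, b) ** 2   # pairwise squared Euclidean distances
            return torch.exp(-d2 / (2 * sigma ** 2))
        # MMD^2(P, Q) = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')]
        return rbf(x, x).mean() - 2 * rbf(x, y).mean() + rbf(y, y).mean()

Minimizing this quantity over mini-batches of projected source and target features is the basic mechanism behind the discrepancy-metric family of UDA methods mentioned above.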

As a key solution for domain adaptation, adversarial training has been incorporated into several MSDA methods [5], [18], [20], [22]. While many of these advances are to some extent attributable to the conventional adversarial method [14], its coarse-grained cross-domain alignment of feature representations neglects the fact that the features in different regions of an image differ in their transferability [2]. In general, domain-invariant features exhibit a high degree of transferability, whereas features that vary significantly with the domain (e.g., background features) are not transferable yet are still forced into alignment. These domain-varying features introduce additional interference into the representation alignment among multiple source domains and hinder the matching of the target domain with the mixture of source domains. To discriminate the domains to which samples belong, adversarial discriminative methods indiscriminately enforce an alignment of all the features, including the domain-varying ones. A serious consequence is that, in the absence of additional supervision, the discriminator may rely excessively on the domain-varying features to distinguish between domains. This excessive dependence can trigger highly confident yet erroneous classifications of target samples, as shown in Fig. 1. How to focus the attention of the domain discriminator on the features with high transferability is therefore a problem worth investigating and is the main subject of this paper.
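
For reference, the conventional adversarial alignment discussed above can be summarized by a DANN-style domain discriminator trained through a gradient reversal layer. The sketch below is generic rather than any cited method's code (GradReverse and adversarial_domain_loss are hypothetical names), and it makes the criticized behavior explicit: every feature enters the discriminator, with no notion of transferability:

    import torch
    from torch import nn

    class GradReverse(torch.autograd.Function):
        # Identity on the forward pass; negates gradients on the backward pass,
        # so the feature extractor learns to fool the domain discriminator.
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lam * grad_output, None

    def adversarial_domain_loss(features, domain_labels, discriminator, lam=1.0):
        # Every feature dimension, transferable or not, reaches the
        # discriminator: the indiscriminate alignment criticized above.
        logits = discriminator(GradReverse.apply(features, lam))
        return nn.functional.cross_entropy(logits, domain_labels)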

In this paper, we design a feature filtration mechanism that incorporates attention and construct a new network architecture on top of traditional adversarial methods, called a transferable feature filtration network (TFFN). Specifically, the framework is equipped with a specially designed two-stage training procedure, filtering and repairing, which motivates the filtration network to capture highly transferable features and to filter out untransferable ones under the supervision of the classifier by alternating training between the two stages. The domain discriminator is thus encouraged to focus on aligning the transferable features, which is valuable because the domain shift between source domains is largely alleviated. In the repairing stage, we propose to enhance the adaptability of the filtration network, i.e., to ensure that its repair is not only supervised by the source classifier but is also equally compatible with the target domain. Inspired by imitation learning methods [23], [24], [25], we employ an additional classifier dedicated to the target domain, which forms a teacher–student (TS) structure with the source classifier and provides the filtration network with more complete supervision and enhanced adaptability. In addition, to ensure the transferability of the filtered features at the class level, we introduce a filtration consistency loss based on attention and pseudo labels.
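
Although this excerpt does not give the architectural details of the TFFN, the filtering idea can be sketched as an attention module that gates features before domain discrimination. The module below is a minimal reading under our own assumptions (the class name, layer shapes, and bottleneck ratio are all illustrative, not the paper's specification):

    from torch import nn

    class FiltrationNetwork(nn.Module):
        # Illustrative attention-based filter: produces per-dimension weights
        # in (0, 1) that down-weight untransferable features before they
        # reach the domain discriminator.
        def __init__(self, dim):
            super().__init__()
            self.attn = nn.Sequential(
                nn.Linear(dim, dim // 4), nn.ReLU(),
                nn.Linear(dim // 4, dim), nn.Sigmoid(),
            )

        def forward(self, features):
            weights = self.attn(features)       # filtering (attention) weights
            return weights * features, weights  # filtered features + weights

Under this reading, the filtration consistency loss could, for example, penalize discrepancies between the filtering weights of source and target samples sharing the same (pseudo) label; the exact form is defined in the paper's method section.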

The contributions of this work are as follows:

  • For the MSDA scenario, we propose a new feature filtration mechanism that exploits attention, improves traditional adversarial methods, and explores the feasibility of using a feature filtration network to extract transferable features.

  • To let the discriminator operate under the constraints of a well-established filtration network, we design and incorporate a two-stage training procedure, dividing the entire training process into two subprocesses: filtering and repairing. To the best of our knowledge, we are the first to use two-stage training that separates feature alignment and image classification to exploit the transferability of features.

  • To ensure that the TFFN is supervised from both the source and target domains, we additionally employ a classifier dedicated to the target samples based on imitation learning and verify the effectiveness of this TS structure in enhancing the adaptability of the filtration network on the target domain. Moreover, we propose a filtration consistency loss built on pseudo labels, which enhances the cross-domain consistency of the filtration network for specific classes.

  • We conduct extensive experiments on MSDA benchmark datasets, and the results demonstrate that our TFFN outperforms existing methods by a large margin.

Section snippets

Unsupervised domain adaptation

Typically, UDA aims to generalize a model trained only with the supervised knowledge of a single source domain to a target domain in which labels are completely lacking; however, the presence of a domain shift hinders distribution matching between these two domains [13], [26], [27]. Existing UDA methods mainly take three different perspectives. The first category of approaches overcomes the hindrance by minimizing a discrepancy metric. The work in [28], [29], [30] …

Problem formulation

In MSDA, the annotated source samples are drawn from multiple domains. We denote the $M$ source domains by $\mathcal{S} = \{\mathcal{S}_m\}_{m=1}^{M}$ and the target domain by $\mathcal{T}$. For each source domain, we denote the collected data and their corresponding ground-truth labels by $\{(x_i^{\mathcal{S}_m}, y_i^{\mathcal{S}_m})\}_{i=1}^{N_{\mathcal{S}_m}}$ with $y_i^{\mathcal{S}_m} \in \{1, 2, \ldots, K\}$, where $N_{\mathcal{S}_m}$ is the number of samples in the source domain $\mathcal{S}_m$ and $K$ is the number of classes shared by the source and target domains. Similarly, the collected target samples are represented by $\{x_i^{\mathcal{T}}\}_{i=1}^{N_{\mathcal{T}}}$, where $N_{\mathcal{T}}$ is the number of unlabeled target samples.
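
Under this notation, the supervised part of a generic MSDA objective, stated here for orientation rather than as the paper's exact loss, sums the classification risk over the $M$ source domains:

    \mathcal{L}_{\mathrm{cls}}
      = \sum_{m=1}^{M} \frac{1}{N_{\mathcal{S}_m}}
        \sum_{i=1}^{N_{\mathcal{S}_m}}
        \ell\bigl( C(F(x_i^{\mathcal{S}_m})),\; y_i^{\mathcal{S}_m} \bigr),

where $F$ denotes the feature extractor, $C$ the source classifier, and $\ell$ the cross-entropy loss; these three symbols are our own notation, since the excerpt truncates before the model components are defined.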

Experiments on benchmark datasets

In this section, we compare TFFN with the baselines on three well-known MSDA benchmark datasets: Digit-five, Office-31, and Office-Home.

Conclusion

In this paper, motivated by mitigating the negative impact of untransferable features on the adaptation process from a fine-grained perspective, we propose a transferable feature filtration network (TFFN) for multi-source domain adaptation. Experiments conducted on the standard MSDA datasets show the two-stage training framework built around the filtration network g to be effective and feasible, and this framework can be extended as a component to models in various …

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Key Research and Development Program of China (No. 2020YFA0714103), the Innovation Capacity Construction Project of Jilin Province Development and Reform Commission (2021FGWCXNLJSSZ10, 2019C053-3), and the Fundamental Research Funds for the Central Universities, JLU.

References

  • Shimodaira, H., Improving predictive inference under covariate shift by weighting the log-likelihood function, J. Statist. Plann. Inference (2000)

  • Le, T., et al., Deep multi-Wasserstein unsupervised domain adaptation, Pattern Recognit. Lett. (2019)

  • Xu, R., et al., Reliable weighted optimal transport for unsupervised domain adaptation

  • Wang, X., et al., Transferable attention for domain adaptation

  • Niall, X., et al., Dataset shift in machine learning, J. R. Stat. Soc. Series A (Stat. Soc.) (2009)

  • Pan, S.J., et al., A survey on transfer learning, IEEE Trans. Knowl. Data Eng. (2010)

  • Yang, L., et al., Curriculum manager for source selection in multi-source domain adaptation

  • Lu, J., et al., Fuzzy multiple-source transfer learning, IEEE Trans. Fuzzy Syst. (2020)

  • Zhao, S., et al., Multi-source distilling domain adaptation

  • Tzeng, E., et al., Adversarial discriminative domain adaptation

  • Li, K., et al., Multi-source contribution learning for domain adaptation, IEEE Trans. Neural Netw. Learn. Syst. (2021)

  • Zhu, Y., et al., Aligning domain-specific distribution and classifier for cross-domain classification from multiple sources

  • Oquab, M., et al., Learning and transferring mid-level image representations using convolutional neural networks

  • Donahue, J., et al., DeCAF: A deep convolutional activation feature for generic visual recognition

  • Yosinski, J., et al., How transferable are features in deep neural networks?, Adv. Neural Inf. Process. Syst. (2014)

  • Goodfellow, I., et al., Generative adversarial nets, Adv. Neural Inf. Process. Syst., Vol. 27 (2014)

  • Kang, G., et al., Contrastive adaptation network for unsupervised domain adaptation

  • Zhuang, F., et al., Supervised representation learning: Transfer learning with deep autoencoders

  • Shen, J., et al., Wasserstein distance guided representation learning for domain adaptation

  • Zhao, H., et al., Adversarial multiple source domain adaptation, Adv. Neural Inf. Process. Syst. (2018)

  • Duan, L., et al., Domain adaptation from multiple sources: A domain-dependent regularization approach, IEEE Trans. Neural Netw. Learn. Syst. (2012)

  • Xu, R., et al., Deep cocktail network: Multi-source unsupervised domain adaptation with category shift

  • Peng, X., et al., Moment matching for multi-source domain adaptation

  • Fu, Y., et al., Partial feature selection and alignment for multi-source domain adaptation

  • Bousmalis, K., et al., Unsupervised pixel-level domain adaptation with generative adversarial networks

  • Meng, Z., et al., Adversarial teacher–student learning for unsupervised domain adaptation

  • Meng, Z., et al., Domain adaptation via teacher–student learning for end-to-end speech recognition

  • Torralba, A., et al., Unbiased look at dataset bias

  • Tzeng, E., et al., Deep domain confusion: Maximizing for domain invariance, Comput. Sci. (2014)

  • Long, M., et al., Learning transferable features with deep adaptation networks

  • Long, M., et al., Deep transfer learning with joint adaptation networks

  • Gretton, A., et al., A kernel two-sample test, J. Mach. Learn. Res. (2012)

  • Ben-David, S., et al., A theory of learning from different domains, Mach. Learn. (2010)