Transferable feature filtration network for multi-source domain adaptation
Introduction
Owing to the powerful fitting ability of deep neural networks and the availability of rich annotated datasets, deep learning methods have achieved great success in tasks such as image classification, semantic segmentation, and natural language processing [1], [2]. However, because of dataset bias and domain shift, a trained model that performs well on the source domain often fails to preserve good performance when applied directly to a disparate target domain [3], [4]. Moreover, the expensive cost of collecting and annotating large amounts of target data, especially data that are scarcely accessible or can be annotated only by senior experts, hinders attempts to fine-tune source models with labeled target data [5], [6], [7], [8], [9]. These obstacles have strongly motivated the development of domain adaptation, which seeks to achieve effective learning on the target domain by leveraging knowledge transferred from the source domain in an unsupervised or semi-supervised manner. In particular, given a model trained on labeled source datasets, unsupervised domain adaptation (UDA) aims to improve its generalization performance on completely unlabeled target datasets.
In recent years, deep neural networks, which have been shown to learn more transferable knowledge, have been widely used to learn domain-invariant representations, and this has become one of the mainstream approaches for unsupervised domain adaptation [1], [2], [10], [11], [12], [13]. Following the successful application of GANs [14] to domain adaptation, adversarial networks that achieve representation alignment through adversarial training have become one of the most commonly adopted tools for extracting domain-invariant features. Other approaches focus on directly reducing the domain discrepancy: a widely used strategy is to project the source and target samples into a shared latent space and then minimize the discrepancy under a chosen metric, such as the maximum mean discrepancy (MMD) [10], [15] or the Kullback–Leibler (KL) divergence [16]. However, when the interdomain distribution disparity is excessive, UDA methods that transfer knowledge between a single source–target pair face additional interfering factors, which often leads to a suboptimal solution [17], [18]. Consequently, adapting to the target domain from multiple source domains, called multi-source domain adaptation (MSDA) [19], has been proposed to exploit the wide range of data sources available in real-world contexts. Unfortunately, more annotated source samples do not directly imply a more beneficial transfer: they bring the new challenge of handling domain shifts among the multiple source domains, which may even lead to performance inferior to that of using a single source domain [5], [20], [21].
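The MMD mentioned above compares two feature distributions via kernel mean embeddings. As a minimal, framework-free sketch (the RBF kernel, the biased estimator, and the bandwidth choice are illustrative assumptions, not the cited papers' exact implementations):

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-gamma * sq_dists)

def mmd2(source, target, gamma):
    """Biased estimate of the squared MMD between two feature sets."""
    k_ss = rbf_kernel(source, source, gamma)
    k_tt = rbf_kernel(target, target, gamma)
    k_st = rbf_kernel(source, target, gamma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

rng = np.random.default_rng(0)
dim = 16
# Same distribution -> small MMD; shifted distribution -> large MMD.
close = mmd2(rng.normal(0, 1, (200, dim)), rng.normal(0, 1, (200, dim)), gamma=1.0 / dim)
far   = mmd2(rng.normal(0, 1, (200, dim)), rng.normal(3, 1, (200, dim)), gamma=1.0 / dim)
```

Minimizing such an estimate over a shared latent space is the essence of the discrepancy-based strategy described above.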
As a key solution for domain adaptation, adversarial training is incorporated into several MSDA methods [5], [18], [20], [22]. Although many advances are attributable to the conventional adversarial method [14], its coarse-grained cross-domain alignment of feature representations neglects the fact that features in different regions of an image differ in their transferability [2]. In general, domain-invariant features have a high degree of transferability, whereas features that vary significantly with the domain (e.g., background features) are not transferable, yet they are still forcibly aligned. These domain-varying features introduce additional interference into the representation alignment between multiple source domains and hinder the matching of the target domain to the mixture of source domains. To discriminate the domains to which the samples belong, adversarial discriminative methods indiscriminately enforce an alignment of all features, including the domain-varying ones. A serious consequence is that, in the absence of additional supervision, the discriminator may rely excessively on the domain-varying features to distinguish between domains. This excessive dependence can trigger highly confident yet erroneous classifications of target samples, as shown in Fig. 1. How to focus the attention of the domain discriminator on the features with high transferability is therefore a problem worth investigating and is the main subject of this paper.
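A common realization of the adversarial alignment discussed above is a domain discriminator trained through a gradient reversal layer. The toy sketch below (pure NumPy, with a single logistic unit standing in for the discriminator; all names and sizes are illustrative) shows the mechanism: the forward pass is the identity, while the backward pass flips and scales the gradient, so the feature extractor learns to confuse the discriminator over all features indiscriminately, which is exactly the weakness noted above.

```python
import numpy as np

class GradReverse:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""
    def __init__(self, lam=1.0):
        self.lam = lam
    def forward(self, x):
        return x
    def backward(self, grad_out):
        return -self.lam * grad_out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=4)              # weights of the toy domain discriminator
features = rng.normal(size=4)       # output of the feature extractor
grl = GradReverse(lam=0.5)

p = sigmoid(w @ grl.forward(features))   # predicted probability of "source" domain
d_loss_d_logit = p - 1.0                 # grad of BCE loss for true label = 1 (source)
disc_grad = d_loss_d_logit * w           # gradient reaching the reversal layer
grad_to_features = grl.backward(disc_grad)  # flipped: extractor fights the discriminator
```

Note that the reversed gradient touches every feature dimension equally; nothing here distinguishes transferable from domain-varying features.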
In this paper, we design a feature filtration mechanism that incorporates attention and construct a new network architecture on top of traditional adversarial methods, called the transferable feature filtration network (TFFN). Specifically, the framework is equipped with a specially designed two-stage training procedure, consisting of a filtering stage and a repairing stage, which motivates the filtration network to capture highly transferable features and to filter out untransferable ones under the supervision of the classifier by alternating between the two stages. The domain discriminator is thus encouraged to focus on aligning the transferable features, which is valuable because the domain shift between source domains is largely alleviated. In the repair stage, we enhance the adaptability of the filtration network, i.e., we ensure that its repair is not only supervised by the source classifier but is also compatible with the target domain. Inspired by imitation learning methods [23], [24], [25], we employ an additional classifier dedicated to the target domain, which forms a teacher–student (TS) structure with the source classifier and provides the filtration network with more complete supervision and enhanced adaptability. In addition, to ensure the transferability of the filtered features at the class level, we introduce a filtration consistency loss based on attention and pseudo labels.
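This excerpt does not give the exact form of the filtration mechanism, but its core idea, splitting a feature map into a transferable part and a filtered-out part with an attention mask, can be sketched as follows (the sigmoid spatial mask and the tensor shapes are assumptions for illustration, not the paper's exact design):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def filter_features(feat_map, attn_logits):
    """Split a C x H x W feature map into transferable and filtered-out parts
    using a spatial attention mask in (0, 1)."""
    mask = sigmoid(attn_logits)           # H x W; high where features transfer well
    transferable = feat_map * mask        # passed on to the domain discriminator
    filtered_out = feat_map * (1 - mask)  # e.g. background, kept away from alignment
    return transferable, filtered_out, mask

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))     # toy feature map: 8 channels, 4x4 spatial grid
logits = rng.normal(size=(4, 4))      # attention logits produced by the filtration net
trans, rest, mask = filter_features(feat, logits)
```

Because the two parts sum back to the original feature map, the classifier can still supervise the full representation while the discriminator sees only the transferable portion.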
The contributions of this work are as follows:
- For the MSDA scenario, we propose a new feature filtration mechanism that exploits attention, improves traditional adversarial methods, and explores the feasibility of using feature filtration networks to extract transferable features.
- To let the discriminator operate under the constraints of a well-established filtration network, we design and incorporate a two-stage training procedure, dividing the entire training process into two subprocesses, filtering and repairing. To the best of our knowledge, we are the first to use two-stage training that separates feature alignment from image classification to exploit the transferability of the features.
- To ensure that the TFFN is supervised by both the source and target domains, we additionally employ a classifier dedicated to the target samples based on imitation learning and verify the effectiveness of this TS structure for enhancing the adaptability of the filtration network on the target domain. Moreover, we propose a filtration consistency loss that uses pseudo labels to enhance the cross-domain consistency of the filtration network for specific classes.
- We conduct extensive experiments on the MSDA benchmark datasets, and the results demonstrate that our TFFN outperforms existing methods by a large margin.
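The alternating two-stage procedure described in the second contribution can be sketched as a simple training schedule. Which modules are trainable in each phase is our illustrative assumption, not the authors' exact recipe:

```python
def phase_plan(phase):
    """Which modules update in each phase (illustrative assumption)."""
    if phase == "filter":
        # Filtering: align transferable features, so the filtration network
        # and the domain discriminator train while the classifier is frozen.
        return {"filtration_net": True, "discriminator": True, "classifier": False}
    if phase == "repair":
        # Repairing: the filtration network is repaired under classifier
        # supervision; the discriminator is frozen.
        return {"filtration_net": True, "discriminator": False, "classifier": True}
    raise ValueError(f"unknown phase: {phase}")

def two_stage_schedule(num_rounds):
    """Alternate filtering and repairing for the given number of rounds."""
    return [phase for _ in range(num_rounds) for phase in ("filter", "repair")]

schedule = two_stage_schedule(3)
```

The point of the alternation is that the discriminator never updates at the same time as the classifier, keeping feature alignment and image classification separated as stated above.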
Unsupervised domain adaptation
Typically, UDA aims to generalize a model trained only with the supervised knowledge of a single source domain to a target domain in which labels are completely lacking; however, the presence of a domain shift hinders the distribution matching of these two domains [13], [26], [27]. Existing UDA methods mainly take three different perspectives. The first category of approaches overcomes the hindrance by minimizing a discrepancy metric. The work in [28], [29], [30]
Problem formulation
In MSDA, the annotated source samples are drawn from multiple domains. We denote the $K$ source domains by $\{\mathcal{D}_{s_k}\}_{k=1}^{K}$ and the target domain by $\mathcal{D}_t$. For each source domain $\mathcal{D}_{s_k}$, we denote the collected data and their corresponding ground-truth labels by $X_{s_k} = \{x_i^{s_k}\}_{i=1}^{n_{s_k}}$ and $Y_{s_k} = \{y_i^{s_k}\}_{i=1}^{n_{s_k}}$, where $n_{s_k}$ is the number of samples in the source domain $\mathcal{D}_{s_k}$, $y_i^{s_k} \in \{1, \ldots, C\}$, and $C$ is the number of classes shared by the source and target domains. Similarly, the collected target samples are represented by $X_t = \{x_j^t\}_{j=1}^{n_t}$, where $n_t$ is the number of unlabeled target samples.
Experiments on benchmark datasets
In this section, we compare TFFN with the baselines on three well-known MSDA benchmark datasets: Digit-five, Office-31, and Office-Home.
Conclusion
In this paper, motivated by mitigating, from a fine-grained perspective, the negative impact of untransferable features on the adaptation process, we propose a transferable feature filtration network (TFFN) for multi-source domain adaptation. Experiments conducted on the standard MSDA datasets show that the two-stage training framework built on the filtration network is effective and feasible, and this framework can be extended as a component to models in various
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by the National Key Research and Development Program of China (No. 2020YFA0714103), the Innovation Capacity Construction Project of Jilin Province Development and Reform Commission (2021FGWCXNLJSSZ10, 2019C053-3), and the Fundamental Research Funds for the Central Universities, JLU.
References (66)
- Improving predictive inference under covariate shift by weighting the log-likelihood function, J. Statist. Plann. Inference (2000)
- Deep multi-Wasserstein unsupervised domain adaptation, Pattern Recognit. Lett. (2019)
- Reliable weighted optimal transport for unsupervised domain adaptation
- Transferable attention for domain adaptation
- Dataset shift in machine learning, J. R. Stat. Soc. Series A (Stat. Soc.) (2009)
- A survey on transfer learning, IEEE Trans. Knowl. Data Eng. (2010)
- Curriculum manager for source selection in multi-source domain adaptation
- Fuzzy multiple-source transfer learning, IEEE Trans. Fuzzy Syst. (2020)
- Multi-source distilling domain adaptation
- Adversarial discriminative domain adaptation