Discovering Latent Domains for Unsupervised Domain Adaptation Through Consistency

Mancini, Massimiliano; Porzi, Lorenzo; Cermelli, Fabio; Caputo, Barbara

doi:10.1007/978-3-030-30645-8_36

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11752)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

In recent years, deep neural networks have enabled great advances in Domain Adaptation (DA). While this holds even for multi-source scenarios, most methods assume that the domain to which each sample belongs is known a priori. In practice, however, the source domain may be a mixture of multiple sub-domains, with no prior information about the sub-domain to which each source sample belongs. In this case multi-source DA methods are not applicable, while resorting to single-source ones may lead to sub-optimal results. In this work, we explore a recent direction in deep domain adaptation: automatically discovering latent domains in visual datasets. Previous works address this problem with a domain prediction branch trained with an entropy loss. Here we present a novel formulation for training the domain prediction branch which exploits (i) the domain predictions for various perturbations of the input features and (ii) the min-entropy consensus loss, which forces the predictions for the perturbed inputs to be both consistent and low-entropy. We compare our approach to the previous state of the art on publicly available datasets, showing the effectiveness of our method both quantitatively and qualitatively.

1 Introduction

Most learning-based models rely on the assumption that training and test data are drawn from the same distribution. Unfortunately, this assumption does not hold in many real-world computer vision applications due to unpredictable changes in the environment (e.g. weather, illumination, occlusions). Despite the progress brought to visual recognition by deep learning and the availability of large, fully annotated datasets, models trained on a distribution that differs from the test one still struggle to generalize and perform poorly. This problem is commonly referred to as domain shift, and it is especially relevant in computer vision due to the large appearance variability of visual data. The domain shift problem has been widely studied in the last decade and many techniques have been proposed to limit its effect [29].

Domain Adaptation (DA) methods specifically focus on learning a given task in a source domain and then transferring the acquired knowledge to the domain of interest, i.e. the target domain. In past years, researchers have studied the theoretical aspects of this problem [1, 28] and proposed several shallow [14] and deep-learning-based [4, 10, 21, 22] algorithms. However, recent studies [7] have shown that the domain shift problem can only be alleviated, not entirely solved, even when adopting deep architectures.

DA methods often consider a single-source, single-target scenario, but a setting in which multiple source domains are available is arguably more interesting and realistic. In fact, datasets may contain images taken with different cameras, from many viewpoints, or under different lighting conditions. Approaching such cases with single-source DA algorithms leads to poor results. For this reason, many DA methods have been proposed to learn from multiple sources [3, 8, 25, 28, 35]. However, these approaches assume that the domain label of each sample is known. A more challenging scenario arises when the domain to which a sample belongs is not known in advance. This problem, also referred to as latent domain discovery, considers the presence of multiple but mixed source and/or target domains, with either partial or no information about the ground-truth domain of each sample. In previous years, only a few works [11, 13, 27, 36] focused on this setting, simultaneously discovering latent domains and using this information to learn a classification model for the target domain.

In this paper, we propose a novel formulation of the domain discovery algorithm proposed in [27]. In particular, we improve the training of the domain classifier by employing a different objective, based on (i) producing multiple domain predictions for perturbations of the features of a given sample and (ii) applying to those predictions the recently proposed Min-Entropy Consensus (MEC) loss [30]. This loss enforces both consistency and low entropy for the perturbed domain predictions of a single sample. An overview of the method is reported in Fig. 1. Our empirical study shows that we are able to extract meaningful latent domains from the source samples, achieving better performance than previous latent domain discovery DA methods on popular benchmarks such as Office-31 [32] and PACS [18].

2 Related Works

Deep Domain Adaptation. In recent years, deep-learning-based DA approaches have been shown to be very effective in addressing this task. Usually, robust domain-invariant features are learned in deep architectures using either supervised neural networks [4, 10, 21, 30] or deep autoencoders [39]. Some methods [21, 22] rely on the idea of aligning source and target features by minimizing the Maximum Mean Discrepancy (MMD). A different approach is represented by methods that operate in a domain-adversarial setting [10], i.e. they learn a domain-agnostic feature space by minimizing a domain confusion loss. Recent works have also explored the use of generative models [2, 31]. Our work is close to recent trends exploring domain-specific batch-normalization layers [4, 5, 25, 26], since we use a variant of those layers [27] to adapt the model from the latent source domains to the target one. Our approach is also linked to consistency-based DA strategies [9, 30, 33]. Differently from these works, we employ a consistency loss [30] to learn the domain prediction branch. Our work is also related to multi-source DA [8, 35, 37] and domain generalization [3, 6, 18, 23, 24]. As in these scenarios, we assume the presence of multiple source domains. However, in our case these domains are mixed, and we must discover them in order to exploit the advantages of multi-source DA approaches.

Latent Domain Discovery for DA. Very few works in the literature have addressed the latent domain discovery problem. While previous works on shallow features considered the use of multiple Gaussian distributions [13], domain distinctiveness [11], exemplar SVMs [19, 38] and manifold learning [36], only one work has addressed this problem in the context of deep DA [27]. In [27] we proposed to exploit a domain prediction branch and domain alignment layers [4, 5] to discover latent domains and improve DA performance on the target domain. While in [27] the domain prediction branch was trained through an entropy loss, in this work we show that similar or better results can be achieved by employing a different loss [30], which encourages both low entropy and consistency of the domain predictions for perturbations of the same input features.

3 Method

3.1 Problem Formulation and Notation

As in standard Unsupervised Domain Adaptation (UDA), we assume access to a source and a target domain. The source domain contains semantically labeled samples, while the target domain contains only unlabeled samples. However, differently from standard UDA, we assume that the source domain is a mixture of multiple domains and, contrary to multi-source DA, we do not know to which domain each source sample belongs. Following previous work [27], we assume that there are $\mathsf{k}$ source domains. Notice that this number might not be known a priori: in our current formulation we treat it as a hyperparameter. Source domains are characterized by unknown probability distributions $p_{\mathtt{xy}}^{s_1},\dots,p_{\mathtt{xy}}^{s_\mathsf{k}}$ defined over $\mathcal{X}\times\mathcal{Y}$, where $\mathcal{X}$ is the input space (e.g. images in our case) and $\mathcal{Y}$ the output space (e.g. object categories). The source data are thus modelled as a set $\mathcal{S}=\{(x_1^s,y_1^s),\dots,(x_\mathsf{n}^s,y_\mathsf{n}^s)\}$, with $x_\mathcal{S}=\{x_1^s,\dots,x_\mathsf{n}^s\}$ and $y_\mathcal{S}=\{y_1^s,\dots,y_\mathsf{n}^s\}$ the source data and label sets, respectively. The set $\mathcal{S}$ contains i.i.d. observations from a mixture distribution $p_{\mathtt{xy}}^s=\sum_{i=1}^\mathsf{k} \pi_{s_i} p_{\mathtt{xy}}^{s_i}$, where $\pi_{s_i}$ is the unknown probability of sampling from source domain $s_i$. Similarly, we assume to have target domain data $\mathcal{T}=\{x_1^t,\dots,x_\mathsf{m}^t\}$ of i.i.d. observations drawn from $p_\mathtt{x}^t$.

During training we receive semantically labeled source samples with unknown domain membership, plus unlabeled target samples. Our goal is to learn a model able to address a given task (i.e. classification) in the target domain. Following [27], we address this task by using domain-specific batch-normalization (BN) [15] layers to perform DA [4, 5, 20, 26]. These layers are influenced by the latent domain discovery process, which is performed by a domain prediction branch. With respect to [27], we propose a new objective for the domain prediction branch. In the following, we review how BN can be used to address DA [4, 5, 26] and how a simple variant can be used when there are multiple but unknown source domains [27] (Sect. 3.2). We then describe how the domain prediction branch can be trained with the Min-Entropy Consensus loss [30] (Sect. 3.3), building the whole objective for the training procedure.

3.2 Multi-domain DA-Layers

BN-based DA methods [4, 5, 20] are a simple yet effective way to tackle the DA problem. Since features extracted by a neural network tend to follow domain-dependent distributions [20], we can align them through domain specific normalization layers. Following [27], let us denote as $q^d_\mathtt {x}$ the distribution of activations for a given feature channel and domain d. Domain Alignment Layers [4, 5] (${{\,\mathrm{DAL}\,}}$) normalize an input $x^d\sim q^d_\mathtt {x}$ according to

$$\mathrm{DAL}(x^d; \mu_d, \sigma_d) = \frac{x^d - \mu_d}{\sqrt{\sigma_d^2 + \epsilon}}, \tag{1}$$

where $\mu_d = \mathrm{E}_{x\sim q^d_\mathtt{x}}[x]$ and $\sigma^2_d = \mathrm{Var}_{x\sim q^d_\mathtt{x}}[x]$ are the mean and variance of the input distribution, respectively, and $\epsilon>0$ is a small constant that avoids numerical issues. During training, the statistics $\{\mu_d,\sigma^2_d\}$ are computed over the current mini-batch, i.e. standard BN is applied separately to the samples of each domain d.
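
The training-time behaviour of Eq. (1) can be sketched in NumPy as below. This is a minimal illustration, not the authors' Caffe implementation: it assumes activations laid out as (batch, channels), assumes the domain label of every sample is known (the setting Eq. (1) requires), and omits the learnable scale and shift of BN. The names `dal` and `dal_per_domain` are hypothetical.

```python
import numpy as np

def dal(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Eq. (1) in training mode: normalize the activations of ONE domain d.

    x has shape (batch, channels); mean and variance are computed per channel
    over the mini-batch (biased variance, as in standard BN).
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def dal_per_domain(x: np.ndarray, domains: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Apply standard BN separately to the samples of each (known) domain."""
    out = np.empty_like(x, dtype=float)
    for d in np.unique(domains):
        mask = domains == d
        out[mask] = dal(x[mask], eps)
    return out

# Usage: two domains with very different activation statistics.
rng = np.random.default_rng(0)
domains = np.repeat([0, 1], 64)
x = np.concatenate([rng.normal(0.0, 1.0, (64, 4)), rng.normal(5.0, 3.0, (64, 4))])
y = dal_per_domain(x, domains)
# After alignment each domain has (approximately) zero mean and unit variance per channel.
```

Normalizing each domain with its own statistics is what removes the domain-dependent shift in the activations; a single shared BN would instead normalize with statistics pooled over both domains.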

The previous formulation requires full domain knowledge (i.e. the domain d of each sample), which we do not have for the source domain in our setting. In [27], a variant of the DAL layers called Multi-Domain Alignment Layers (mDA) was proposed to tackle this issue. mDA layers exploit the probabilities that a source sample belongs to each of the latent domains. Formally, denoting as $w_{i,d}$ the probability of $x_i$ belonging to d and given a source mini-batch $\mathcal{B}=\{x_i\}_{i=1}^\mathsf{b}$, mDA layers normalize $x_i$ as follows:

$$\mathrm{mDA}(x_i, \varvec{w}_i; \varvec{\hat{\mu}}, \varvec{\hat{\sigma}}) = \sum_{d\in\mathcal{D}} w_{i,d} \frac{x_i - \hat{\mu}_d}{\sqrt{\hat{\sigma}_d^2 + \epsilon}}, \tag{2}$$

where $\varvec{w}_i=\{w_{i,d}\}_{d\in\mathcal{D}}$, $\varvec{\hat{\mu}}=\{\hat{\mu}_d\}_{d\in\mathcal{D}}$, $\varvec{\hat{\sigma}}=\{\hat{\sigma}^2_d\}_{d\in\mathcal{D}}$ and $\mathcal{D}$ is the set of source latent domains. Notice that $\hat{\mu}_d$ and $\hat{\sigma}_d^2$ are computed in a weighted fashion:

$$\hat{\mu}_d = \sum_{i=1}^\mathsf{b} \alpha_{i,d}\, x_i, \qquad \hat{\sigma}_d^2 = \sum_{i=1}^\mathsf{b} \alpha_{i,d} (x_i - \hat{\mu}_d)^2, \qquad \text{with}\quad \alpha_{i,d} = \frac{w_{i,d}}{\sum_{j=1}^\mathsf{b} w_{j,d}}. \tag{3}$$

Equation (2) is used to normalize source samples in our setting, where the domain of each sample is not known a priori. For the target samples, which all belong to a single domain, we can directly use (1); the formulation can nonetheless be easily extended to the case where the target is also a mixture of multiple datasets.
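
Eqs. (2) and (3) can be sketched in NumPy as follows, again only as an illustration: activations are assumed to be laid out as (b, channels), soft assignments as (b, k) with rows summing to one, and the function name `mda` is hypothetical. The usage part checks two limiting cases that follow from the equations: with one-hot assignments mDA reduces to per-domain BN (Eq. 1), and with uniform assignments it reduces to ordinary BN over the whole batch.

```python
import numpy as np

def mda(x: np.ndarray, w: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Multi-domain alignment layer, training mode (Eqs. 2-3).

    x: source activations, shape (b, c)
    w: latent-domain probabilities, shape (b, k), rows sum to 1
    Assumes every latent domain receives non-zero total weight in the batch.
    """
    alpha = w / w.sum(axis=0, keepdims=True)          # alpha_{i,d}, shape (b, k)
    mu = alpha.T @ x                                  # hat mu_d, shape (k, c)
    diff = x[:, None, :] - mu[None, :, :]             # x_i - hat mu_d, shape (b, k, c)
    var = np.einsum("bk,bkc->kc", alpha, diff ** 2)   # hat sigma_d^2, shape (k, c)
    normed = diff / np.sqrt(var[None, :, :] + eps)    # normalize w.r.t. every latent domain
    return np.einsum("bk,bkc->bc", w, normed)         # weight by w_{i,d} and sum (Eq. 2)

# Usage: with hard (one-hot) assignments, mDA matches per-domain BN.
rng = np.random.default_rng(0)
domains = np.repeat([0, 1], 32)
x = rng.normal(loc=domains[:, None] * 4.0, scale=1.0 + domains[:, None], size=(64, 3))
out = mda(x, np.eye(2)[domains])
ref = np.empty_like(x)
for d in (0, 1):
    xd = x[domains == d]
    ref[domains == d] = (xd - xd.mean(axis=0)) / np.sqrt(xd.var(axis=0) + 1e-5)
```

Because the statistics are weighted averages, gradients flow from the normalized activations back into the domain probabilities $w_{i,d}$, which is what couples the classification loss to the domain prediction branch.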

3.3 Min-Entropy Consensus Loss for Domain Prediction

A crucial aspect of mDA layers is the domain assignment $\varvec{w}_{i}$ that each sample receives. To this end, as in [27], we employ a domain prediction branch, composed of a minimal set of layers followed by a softmax operation over $\mathsf{k}$ outputs. This branch is a separate section of the network, which shares only the bottom-most layers with the classification part, since these low-level layers carry more domain-specific information [27]. In [27] the domain prediction branch is trained with an entropy loss. In this work, we argue that a more effective domain prediction branch can be trained if entropy minimization is combined with consensus among the domain assignments for perturbations of the same input.

Formally, let us define as $g^\theta $ the domain prediction branch, parametrized by $\theta $. We split it into two parts: $g_{E}^\theta $ and $g_{D}^\theta $, denoting the feature extractor and the domain classifier respectively. Given the low-level features $x_i$, in [27] the domain prediction branch produces the domain assignments $\varvec{w}_{i}$ as follows:

$$\varvec{w}_i = g^\theta(x_i) = g_{D}^\theta(g_{E}^\theta(x_i)). \tag{4}$$

To obtain multiple assignments for perturbed versions of the input, we employ a non-parametric random transformation $\phi$. The assignment of a perturbed sample is obtained by replacing the feature extraction function $g_E^\theta$ with $\phi\circ g_E^\theta$:

$$\varvec{\hat{w}}_i = g_{D}^\theta(\phi(g_{E}^\theta(x_i))), \tag{5}$$

where $\varvec{\hat{w}}_i$ denotes the assignment given to the perturbed features. Since $\phi$ is random, applying it multiple times to the same input produces different outputs. We can thus build a matrix $\varvec{\hat{W}}_i = [\varvec{\hat{w}}_i^1, \cdots, \varvec{\hat{w}}_i^\mathsf{r}]$, where each column $\varvec{\hat{w}}_i^j$ is obtained by classifying with $g_D^\theta$ a different application of $\phi$ to the features extracted by $g_E^\theta$.
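
A minimal NumPy sketch of Eq. (5) is given below. The feature extractor `g_E` and domain classifier `g_D` are hypothetical stand-ins (a random linear layer with ReLU and a random linear layer with softmax), and $\phi$ is inverted dropout, matching the choice reported later in the experiments. For a whole batch the matrices $\varvec{\hat{W}}_i$ are stacked as an array of shape (batch, r, k), so each $\varvec{\hat{W}}_i$ appears transposed. The last line also computes the averaged assignment that the text later uses as input to the mDA layers.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dropout(h: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p)."""
    keep = rng.random(h.shape) >= p
    return h * keep / (1.0 - p)

rng = np.random.default_rng(0)
# Hypothetical stand-ins for g_E (feature extractor) and g_D (domain classifier).
W_E = rng.normal(size=(16, 8))  # input dim 16 -> feature dim 8
W_D = rng.normal(size=(8, 3))   # feature dim 8 -> k = 3 latent domains

def g_E(x: np.ndarray) -> np.ndarray:
    return np.maximum(x @ W_E, 0.0)  # linear layer + ReLU

def g_D(h: np.ndarray) -> np.ndarray:
    return softmax(h @ W_D)  # latent-domain probabilities

def perturbed_assignments(x: np.ndarray, r: int, p: float, rng: np.random.Generator) -> np.ndarray:
    """Stack r predictions g_D(phi(g_E(x))) (Eq. 5); result has shape (batch, r, k)."""
    h = g_E(x)  # features are extracted once and shared by all r perturbations
    return np.stack([g_D(dropout(h, p, rng)) for _ in range(r)], axis=1)

x = rng.normal(size=(4, 16))
W_hat = perturbed_assignments(x, r=2, p=0.5, rng=rng)  # shape (4, 2, 3)
w = W_hat.mean(axis=1)  # averaged assignment (1/r) * sum_j hat w_i^j
```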

Since $\varvec{\hat{W}}_i$ contains $\mathsf{r}$ predictions for different perturbations of the same sample, we can enforce consistency within $\varvec{\hat{W}}_i$, obtaining an unsupervised objective for the domain prediction branch. However, as noted in [30], standard consistency losses [9, 33] only enforce consistent predictions across perturbations of the same sample, without taking into account the actual confidence of the assignment. For instance, a sample whose perturbed predictions are all uniform over the domains is perfectly consistent, yet carries no information about its domain. To this end, we follow [30] and employ the Min-Entropy Consensus (MEC) loss as the objective for the domain classifier. Given $\varvec{\hat{W}}_i = [\varvec{\hat{w}}_i^1, \cdots, \varvec{\hat{w}}_i^\mathsf{r}]$, we minimize the following objective:

$$\text{MEC}(x_i) = -\frac{1}{\mathsf{r}}\max_{d\in\mathcal{D}} \sum_{j=1}^{\mathsf{r}} \log \hat{w}^j_{i,d}. \tag{6}$$

The domain loss on the full source set is:

$$L_\text{dom} = \frac{1}{\mathsf{n}}\sum_{i=1}^{\mathsf{n}} \text{MEC}(x_i). \tag{7}$$

Equation (7) defines a loss that yields domain predictions that are both consistent and confident for each sample. In the experiments we use Dropout [34] with a drop ratio of 0.5 as $\phi$, setting $\mathsf{r}=2$ as in [30].
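
Eqs. (6) and (7) translate almost directly into array operations. The NumPy sketch below assumes the perturbed predictions of a batch are stacked with shape (n, r, k); the small `eps` guarding against `log(0)` is an implementation assumption, not part of the formulation, and `mec_loss` is a hypothetical name.

```python
import numpy as np

def mec_loss(W_hat: np.ndarray, eps: float = 1e-12) -> float:
    """Min-Entropy Consensus loss averaged over a batch (Eqs. 6-7).

    W_hat: perturbed domain predictions, shape (n, r, k); each row over k is a distribution.
    """
    r = W_hat.shape[1]
    log_sum = np.log(W_hat + eps).sum(axis=1)   # sum_j log hat w^j_{i,d}, shape (n, k)
    per_sample = -log_sum.max(axis=1) / r       # MEC(x_i): best-agreeing domain d (Eq. 6)
    return float(per_sample.mean())             # L_dom: mean over samples (Eq. 7)

# Toy single-sample inputs with r = 2 perturbations and k = 2 latent domains.
confident = np.array([[[0.99, 0.01], [0.99, 0.01]]])     # consistent and confident
uncertain = np.array([[[0.5, 0.5], [0.5, 0.5]]])         # consistent but uninformative
inconsistent = np.array([[[0.99, 0.01], [0.01, 0.99]]])  # confident but contradictory

print(mec_loss(confident))     # ≈ 0.010
print(mec_loss(uncertain))     # ≈ 0.693
print(mec_loss(inconsistent))  # ≈ 2.308
```

The three toy inputs show why MEC is used instead of a pure consistency term: the uniform predictions are perfectly consistent but still penalized, while the confident but contradictory pair receives the largest loss.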

To train the full architecture, we also need an objective for the semantic classification part. Following [4, 5, 27], we employ a cross-entropy loss on the labeled source samples and an entropy loss on the unlabeled target ones. Denoting as $f_C^\theta$ the classification branch, we have:

$$L_\text{cls}(\theta) = -\frac{1}{\mathsf{n}} \sum_{i=1}^\mathsf{n} \log f_C^\theta(y_i^s; x_i^s) + \frac{\lambda_C}{\mathsf{m}} \sum_{i=1}^\mathsf{m} H(f_C^\theta(\cdot; x_i^t)). \tag{8}$$

The first term on the right-hand side is the average log-loss over the supervised examples in $\mathcal{S}$, where $f_C^\theta(y_i^s; x_i^s)$ denotes the output of the classification branch for a source sample, i.e. the predicted probability of $x_i^s$ having class $y_i^s$. The second term on the right-hand side of (8) is the entropy H of the classification distribution $f_C^\theta(\cdot; x_i^t)$, averaged over all unlabeled target examples $x_i^t$ in $\mathcal{T}$ and scaled by a positive hyperparameter $\lambda_C$. The full objective is:
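
As a worked instance of Eq. (8), the sketch below evaluates the loss on precomputed class probabilities. The function name `cls_loss`, the toy probabilities and the value $\lambda_C = 0.1$ are hypothetical choices for illustration; the text does not report the value of $\lambda_C$.

```python
import numpy as np

def entropy(p: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy (natural log) of each row of p."""
    return -(p * np.log(p + eps)).sum(axis=-1)

def cls_loss(p_src: np.ndarray, y_src: np.ndarray, p_tgt: np.ndarray, lambda_c: float) -> float:
    """Eq. (8): mean source cross-entropy + lambda_C * mean target prediction entropy.

    p_src: (n, C) class probabilities for labeled source samples
    y_src: (n,) integer class labels
    p_tgt: (m, C) class probabilities for unlabeled target samples
    """
    n = p_src.shape[0]
    ce = -np.log(p_src[np.arange(n), y_src] + 1e-12).mean()  # picks f_C(y_i^s; x_i^s)
    ent = entropy(p_tgt).mean()
    return float(ce + lambda_c * ent)

p_src = np.array([[0.9, 0.1], [0.2, 0.8]])
y_src = np.array([0, 1])
p_tgt = np.array([[0.5, 0.5]])  # maximally uncertain target prediction
loss = cls_loss(p_src, y_src, p_tgt, lambda_c=0.1)  # ≈ 0.234
```

The entropy term pushes target predictions towards confident outputs, which is why the uniform target prediction above contributes its maximal value, $\lambda_C \log 2$.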

$$L(\theta) = L_\text{cls}(\theta) + \lambda_D L_\text{dom}(\theta), \tag{9}$$

where $L_\text{cls}$ is a loss term that penalizes errors on the final classification task, while $L_\text{dom}$ accounts for the domain classification task, with a hyperparameter $\lambda_D$ balancing the two. We highlight that, due to the dependency of the classification branch on the mDA layers, the network learns to predict domain assignment probabilities that also result in a low classification loss. A schematic representation of our architecture is depicted in Fig. 1. Since the semantic classification part needs a single domain assignment for each sample, we set $\varvec{w}_i$ to the average of the domain predictions on the perturbed inputs, i.e. $\varvec{w}_i = \frac{1}{\mathsf{r}}\sum_{j=1}^\mathsf{r} \varvec{\hat{w}}_i^j$.

4 Experiments

4.1 Experimental Setup

In our evaluation we consider the following benchmarks: the PACS dataset [18] and the Office-31 [32] dataset.

Office-31 is a widely used DA benchmark containing images of 31 object categories collected from 3 different sources: Webcam (W), DSLR camera (D) and the Amazon website (A). We test our model in the multi-source setting [36], where each domain is in turn considered as target and the others as sources. We use this benchmark to compare with [27] as well as with previous shallow algorithms [11, 13, 36]. In this setting, we use as input to our algorithm the activations of the $\mathtt{fc7}$ layer of an AlexNet [17] architecture, applying mDA layers to the features and after the domain classifier, as in [27]. The structure of the domain prediction branch is the same as in [27], except for the addition of a BN layer (without scale and bias) on the domain logits, since we found that this addition stabilizes the training procedure. The training hyperparameters are the same as in [27], with $\lambda_D=0.5$ and $\mathsf{k}=2$.

PACS [18] is a recently proposed dataset containing images of 7 categories from 4 different visual representations with significant domain shift, i.e. Photo (P), Art paintings (A), Cartoon (C) and Sketch (S). Following [18], we train our model considering 3 domains as sources and the remaining one as target, using all the images of each domain. Differently from [18], we consider a DA setting (i.e. target data are available at training time). For the experiments on PACS we use the ResNet-18 architecture [12]. As in [27], to apply our approach we replace each BN layer in the network with an mDA layer. As in the previous case, the structure of the domain prediction branch and the training hyperparameters are the same as in [27], with $\lambda_D=0.5$ and $\mathsf{k}=3$, and with the insertion of a BN layer after the domain prediction logits.
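
The leave-one-domain-out protocol and the hyperparameters reported above can be summarized as a small configuration, sketched here in Python. The dictionary layout and the helper `leave_one_domain_out` are hypothetical; only the domain names, backbones, $\mathsf{k}$ and $\lambda_D$ values come from the text.

```python
from typing import Iterator, List, Tuple

def leave_one_domain_out(domains: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Each domain in turn is the target; the remaining ones form the mixed source."""
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target

# Settings reported in the text (the dictionary layout itself is illustrative).
SETTINGS = {
    "Office-31": {"domains": ["Amazon", "DSLR", "Webcam"], "backbone": "AlexNet fc7", "k": 2, "lambda_D": 0.5},
    "PACS": {"domains": ["Photo", "Art", "Cartoon", "Sketch"], "backbone": "ResNet-18", "k": 3, "lambda_D": 0.5},
}

for name, cfg in SETTINGS.items():
    for sources, target in leave_one_domain_out(cfg["domains"]):
        # Source domain labels are NOT given to the model: the sources are merged into one set.
        print(f"{name}: sources={'+'.join(sources)} (k={cfg['k']}) -> target={target}")
```

In both reported settings the number of latent domains $\mathsf{k}$ equals the number of source domains, although the formulation treats it as a free hyperparameter.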

We implement all models in the Caffe [16] framework, and our evaluation is performed on an NVIDIA GeForce GTX 1080 GPU. Both architectures are initialized with weights pretrained on ImageNet: for AlexNet we take the pretrained model available in Caffe, while for ResNet we use the converted version of the original model developed in Torch (Note 1).

4.2 Results

Analysis of Our Method. In a first series of experiments, we compare our model and [27] on the PACS dataset using the ResNet-18 architecture. As baselines, we report the performance of the base architecture, of the single-source DA model of [5] (DIAL), and of the multi-source version of [5] (Multi-source DA), which is our upper bound since it assumes perfect domain separation. The results are shown in Table 1. As the table shows, our model achieves performance comparable to [27] on average. Analyzing the results, our model performs comparably to the Multi-source DA upper bound on the domains where the gap with the single-source baseline is minimal (Photo and Art). Moreover, our model largely outperforms [27] when Sketch is used as target. We ascribe this behaviour to the fact that enforcing consistency regularizes and strengthens the latent domain discovery process, providing a favourable domain separation even when the difference among the domains is less pronounced (as in this case, where Photo, Art and Cartoon are the source domains). At the same time, this regularization can harm the confidence of the domain prediction branch, and thus the statistics used by the mDA layers in Eq. (2), if some source domains are close. This happens, for instance, when Cartoon is the target: two source domains (Photo and Art) are close to each other and far from the third one (Sketch).

Table 1. PACS dataset: comparison of different methods using the ResNet architecture. The first row indicates the target domain, while all the others are considered as sources.

To understand the outcome of the latent domain discovery process, we report histograms showing how many samples of each domain receive a given probability of belonging to each latent domain. The analysis is shown in Fig. 2. As the figure shows, every time Sketch is among the source domains (yellow bar), almost all its samples are assigned to a single latent domain. Moreover, when Sketch is present, the other source domains, whose differences are more subtle, tend to receive assignments spread over the other two latent domains, albeit with different distributions. This is clear in Fig. 2b, where Photo samples tend to be assigned to the first latent domain and Cartoon samples to the second one. Similarly, when Sketch is the target (Fig. 2d), Cartoon samples are assigned to the first latent domain, Photo samples mainly to the second, and Art samples are spread among the three latent domains. This latter outcome is reasonable, since Art is visually intermediate between Photo and Cartoon. A similar effect can be noted when Cartoon and Sketch are both source domains: since Cartoon is the visual domain closest to Sketch, its samples may also receive some probability for the latent domain to which Sketch samples are assigned.
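
An analysis in the spirit of Fig. 2 can be computed along the lines of the NumPy sketch below. It assumes the averaged assignments $\varvec{w}_i$ of the source samples (one row per sample) and the ground-truth domain of each sample, which is used only for this analysis and never during training. The function name `assignment_histograms`, the equal-width binning and the toy data are illustrative choices, not the procedure used to produce the figure.

```python
import numpy as np

def assignment_histograms(w: np.ndarray, true_domain: np.ndarray, bins: int = 10) -> dict:
    """Count, for each (ground-truth domain, latent domain) pair, how many samples
    receive a probability falling in each of `bins` equal-width bins over [0, 1].

    w: latent-domain assignments, shape (N, k); true_domain: shape (N,).
    Returns {(true_domain, latent_index): counts of shape (bins,)}.
    """
    edges = np.linspace(0.0, 1.0, bins + 1)
    hists = {}
    for dom in np.unique(true_domain):
        w_dom = w[true_domain == dom]
        for d in range(w.shape[1]):
            counts, _ = np.histogram(w_dom[:, d], bins=edges)  # last bin includes 1.0
            hists[(dom, d)] = counts
    return hists

# Toy example: two Sketch samples strongly assigned to latent domain 0, one Photo sample to 1.
w = np.array([[0.95, 0.05], [0.9, 0.1], [0.2, 0.8]])
true_domain = np.array(["Sketch", "Sketch", "Photo"])
h = assignment_histograms(w, true_domain, bins=2)
```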

To further confirm this analysis, Fig. 3 reports the top images assigned to each latent domain. The figure also highlights how appearance plays a crucial role in the domain discovery process, since the dominant color of an image strongly influences its domain assignment. This can be an important aspect when exploring future real-world applications, where the shift might be caused by changes in e.g. illumination and weather conditions.

Comparison with the State of the Art. Finally, we compare the performance of our model against state-of-the-art approaches on the Office-31 dataset, using $\mathtt{fc7}$ features of the AlexNet architecture as input. We compare with the deep approach of [27] and with the shallow ones [11, 13, 36], which are among the few approaches tackling the latent domain discovery problem. The results are shown in Table 2. Our model outperforms both shallow [11, 13, 36] and deep [27] methods, obtaining an average gain of almost 1% with respect to the baseline [27]. This confirms the effectiveness of the proposed training objective for the domain classification branch, as well as the fact that our algorithm performs better than [27] when the difference among the source domains is less marked.

Table 2. Office-31: comparison with state-of-the-art algorithms. In the first row we indicate the source (top) and the target domains (bottom).

5 Conclusions

In this work we have presented an algorithm for latent domain DA, where the source domain is a mixture of multiple datasets and the domain membership of each sample is unknown. Our method builds on [27], where the latent DA task is solved by employing domain-specific alignment layers. These layers perform a normalization weighted by the probability of a sample belonging to a given domain, as predicted by a domain classifier. While [27] employs an entropy loss to train the domain prediction branch, here we propose to use the Min-Entropy Consensus (MEC) loss [30] on perturbed versions of the features that are fed to the domain classifier for a single sample. Thanks to the consistency term, this loss is more stable than the standard entropy loss and regularizes the domain separation process. Results on the PACS and Office-31 datasets show that our model outperforms all the baselines on Office-31, while achieving similar or higher performance than [27] on PACS. In future work, we plan to expand these findings by exploring the impact of different perturbation and consensus strategies.

Notes

1. https://github.com/HolmesShuan/ResNet-18-Caffemodel-on-ImageNet.

References

1. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.: A theory of learning from different domains. Mach. Learn. 79(1), 151–175 (2010)
2. Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D.: Unsupervised pixel-level domain adaptation with generative adversarial networks. In: CVPR (2017)
3. Carlucci, F.M., D’Innocente, A., Bucci, S., Caputo, B., Tommasi, T.: Domain generalization by solving jigsaw puzzles. In: CVPR (2019)
4. Carlucci, F.M., Porzi, L., Caputo, B., Ricci, E., Rota Bulò, S.: Autodial: automatic domain alignment layers. In: ICCV (2017)
5. Carlucci, F.M., Porzi, L., Caputo, B., Ricci, E., Bulò, S.R.: Just DIAL: domain alignment layers for unsupervised domain adaptation. In: Battiato, S., Gallo, G., Schettini, R., Stanco, F. (eds.) ICIAP 2017. LNCS, vol. 10484, pp. 357–369. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68560-1_32
6. D’Innocente, A., Caputo, B.: Domain generalization with domain-specific aggregation modules. In: Brox, T., Bruhn, A., Fritz, M. (eds.) GCPR 2018. LNCS, vol. 11269, pp. 187–198. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-12939-2_14
7. Donahue, J., et al.: Decaf: a deep convolutional activation feature for generic visual recognition. In: ICML (2014)
8. Duan, L., Tsang, I.W., Xu, D., Chua, T.S.: Domain adaptation from multiple sources via auxiliary classifiers. In: ICML (2009)
9. French, G., Mackiewicz, M., Fisher, M.: Self-ensembling for visual domain adaptation. In: ICLR (2018)
10. Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: ICML (2015)
11. Gong, B., Grauman, K., Sha, F.: Reshaping visual datasets for domain adaptation. In: NIPS (2013)
12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
13. Hoffman, J., Kulis, B., Darrell, T., Saenko, K.: Discovering latent domains for multisource domain adaptation. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, pp. 702–715. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33709-3_50
14. Huang, J., Gretton, A., Borgwardt, K.M., Schölkopf, B., Smola, A.J.: Correcting sample selection bias by unlabeled data. In: NIPS (2006)
15. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
16. Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. In: ACM-Multimedia, pp. 675–678. ACM (2014)
17. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)
18. Li, D., Yang, Y., Song, Y.Z., Hospedales, T.M.: Deeper, broader and artier domain generalization. In: ICCV (2017)
19. Li, W., Xu, Z., Xu, D., Dai, D., Van Gool, L.: Domain generalization and adaptation using low rank exemplar SVMs. IEEE T-PAMI 40(5), 1114–1127 (2018)
20. Li, Y., Wang, N., Shi, J., Liu, J., Hou, X.: Revisiting batch normalization for practical domain adaptation. arXiv preprint arXiv:1603.04779 (2016)
21. Long, M., Wang, J.: Learning transferable features with deep adaptation networks. In: ICML (2015)
22. Long, M., Wang, J., Jordan, M.I.: Unsupervised domain adaptation with residual transfer networks. In: NIPS (2016)
23. Mancini, M., Bulò, S.R., Caputo, B., Ricci, E.: Best sources forward: domain generalization through source-specific nets. In: ICIP (2018)
24. Mancini, M., Bulò, S.R., Caputo, B., Ricci, E.: Robust place categorization with deep domain generalization. IEEE RAL 3(3), 2093–2100 (2018)
25. Mancini, M., Bulò, S.R., Caputo, B., Ricci, E.: Adagraph: Unifying predictive and continuous domain adaptation through graphs. In: CVPR (2019)
26. Mancini, M., Karaoguz, H., Ricci, E., Jensfelt, P., Caputo, B.: Kitting in the wild through online domain adaptation. In: IROS (2018)
27. Mancini, M., Porzi, L., Bulò, S.R., Caputo, B., Ricci, E.: Boosting domain adaptation by discovering latent domains. In: CVPR (2018)
28. Mansour, Y., Mohri, M., Rostamizadeh, A.: Domain adaptation: Learning bounds and algorithms. arXiv preprint arXiv:0902.3430 (2009)
29. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)
30. Roy, S., Siarohin, A., Sangineto, E., Bulo, S.R., Sebe, N., Ricci, E.: Unsupervised domain adaptation using feature-whitening and consensus loss. In: CVPR (2019)
31. Russo, P., Carlucci, F.M., Tommasi, T., Caputo, B.: From source to target and back: symmetric bi-directional adaptive GAN. In: CVPR (2018)
32. Saenko, K., Kulis, B., Fritz, M., Darrell, T.: Adapting visual category models to new domains. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 213–226. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_16
33. Saito, K., Ushiku, Y., Harada, T.: Asymmetric tri-training for unsupervised domain adaptation. arXiv preprint arXiv:1702.08400 (2017)
34. Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
35. Sun, Q., Chattopadhyay, R., Panchanathan, S., Ye, J.: A two-stage weighting framework for multi-source domain adaptation. In: NIPS (2011)
36. Xiong, C., McCloskey, S., Hsieh, S.H., Corso, J.J.: Latent domains modeling for visual domain adaptation. In: AAAI (2014)
37. Xu, R., Chen, Z., Zuo, W., Yan, J., Lin, L.: Deep cocktail network: multi-source unsupervised domain adaptation with category shift. In: CVPR (2018)
38. Xu, Z., Li, W., Niu, L., Xu, D.: Exploiting low-rank structure from latent domains for domain generalization. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 628–643. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_41
39. Zeng, X., Ouyang, W., Wang, M., Wang, X.: Deep learning of scene-specific classifier for pedestrian detection. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 472–487. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_31

Acknowledgements

This work was partially supported by the ERC grant 637076 - RoboExNovo.

Author information

Authors and Affiliations

Sapienza University of Rome, Rome, Italy
Massimiliano Mancini
Fondazione Bruno Kessler, Trento, Italy
Massimiliano Mancini
Mapillary Research, Graz, Austria
Lorenzo Porzi
Politecnico di Torino, Turin, Italy
Fabio Cermelli & Barbara Caputo
Italian Institute of Technology, Turin, Italy
Massimiliano Mancini, Fabio Cermelli & Barbara Caputo

Corresponding author

Correspondence to Massimiliano Mancini.

Editor information

Editors and Affiliations

University of Trento, Povo, Italy
Elisa Ricci
Mapillary Research, Graz, Austria
Samuel Rota Bulò
University of Amsterdam, Amsterdam, The Netherlands
Cees Snoek
Fondazione Bruno Kessler, Povo, Italy
Oswald Lanz
Fondazione Bruno Kessler, Povo, Italy
Stefano Messelodi
University of Trento, Povo, Italy
Nicu Sebe

About this paper

Cite this paper

Mancini, M., Porzi, L., Cermelli, F., Caputo, B. (2019). Discovering Latent Domains for Unsupervised Domain Adaptation Through Consistency. In: Ricci, E., Rota Bulò, S., Snoek, C., Lanz, O., Messelodi, S., Sebe, N. (eds) Image Analysis and Processing – ICIAP 2019. ICIAP 2019. Lecture Notes in Computer Science(), vol 11752. Springer, Cham. https://doi.org/10.1007/978-3-030-30645-8_36

DOI: https://doi.org/10.1007/978-3-030-30645-8_36
Published: 02 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30644-1
Online ISBN: 978-3-030-30645-8
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition