Abstract
We propose to extend the marginalized denoising autoencoder (MDA) framework with a domain regularization whose aim is to denoise both the source and target data in such a way that the features become domain invariant and the adaptation gets easier. The domain regularization, based either on the maximum mean discrepancy (MMD) measure or on domain prediction, aims to reduce the distance between the source and the target data. We also exploit the source class labels as another way to regularize the loss, by using a classification regularizer. We show that in these cases the noise marginalization reduces to solving either the linear matrix system \(\mathbf{A}\mathbf{X}=\mathbf{B}\), for which there exists a closed-form solution, or a Sylvester linear matrix equation \(\mathbf{A}\mathbf{X}+\mathbf{X}\mathbf{B}=\mathbf{C}\) that can be solved efficiently using the Bartels-Stewart algorithm. We conduct an extensive study of how these regularization terms improve the baseline performance, present experiments on three image benchmark datasets conventionally used for domain adaptation methods, and report our findings and comparisons with state-of-the-art methods.
Keywords
- Unsupervised domain adaptation
- Marginalized Denoising Autoencoder
- Sylvester equation
- Domain regularization
1 Introduction
Domain adaptation problems arise whenever we need to leverage labeled data in one or more related source domains to learn a classifier for unseen or unlabeled data in a target domain. The domains are assumed to be related, but not identical. The underlying domain shift occurs in multiple real-world applications. Numerous approaches have been proposed in recent years to address textual and visual domain adaptation (we refer the reader to [23, 32, 36] for recent surveys on transfer learning and domain adaptation methods). For text data, the domain shift is frequent in named entity recognition, statistical machine translation, opinion mining, part-of-speech tagging and document ranking [3, 11, 33, 41]. Domain adaptation has equally received a lot of attention in computer vision [1, 13–15, 17, 20–22, 29, 34, 35], where the domain shift is a consequence of changing conditions, such as background, location or pose, or of considering different image types, such as photos, paintings and sketches [4, 9, 25].
In this paper, we build on an approach to domain adaptation based on noise marginalization [5]. In deep learning, a denoising autoencoder (DA) learns a robust feature representation from training examples. In the case of domain adaptation, it takes the unlabeled instances of both source and target data and learns a new feature representation by reconstructing the original features from their noised counterparts. A marginalized denoising autoencoder (MDA) is a technique to marginalize the noise at training time; it avoids the explicit data corruption and does not require an optimization procedure for learning the model parameters but computes the model in a closed form. This makes MDAs scalable and computationally faster than the regular denoising autoencoders. The principle of noise marginalization has been successfully extended to learning with corrupted features [30], link prediction and multi-label learning [6], relational learning [7], collaborative filtering [26] and heterogeneous cross-domain learning [27, 40].
By marginalized domain adaptation we mean denoising the source and target instances in a way that explicitly makes their features domain invariant. To achieve this goal, we extend the MDA with a domain regularization term and explore three versions of this regularization. The first uses the maximum mean discrepancy (MMD) measure [24]. The second is inspired by the adversarial learning of deep neural networks [19]. The third is based on preserving accurate classification of the denoised source instances. In all cases the regularization term belongs to the class of squared loss functions. This guarantees the noise marginalization and the computational efficiency, either as a closed-form solution or as a solution of the Sylvester linear matrix equation \(\mathbf{A}\mathbf{X}+\mathbf{X}\mathbf{B}=\mathbf{C}\).
2 Feature Denoising for Domain Adaptation
Let \(\mathbf{X}^s=[\mathbf{X}_1,\ldots ,\mathbf{X}_{n_S}]\) denote a set of \(n_S\) source domains, with the corresponding labels \(\mathbf{Y}^s=[\mathbf{Y}_1,\ldots ,\mathbf{Y}_{n_S}]\), and let \(\mathbf{X}^t\) denote the unlabeled target domain data. The Marginalized Denoising Autoencoder (MDA) approach [5] reconstructs the input data from partial random corruption [39] with a marginalization that yields the optimal reconstruction weights \(\mathbf{W}\) in closed form. The MDA minimizes the loss
\(\mathcal {L}(\mathbf{W},\mathbf{X})=\frac{1}{2NK}\sum _{k=1}^{K}\Vert \mathbf{X}-\tilde{\mathbf{X}}_k\mathbf{W}\Vert ^2 + \omega \Vert \mathbf{W}\Vert ^2,\)
where \(\tilde{\mathbf{X}}_k \in \mathrm{I\!R}^{N \times d}\) is the k-th corrupted version of \(\mathbf{X}=[\mathbf{X}^s, \mathbf{X}^t]\), obtained by random feature dropout with probability p, \(\mathbf{W}\in \mathrm{I\!R}^{d \times d}\), and \(\omega \Vert \mathbf W \Vert ^2\) is a regularization term. To avoid the explicit feature corruption and an iterative optimization, Chen et al. [5] showed that in the limiting case \(K\rightarrow \infty \) the weak law of large numbers allows \(\mathcal {L}(\mathbf{W},\mathbf{X})\) to be replaced by its expectation. The optimal solution is then \(\mathbf{W}=(\mathbf{Q}+\omega \mathbf{I}_d)^{-1} \mathbf{P}\), where \(\mathbf{P}=\mathbb {E}[\mathbf{X}^\top \tilde{\mathbf{X}}]\) and \(\mathbf{Q}=\mathbb {E}[\tilde{\mathbf{X}}^\top \tilde{\mathbf{X}}]\) depend only on the covariance matrix \(\mathbf{S}=\mathbf{X}^{\top } \mathbf{X}\) of the uncorrupted data and the noise level p:
\(\mathbf{P}=(1-p)\,\mathbf{S}, \qquad \mathbf{Q}_{ij}=\begin{cases}(1-p)^2\,\mathbf{S}_{ij}, & i\ne j,\\ (1-p)\,\mathbf{S}_{ii}, & i=j.\end{cases}\)
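For illustration, this closed-form computation can be sketched in a few lines of NumPy, assuming uniform feature dropout with probability p; the function name and default values below are illustrative, not part of the paper.

```python
import numpy as np

def mda_weights(X, p=0.1, omega=0.01):
    """Minimal sketch of the marginalized denoising autoencoder (MDA) solution
    W = (Q + omega I)^{-1} P, assuming uniform dropout with probability p,
    so that E[x_tilde] = (1 - p) x. X stacks source and target rows."""
    d = X.shape[1]
    S = X.T @ X                           # scatter matrix of the uncorrupted data
    q = 1.0 - p                           # probability that a feature survives
    P = q * S                             # E[X^T X_tilde]
    Q = (q ** 2) * S                      # E[X_tilde^T X_tilde], off-diagonal entries
    np.fill_diagonal(Q, q * np.diag(S))   # diagonal entries keep a single factor q
    return np.linalg.solve(Q + omega * np.eye(d), P)
```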
2.1 Domain Regularization
To better address domain adaptation, we extend the feature denoising with a domain regularization that favors the learning of domain invariant features. We explore three versions of the domain regularization. We combine each of them with the reconstruction loss above and show how to marginalize the noise in each case while keeping \(\mathbf{W}\) the solution of a linear matrix equation. The three versions of the domain regularization are as follows:
Regularization \(\mathcal {R}_{m}\) Based on the Maximum Mean Discrepancy (MMD) with the Linear Kernel. It aims at reducing the gap between the denoised domain means. The MMD has already been used for domain adaptation with feature transformation learning [2, 31] and as a regularizer for cross-domain classifier learning [13, 28, 38]. In contrast to these papers, where the distributions are approximated with the MMD using multiple nonlinear kernels, we use the MMD with the linear kernel (see Note 1), the only one that allows us to keep the solution for \(\mathbf{W}\) in closed form.
The regularization term for K corrupted versions of \(\mathbf{X}\) is given by
\(\mathcal {R}_{m}=\frac{1}{K}\sum _{k=1}^{K} Tr\big (\mathbf{W}^\top \tilde{\mathbf{X}}_k^\top \mathbf{N}\,\tilde{\mathbf{X}}_k\mathbf{W}\big ), \quad \text{where}\quad \mathbf{N}=\begin{bmatrix} \frac{1}{N_s^2}\,\mathbf{1}^{s,s} & -\frac{1}{N_s N_t}\,\mathbf{1}^{s,t} \\ -\frac{1}{N_s N_t}\,\mathbf{1}^{t,s} & \frac{1}{N_t^2}\,\mathbf{1}^{t,t} \end{bmatrix},\)
\(\mathbf{1}^{a,b}\) is a constant matrix of size \(N_a\times N_b\) with all elements equal to 1, and \(N_s, N_t\) are the numbers of source and target examples. After the noise marginalization we obtain \(\mathbb {E}[\mathcal {R}_{m}] =Tr(\mathbf{W}^\top \mathbf{M}\mathbf{W})\), where \(\mathbf{M}=\mathbb {E}[\tilde{\mathbf{X}}^\top \mathbf{N}\tilde{\mathbf{X}}]\) is computed in the same way as \(\mathbf{Q}\) above, using \(\mathbf{S}_m=\mathbf{X}^\top \mathbf{N}\mathbf{X}\) instead of the correlation matrix \(\mathbf{S}\).
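As a sketch of how the linear-kernel MMD term can be assembled in practice, the block matrix \(\mathbf{N}\) and the scatter \(\mathbf{S}_m\) may be formed as below; the function name is ours and the block structure follows the definition given above.

```python
import numpy as np

def mmd_scatter(Xs, Xt):
    """Sketch of the linear-kernel MMD matrix N and of S_m = X^T N X, which
    replaces S when computing M = E[X_tilde^T N X_tilde]."""
    Ns, Nt = Xs.shape[0], Xt.shape[0]
    X = np.vstack([Xs, Xt])
    N = np.zeros((Ns + Nt, Ns + Nt))
    N[:Ns, :Ns] = 1.0 / Ns ** 2           # source-source block  (1/Ns^2) 1^{s,s}
    N[Ns:, Ns:] = 1.0 / Nt ** 2           # target-target block  (1/Nt^2) 1^{t,t}
    N[:Ns, Ns:] = -1.0 / (Ns * Nt)        # cross blocks  -(1/(Ns Nt)) 1^{s,t}
    N[Ns:, :Ns] = -1.0 / (Ns * Nt)
    S_m = X.T @ N @ X                     # enters the marginalization as S does for Q
    return N, S_m
```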
Regularization \(\mathcal {R}_d\) Based on Domain Prediction. It explicitly pushes the denoised source examples toward the target instances. The domain regularizer \(\mathcal {R}_{d}\), proposed in [8], is inspired by [18], where intermediate layers of a deep learning model are regularized using a domain prediction task. The main idea is to learn the denoising while pushing the source towards the target (or vice versa), hence allowing the source classifier to perform better on the target. The regularization term \(\mathcal {R}_d\) can be written as
\(\mathcal {R}_{d}=\frac{1}{K}\sum _{k=1}^{K}\Vert \mathbf{Y}_\mathcal{T}-\tilde{\mathbf{X}}_k\mathbf{W}\mathbf{Z}_\mathcal{D}\Vert ^2,\)
where \(\mathbf{Z}_\mathcal{D}\in \mathrm{I\!R}^d\) is a domain classifier trained on the uncorrupted data to distinguish the target from the source, and \(\mathbf{Y}_\mathcal{T}=\mathbf{1}^N\) is a vector containing only ones, as all denoised instances should look like the target (see Note 2). After the noise marginalization, the partial derivative of the expectation of this term with respect to \(\mathbf{W}\) is
\(\frac{\partial \mathbb {E}[\mathcal {R}_{d}]}{\partial \mathbf{W}}=2\big (\mathbf{Q}\,\mathbf{W}\mathbf{Z}_\mathcal{D}\mathbf{Z}_\mathcal{D}^\top -(1-p)\,\mathbf{X}^\top \mathbf{Y}_\mathcal{T}\mathbf{Z}_\mathcal{D}^\top \big ).\)
Classification Regularization \(\mathcal {R}_l\). It encourages the denoised source data to remain well classified by a classifier pre-trained on the source data. The regularizer \(\mathcal {R}_{l}\) is similar to \(\mathcal {R}_{d}\), except that \(\mathbf{Z}_l\) is trained on the uncorrupted source \(\mathbf{X}^s\) and acts only on the labeled source data. Also, instead of \(\mathbf{Y}_\mathcal{T}\), the ground-truth source labels \(\mathbf{Y}_l=\mathbf{Y}^s\) are used (see Note 3). In the marginalized version of \(\mathcal {R}_l\), the partial derivative with respect to \(\mathbf{W}\) can be written as
\(\frac{\partial \mathbb {E}[\mathcal {R}_{l}]}{\partial \mathbf{W}}=2\big (\mathbf{Q}_l\,\mathbf{W}\mathbf{Z}_l\mathbf{Z}_l^\top -(1-p)\,\mathbf{X}_l^\top \mathbf{Y}_l\mathbf{Z}_l^\top \big ),\)
where \(\mathbf{X}_l=\mathbf{X}^s\) and \(\mathbf{Q}_l\) is computed in the same way as \(\mathbf{Q}\), using \(\mathbf{S}_l=\mathbf{X}_l^{\top } \mathbf{X}_l\).
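Both auxiliary predictors \(\mathbf{Z}_\mathcal{D}\) and \(\mathbf{Z}_l\) can be pre-trained in closed form on the uncorrupted data, for instance with the same ridge formulation used for the source classifier in Sect. 3; the sketch below shows one possible way to do so and is an assumption on our part (the parameter delta is illustrative).

```python
import numpy as np

def ridge_predictor(X, Y, delta=0.01):
    """Closed-form ridge solution Z = (X^T X + delta I)^{-1} X^T Y, a sketch of
    how Z_D (domain labels in {-1, +1}) or Z_l (class labels) could be
    pre-trained on uncorrupted data before marginalizing the noise."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + delta * np.eye(d), X.T @ Y)

# Hypothetical usage:
#   Z_D = ridge_predictor(np.vstack([Xs, Xt]), np.r_[-np.ones(Ns), np.ones(Nt)])
#   Z_l = ridge_predictor(Xs, Ys)   # Ys with entries in {-1, +1} per class column
```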
2.2 Minimizing the Regularized Loss
We extend the noise marginalization framework to the regularized reconstruction loss and minimize the expected loss \(\mathbb {E}[\mathcal {L}+ \gamma _{\phi } \mathcal {R}_{\phi }]\), denoted \({{\mathrm{\mathbb {E}}}}[\mathcal {L}_{\phi }]\), where \(\phi \in \{m, d, l\}\) indicates the version of the regularization term \(\mathcal {R}_{\phi }\). From the marginalized terms presented in the previous sections, it is easy to show that when minimizing these regularized losses, the optimal \(\mathbf{W}\) given by \(\partial {{\mathrm{\mathbb {E}}}}[\mathcal {L}_{\phi }]/\partial \mathbf{W}=\mathbf{0}\) is obtained either by solving the linear matrix system \(\mathbf{A}\mathbf{W}=\mathbf{B}\), for which there exists a closed-form solution, or by solving a Sylvester linear matrix equation \(\mathbf{A}\mathbf{W}+\mathbf{W}\mathbf{B}=\mathbf{C}\), which can be done efficiently with the Bartels-Stewart algorithm. Due to the limited space, we report all the details in the full version and summarize the baseline, the three extensions and the corresponding solutions in Table 1.
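For the Sylvester case, the Bartels-Stewart algorithm is available in SciPy, so solving for \(\mathbf{W}\) amounts to a single library call once \(\mathbf{A}\), \(\mathbf{B}\) and \(\mathbf{C}\) have been assembled; the matrices below are random placeholders, not the exact terms summarized in Table 1.

```python
import numpy as np
from scipy.linalg import solve_sylvester

d = 200                                   # feature dimensionality after PCA (Sect. 3)
rng = np.random.default_rng(0)

# Placeholder matrices standing in for the marginalized terms of Table 1.
A = rng.standard_normal((d, d)); A = A @ A.T + d * np.eye(d)
B = rng.standard_normal((d, d)); B = B @ B.T + np.eye(d)
C = rng.standard_normal((d, d))

W = solve_sylvester(A, B, C)              # solves A W + W B = C (Bartels-Stewart)
assert np.allclose(A @ W + W @ B, C)
```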
As with stacked MDAs, we can stack several layers with only forward learning, where the denoised features of the previous layer serve as the input to the next layer, and nonlinear functions such as the hyperbolic tangent or rectified linear units can be applied between the layers.
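A possible sketch of this forward-only stacking, reusing a per-layer solver such as the mda_weights sketch above (or one of its regularized variants); the function name is illustrative.

```python
import numpy as np

def stack_layers(X, n_layers, solve_layer):
    """Each layer denoises the nonlinearly mapped output of the previous one;
    all layer outputs are concatenated with the original features."""
    feats, H = [X], X
    for _ in range(n_layers):
        W = solve_layer(H)                # e.g. mda_weights or a regularized variant
        H = np.tanh(H @ W)                # hyperbolic tangent between layers
        feats.append(H)
    return np.hstack(feats)
```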
3 Experimental Results
Datasets. We run experiments on the popular OFF31 [34] and OC10 [22] datasets, both with the full training protocol [21], where all source data is used for training, and with the sampling protocol [22, 34]. We evaluate our models with both the provided SURFBOV and the DECAF6 [12] features. In addition, we run experiments with the full training protocol on the Testbed Cross-Dataset [37] (TB), using both the provided SIFTBOV and the DECAF7 features.
Parameter Setting. To compare different models we run all experiments with the same preprocessing and parameter values (see Note 4). Features are L2 normalized and the feature dimensionality is reduced to 200 with PCA (BOV features are in addition power normalized). The parameter values are \(\omega =0.01\), \(\gamma _{\phi }=1\) and \(p=0.1\). Between layers we apply hyperbolic tangent nonlinearities and we concatenate the outputs of all layers with the original features (as in [5]).
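The preprocessing can be sketched as follows; the power normalization is assumed to be the usual signed square-root, and the exact ordering of the steps is our choice.

```python
import numpy as np
from sklearn.decomposition import PCA

def preprocess(X, power_norm=False, dim=200):
    """Sketch of the preprocessing: optional power normalization (for BOV
    features), L2 normalization, and PCA reduction to 200 dimensions."""
    if power_norm:
        X = np.sign(X) * np.sqrt(np.abs(X))
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    return PCA(n_components=dim).fit_transform(X)
```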
We evaluate how the optimal denoising matrix \(\mathbf{W}\) influences three different classification methods: a regularized multi-class ridge classifier trained on the source (\(\mathbf{Z}= (\mathbf{X}_l^\top \mathbf{X}_l + \delta \mathbf{I}_d)^{-1} \mathbf{X}_l^\top \mathbf{Y}_l\)), the nearest neighbor classifier (NN) and the Domain Specific Class Means (DSCM) classifier [10], where a target test example is assigned to a class based on a soft-max distance to the domain-specific class means. The last two classifiers are selected for their non-linearity. Moreover, NN is related to retrieval and DSCM to clustering, so the impact of \(\mathbf{W}\) on these two extra tasks is assessed indirectly.
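For completeness, a rough sketch of the DSCM decision rule: a test example is scored against the domain-specific class means with soft-max-style weights. The bandwidth sigma and the exact weighting below are our assumptions, not the formulation of [10].

```python
import numpy as np

def dscm_predict(x, class_means, sigma=1.0):
    """class_means[c] is a list of domain-specific mean vectors for class c;
    the example x is assigned to the class with the largest accumulated
    soft-max weight of squared distances to its means."""
    scores = {}
    for c, means in class_means.items():
        d2 = np.array([np.sum((x - m) ** 2) for m in means])
        scores[c] = np.sum(np.exp(-d2 / (2.0 * sigma ** 2)))
    return max(scores, key=scores.get)
```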
Table 2 shows the domain adaptation results with a single source and Table 3 shows the multi-source results, both under the full training protocol. For each dataset, we consider all possible source-target pairs as domain adaptation tasks. Hence we average over 9 tasks on OFF31 (3 domains: A, D, W), and over 12 tasks each on OC10 (4 domains: A, C, D, W) and TB (4 domains: B, C, I, S).
Table 2 shows the results on L2 normalized DECAF features. It compares the domain regularization extensions to the baselines (BL) obtained with the L2 normalized features (full) and with the PCA reduced features, as well as with the MDA. As the table shows, the best results are often obtained with MRl, except on OC10, where MRd performs better. On the other hand, the \(\mathcal {R}_m\) regularizer (MRm) does not improve the M1 performance. Stacking several layers can further improve the results. When comparing these results to the literature, we can see that on OC10 we perform comparably to DAM [14] (84 %) and DDC [38] (84.6 %) but worse than more complex methods such as JDA [29] (87.5 %), TTM [16] (87.5 %) or DAN [28] (87.3 %). On OFF31, the deep adaptation method DAN [28] (72.9 %) significantly outperforms our results. On the TB dataset, in order to compare our results on DECAF6 to CORAL+SVM [35] (40.2 %), we average over the six source-target pairs that exclude domain B and obtain 43.6 % with MRd+DSCM and 43.1 % with MRl+DSCM. We also outperform (see Note 5) CORAL+SVM [35] (64 %) with our MRd+Ridge (65.2 %) when using the sampling protocol on OFF31.
Concerning the BOV features, the best results (using 3 layers) with the full training protocol are obtained with MRl+NN on OFF31 (29.7 %) and with MRd+Ridge on OC10 (48.2 %). The latter is comparable to CORAL+SVM [35] (48.8 %), but below LSSA [1] (52.3 %), which first selects landmarks before learning the transformation. The landmark selection is complementary to our approach and could boost our results as well.
In Table 3, we report the averaged results for the multi-source cases, obtained with BOV features under the full training protocol. For each dataset, all configurations with at least 2 source domains are considered, which yields 6 such configurations for OFF31 and 16 each for OC10 and TB. The results clearly indicate that taking the domain regularization into account improves the performance.
4 Conclusion
In this paper we extended the marginalized denoising autoencoder (MDA) framework with a domain regularization to enforce domain invariance. We studied three versions of the regularization, based on the maximum mean discrepancy measure, on domain prediction, and on the class predictions on the source. We showed that in all these cases the noise marginalization reduces to a closed-form solution or to a Sylvester linear matrix equation, for which efficient and scalable solutions exist. This furthermore allows several layers to be stacked at low cost. We studied the effect of these domain regularizations and ran single-source and multi-source experiments on three benchmark datasets, showing that adding the new regularization terms allows us to outperform the baselines. Compared to the state of the art, our method performs better than classical feature transformation methods but is outperformed by more complex deep domain adaptation methods. Compared to the latter, the main advantage of the proposed approach, beyond its low computational cost, is that since we learn an unsupervised feature transformation, we can also boost the performance of other tasks, such as retrieval or clustering, in the target space.
Notes
- 1.
Minimizing the distance between the corresponding domain centroids.
- 2.
In the multi-source case, \(\mathbf{Z}_\mathcal{D}\in \mathrm{I\!R}^{d \times ({n_S+1})}\), with the columns corresponding to the \(n_S\) source and 1 target domain classifiers, and \(\mathbf{Y}_\mathcal{T}\in \mathrm{I\!R}^{N \times ({n_S+1})}\), with \(y_{ns}=1\) if \(s=n_S+1\) and \(-1\) otherwise, where N is the total number of instances (source and target).
- 3.
\(\mathbf{Y}_l\in \mathrm{I\!R}^{N_s \times C}\), where \(y_{nc}=1\) if \(\mathbf{x}_n\) belongs to the class c and -1 otherwise. In the multi source case, we concatenate \(n_S\) multi-class \(\mathbf{Z}^a_l\) linear classifiers and the corresponding \(\mathbf{Y}^a_l\) label matrices, where \(\mathbf{Z}^a_l\) was trained on the source \(\mathcal{D}^{s_a}\).
- 4.
Cross-validation on the source was helpful only for some of the configurations; for others it decreased the performance.
- 5.
Their best results (68.5 % and 69.4 %) obtained with fine-tuned features are not directly comparable as our results can also be boosted when using these fine-tuned features.
References
Aljundi, R., Emonet, R., Muselet, D., Sebban, M.: Landmarks-based kernelized subspace alignment for unsupervised domain adaptation. In: Proceedings of CVPR, pp. 56–63. IEEE (2015)
Baktashmotlagh, M., Harandi, M., Lovell, B., Salzmann, M.: Unsupervised domain adaptation by domain invariant projection. In: Proceedings of ICCV, pp. 769–776. IEEE (2013)
Blitzer, J., Kakade, S., Foster, D.P.: Domain adaptation with coupled subspaces. In: Proceedings of AISTATS, pp. 173–181 (2011)
Castrejón, L., Aytar, Y., Vondrick, C., Pirsiavash, H., Torralba, A.: Learning aligned cross-modal representations from weakly aligned data. In: Proceedings of CVPR, IEEE (2016)
Chen, M., Xu, Z., Weinberger, K.Q., Sha, F.: Marginalized denoising autoencoders for domain adaptation. In: Proceedings of ICML, pp. 767–774 (2012)
Chen, Z., Chen, M., Weinberger, K.Q., Zhang, W.: Marginalized denoising for link prediction and multi-label learning. In: Proceedings of AAAI (2015)
Chen, Z., Zhang, W.: A marginalized denoising method for link prediction in relational data. In: Proceedings of ICDM (2014)
Clinchant, S., Csurka, G., Chidlovskii, B.: A domain adaptation regularization for denoising autoencoders. In: Proceedings of ACL (2016)
Crowley, E.J., Zisserman, A.: In search of art. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8927, pp. 54–70. Springer, Heidelberg (2015). doi:10.1007/978-3-319-16178-5_4
Csurka, G., Chidlovskii, B., Perronnin, F.: Domain adaptation with a domain specific class means classifier. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8927, pp. 32–46. Springer, Heidelberg (2015). doi:10.1007/978-3-319-16199-0_3
Daume, H., Marcu, D.: Domain adaptation for statistical classifiers. J. Artif. Intell. Res. 26(1), 101–126 (2006)
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: Decaf: a deep convolutional activation feature for generic visual recognition. CoRR (2013). arXiv:1310.1531
Duan, L., Tsang, I.W., Xu, D.: Domain transfer multiple kernel learning. Trans. Pattern Recogn. Mach. Anal. (PAMI) 34(3), 465–479 (2012)
Duan, L., Tsang, I.W., Xu, D., Chua, T.S.: Domain adaptation from multiple sources via auxiliary classifiers. In: Proceedings of ICML, pp. 289–296 (2009)
Farajidavar, N., deCampos, T., Kittler, J.: Adaptive transductive transfer machines. In: Proceedings of BMVC (2014)
Farajidavar, N., Campos, T., Kittler, J.: Transductive transfer machine. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) ACCV 2014. LNCS, vol. 9005, pp. 623–639. Springer, Heidelberg (2015). doi:10.1007/978-3-319-16811-1_41
Fernando, B., Habrard, A., Sebban, M., Tuytelaars, T.: Unsupervised visual domain adaptation using subspace alignment. In: Proceedings of ICCV, pp. 2960–2967. IEEE (2013)
Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation, CoRR (2014). arXiv:1409.7495
Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: Proceedings of ICML, pp. 1180–1189 (2015)
Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment classification: a deep learning approach. In: Proceedings of ICML, pp. 513–520 (2011)
Gong, B., Grauman, K., Sha, F.: Connecting the dots with landmarks: Discriminatively learning domain invariant features for unsupervised domain adaptation. In: Proceedings of ICML, pp. 222–230 (2013)
Gong, B., Shi, Y., Sha, F., Grauman, K.: Geodesic flow kernel for unsupervised domain adaptation. In: Proceedings of CVPR, pp. 2066–2073. IEEE (2012)
Gopalan, R., Li, R., Patel, V.M., Chellappa, R.: Domain adaptation for visual recognition. Found. Trends Comput. Graph. Vis. 8(4), 285–378 (2015)
Huang, J., Smola, A., Gretton, A., Borgwardt, K., Schölkopf, B.: Correcting sample selection bias by unlabeled data. In: Proceedings of NIPS, (Curran Associates) (2007)
Klare, B.F., Bucak, S.S., Jain, A.K., Akgul, T.: Towards automated caricature recognition. In: Proceedings of ICB (2012)
Li, S., Kawale, J., Fu, Y.: Deep collaborative filtering via marginalized denoising auto-encode. In: Proceedings of CIKM, pp. 811–820. ACM (2015)
Li, Y., Yang, M., Xu, Z., Zhang, Z.: Learning with marginalized corrupted features and labels together. In: Proceedings of AAAI (2016). arXiv:1602.07332
Long, M., Cao, Y., Wang, J., Jordan, M.I.: Learning transferable features with deep adaptation networks. In: Proceedings of ICML (2015)
Long, M., Wang, J., Ding, G., Sun, J., Yu, P.S.: Transfer feature learning with joint distribution adaptation. In: Proceedings of ICCV, pp. 2200–2207. IEEE (2013)
Maaten, L.V.D., Chen, M., Tyree, S., Weinberger, K.: Learning with marginalized corrupted features. In: Proceedings of ICML (2013)
Pan, S.J., Tsang, I.W., Kwok, J.T., Yang, Q.: Domain adaptation via transfer component analysis. Trans. Neural Netw. 22(2), 199–210 (2011)
Pan, S.J., Yang, Q.: A survey on transfer learning. Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)
Pan, S.J., Ni, X., Sun, J.T., Yang, Q., Chen, Z.: Cross-domain sentiment classification via spectral feature alignment. In: Proceedings of WWW (2010)
Saenko, K., Kulis, B., Fritz, M., Darrell, T.: Adapting visual category models to new domains. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6316, pp. 213–226. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15561-1_16
Sun, B., Feng, J., Saenko, K.: Return of frustratingly easy domain adaptation. In: Proceedings of AAAI (2016)
Sun, S.S., Shi, H., Wu, Y.: A survey of multi-source domain adaptation. Inf. Fusion 24, 84–92 (2015)
Tommasi, T., Tuytelaars, T.: A testbed for cross-dataset analysis. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8927, pp. 18–31. Springer, Heidelberg (2015). doi:10.1007/978-3-319-16199-0_2
Tzeng, E., Hoffman, J., Zhang, N., Saenko, K., Darrell, T.: Deep domain confusion: Maximizing for domain invariance. CoRR (2014). arXiv:1412.3474
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of ICML (2008)
Zhou, J.T., Pan, S.J., Tsang, I.W., Yan, Y.: Hybrid heterogeneous transfer learning through deep learning. In: Proceedings of AAAI (2014)
Zhou, M., Chang, K.C.: Unifying learning to rank and domain adaptation: enabling cross-task document scoring. In: Proceedings of SIGKDD (ACM), pp. 781–790 (2014)