
NaCL: noise-robust cross-domain contrastive learning for unsupervised domain adaptation


Abstract

Unsupervised Domain Adaptation (UDA) methods aim to enhance feature transferability, possibly at the expense of feature discriminability. Recently, contrastive representation learning has been applied to UDA as a promising approach. One line of work combines mainstream domain adaptation methods with contrastive self-supervised tasks. The other uses contrastive learning to align class-conditional distributions according to the semantic structure of the source and target domains. Nevertheless, both lines have limitations. First, the optimal solutions of contrastive self-supervised learning and of domain discrepancy minimization may not be consistent. Second, contrastive learning aligns class-conditional distributions using pseudo labels on the target domain, and the noise in these pseudo labels produces false positive and negative pairs that deteriorate the performance of contrastive learning. To address these issues, we propose Noise-robust cross-domain Contrastive Learning (NaCL), which realizes the domain adaptation task directly by simultaneously learning instance-wise discrimination and encoding intra- and inter-domain semantic structures into the learned representation space. More specifically, we adopt topology-based selection on the target domain to detect and remove false positive and negative pairs from the contrastive loss. Theoretically, we show that NaCL can be interpreted as an instance of Expectation Maximization (EM), and that accurate pseudo label information is beneficial for reducing the expected error on the target domain. NaCL obtains superior results on three public benchmarks. Further, NaCL can also be applied to semi-supervised domain adaptation with only minor modifications, achieving advanced diagnostic performance on a COVID-19 dataset. Code is available at https://github.com/jingzhengli/NaCL


Data availability

The datasets are the benchmark datasets available online (Data Source available in manuscript).

Code availability

https://github.com/jingzhengli/NaCL

References

  • Bahri, D., Jiang, H., & Gupta, M. (2020). Deep k-nn for noisy labels. In Proceedings of ICML.

  • Bekkouch, I. E. I., Youssry, Y., & Gafarov, R., et al. (2019). Triplet loss network for unsupervised domain adaptation. Algorithms.

  • Ben-David, S., Blitzer, J., & Crammer, K., et al. (2007). Analysis of representations for domain adaptation. In Proceedings of NeurIPS.

  • Ben-David, S., Blitzer, J., & Crammer, K., et al. (2010). A theory of learning from different domains. Machine Learning.

  • Berthelot, D., Roelofs, R., & Sohn, K., et al. (2021). Adamatch: A unified approach to semi-supervised learning and domain adaptation. In Proceedings of ICLR.

  • Bousmalis, K., Trigeorgis, G., & Silberman, N., et al. (2016). Domain separation networks. In Proceedings of NeurIPS.

  • Cao, Y., Xie, Z., & Liu, B., et al. (2020). Parametric instance classification for unsupervised visual feature learning. In Proceedings of NeurIPS.

  • Caron, M., Touvron, H., & Misra, I., et al. (2021). Emerging properties in self-supervised vision transformers. In Proceedings of ICCV.

  • Chen, C., Chen, Z., & Jiang, B., et al. (2019a). Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation. In Proceedings of AAAI.

  • Chen, C., Fu, Z., & Chen, Z., et al (2020a). Homm: Higher-order moment matching for unsupervised domain adaptation. In Proceedings of AAAI.

  • Chen, C., Xie, W., & Huang, W., et al. (2019b). Progressive feature alignment for unsupervised domain adaptation. In Proceedings of CVPR.

  • Chen, T., Kornblith, S., & Norouzi, M., et al. (2020b). A simple framework for contrastive learning of visual representations. In Proceedings of ICML.

  • Chen, X., Fan, H., & Girshick, R., et al. (2020c). Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297.

  • Chen, X., & He, K. (2021). Exploring simple siamese representation learning. In Proceedings of CVPR.

  • Chen, X., Wang, S., & Long, M., et al. (2019c). Transferability vs. discriminability: Batch spectral penalization for adversarial domain adaptation. In Proceedings of ICML.

  • Chen, Y., Pan, Y., & Wang, Y., et al. (2021). Transferrable contrastive learning for visual domain adaptation. In Proceedings of ACM MM.

  • Cicek, S., & Soatto, S. (2019). Unsupervised domain adaptation via regularized conditional alignment. In Proceedings of ICCV.

  • Deng, Z., Luo, Y., & Zhu, J. (2019). Cluster alignment with a teacher for unsupervised domain adaptation. In Proceedings of ICCV.

  • El Hamri, M., Bennani, Y., & Falih, I. (2022). Hierarchical optimal transport for unsupervised domain adaptation. Machine Learning 1–24.

  • French, G., Mackiewicz M., & Fisher, M. (2018). Self-ensembling for visual domain adaptation. In Proceedings of ICLR.

  • Ganin, Y., & Lempitsky, V. (2015). Unsupervised domain adaptation by backpropagation. In Proceedings of ICML.

  • Ganin, Y., Ustinova, E., & Ajakan, H., et al. (2016). Domain-adversarial training of neural networks. The Journal of Machine Learning Research.

  • Gao, W., Yang, B. B., & Zhou, Z. H. (2016). On the resistance of nearest neighbor to random noisy labels. arXiv preprint arXiv:1607.07526.

  • Ghasedi Dizaji, K., Herandi, A., & Deng, C., et al. (2017). Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization. In Proceedings of ICCV.

  • Ghifary, M., Kleijn, W. B., & Zhang, M., et al. (2016). Deep reconstruction-classification networks for unsupervised domain adaptation. In Proceedings of ECCV.

  • Grill, J. B., Strub, F., & Altché, F., et al. (2020). Bootstrap your own latent-a new approach to self-supervised learning. In Proceedings of NeurIPS.

  • Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In Proceedings of CVPR.

  • He, K., Fan, H., & Wu, Y., et al. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of CVPR.

  • He, K., Zhang, X., & Ren, S., et al. (2016a). Deep residual learning for image recognition. In Proceedings of CVPR.

  • He, K., Zhang, X., & Ren, S., et al. (2016b). Identity mappings in deep residual networks. In Proceedings of ECCV.

  • Henaff, O. (2020). Data-efficient image recognition with contrastive predictive coding. In Proceedings of ICML.

  • Hendrycks, D., Basart, S., & Mu, N., et al. (2021). The many faces of robustness: A critical analysis of out-of-distribution generalization. In Proceedings of ICCV.

  • Huang, J., Guan, D., & Xiao, A., et al. (2022). Category contrast for unsupervised domain adaptation in visual tasks. In Proceedings of CVPR.

  • Jiang, J., Fu, B., & Long, M. (2020). Transfer-learning-library.

  • Jin, Y., Wang, X., & Long, M., et al. (2020). Minimum class confusion for versatile domain adaptation. In Proceedings of ECCV.

  • Kang, G., Jiang, L., & Yang, Y., et al. (2019). Contrastive adaptation network for unsupervised domain adaptation. In Proceedings of CVPR.

  • Khosla, P., Teterwak, P., & Wang, C., et al. (2020). Supervised contrastive learning. In Proceedings of NeurIPS.

  • Kim, D., Saito, K., & Oh, T.H., et al. (2020). Cross-domain self-supervised learning for domain adaptation with few source labels. arXiv preprint arXiv:2003.08264.

  • Li, J., Zhou, P., & Xiong, C., et al. (2020). Prototypical contrastive learning of unsupervised representations. In Proceedings of ICLR.

  • Li, S., Xia, X., & Ge, S., et al. (2022). Selective-supervised contrastive learning with noisy labels. In Proceedings of CVPR (pp. 316–325).

  • Lin, H., Zhang, Y., & Qiu, Z., et al. (2022). Prototype-guided continual adaptation for class-incremental unsupervised domain adaptation. In Proceedings of ECCV.

  • Long, M., Cao, Z., & Wang, J., et al. (2018). Conditional adversarial domain adaptation. In Proceedings of NeurIPS.

  • Long, M., Zhu, H., & Wang, J., et al. (2017). Deep transfer learning with joint adaptation networks. In Proceedings of ICML.

  • Mansour, Y., Mohri, M., & Rostamizadeh, A. (2009). Domain adaptation: Learning bounds and algorithms. arXiv preprint arXiv:0902.3430.

  • Park, C., Lee, J., & Yoo, J., et al. (2020). Joint contrastive learning for unsupervised domain adaptation. arXiv preprint arXiv:2006.10297.

  • Pei, Z., Cao, Z., & Long, M., et al. (2018). Multi-adversarial domain adaptation. In Proceedings of AAAI.

  • Peng, X., Usman, B., & Kaushik, N., et al. (2017). Visda: The visual domain adaptation challenge. arXiv preprint arXiv:1710.06924.

  • Saenko, K., Kulis, B., & Fritz, M., et al. (2010). Adapting visual category models to new domains. In Proceedings of ECCV.

  • Saito, K., Watanabe, K., & Ushiku, Y., et al. (2018). Maximum classifier discrepancy for unsupervised domain adaptation. In Proceedings of CVPR.

  • Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In Proceedings of CVPR.

  • Sharma, A., Kalluri, T., & Chandraker, M. (2021). Instance level affinity-based transfer for unsupervised domain adaptation. In Proceedings of CVPR.

  • Shen, J., Qu, Y., & Zhang, W., et al. (2018). Wasserstein distance guided representation learning for domain adaptation. In Proceedings of AAAI.

  • Sohn, K. (2016). Improved deep metric learning with multi-class n-pair loss objective. In Proceedings of NeurIPS.

  • Sohn, K., & Berthelot, D. (2020). Fixmatch: Simplifying semi-supervised learning with consistency and confidence. In Proceedings of NeurIPS.

  • Sun, T., Lu, C., & Zhang, T., et al. (2022). Safe self-refinement for transformer-based domain adaptation. In Proceedings of CVPR (pp. 7191–7200).

  • Sun, Y., Tzeng, E., & Darrell, T., et al. (2019). Unsupervised domain adaptation through self-supervision. arXiv preprint arXiv:1909.11825.

  • Tang, H., Chen, K., & Jia, K. (2020). Unsupervised domain adaptation via structurally regularized deep clustering. In Proceedings of CVPR.

  • Thota, M., & Leontidis, G. (2021). Contrastive domain adaptation. In Proceedings of CVPR.

  • Tian, Y., Krishnan, D., & Isola, P. (2020). Contrastive multiview coding. In Proceedings of ECCV.

  • Tzeng, E., Hoffman, J., & Darrell, T., et al. (2015). Simultaneous deep transfer across domains and tasks. In Proceedings of ICCV.

  • Tzeng, E., Hoffman, J., & Saenko, K., et al. (2017). Adversarial discriminative domain adaptation. In Proceedings of CVPR.

  • Venkateswara, H., Eusebio, J., & Chakraborty, S., et al. (2017). Deep hashing network for unsupervised domain adaptation. In Proceedings of CVPR.

  • Wang, Q., & Breckon, T. (2020). Unsupervised domain adaptation via structured prediction based selective pseudo-labeling. In Proceedings of AAAI.

  • Wang, R., Wang, G., & Henao, R. (2019). Discriminative clustering for robust unsupervised domain adaptation. arXiv preprint arXiv:1905.13331.

  • Wang, R., Wu, Z., & Weng, Z., et al. (2022). Cross-domain contrastive learning for unsupervised domain adaptation. IEEE Transactions on Multimedia.

  • Wiles, O., Gowal, S., & Stimberg, F., et al. (2021). A fine-grained analysis on distribution shift. In Proceedings of ICLR.

  • Wu, P., Zheng, S., & Goswami, M., et al. (2020). A topological filter for learning with label noise. In Proceedings of NeurIPS.

  • Wu, Z., Efros, A. A., & Yu, S. X. (2018a). Improving generalization via scalable neighborhood component analysis. In Proceedings of ECCV.

  • Wu, Z., Xiong, Y., & Yu, S. X., et al. (2018b). Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of CVPR.

  • Wu, Z. F., Wei, T., & Jiang, J., et al. (2021). Ngc: A unified framework for learning with open-world noisy data. In Proceedings of ICCV.

  • Xiao, N., & Zhang, L. (2021). Dynamic weighted learning for unsupervised domain adaptation. In Proceedings of CVPR.

  • Xie, S., Zheng, Z., & Chen, L., et al. (2018). Learning semantic representations for unsupervised domain adaptation. In Proceedings of ICML.

  • Xu, M., Zhang, J., & Ni, B., et al. (2020). Adversarial domain adaptation with domain mixup. In Proceedings of AAAI.

  • Xu, R., Li, G., & Yang, J., et al. (2019). Larger norm more transferable: An adaptive feature norm approach for unsupervised domain adaptation. In Proceedings of ICCV.

  • Xu, T., Chen, W., & Pichao, W., et al. (2022). Cdtrans: Cross-domain transformer for unsupervised domain adaptation. In Proceedings of ICLR.

  • Yao, H., Wang, Y., & Li, S., et al. (2022). Improving out-of-distribution robustness via selective augmentation. arXiv preprint arXiv:2201.00299.

  • Zhang, Y., Chen, H., & Wei, Y., et al. (2019a). From whole slide imaging to microscopy: Deep microscopy adaptation network for histopathology cancer image classification. In International conference on medical image computing and computer-assisted intervention.

  • Zhang, Y., Liu, T., & Long, M., et al. (2019b). Bridging theory and algorithm for domain adaptation. In Proceedings of ICML.

  • Zhang, Y., Niu, S., & Qiu, Z., et al. (2020). Covid-da: deep domain adaptation from typical pneumonia to covid-19. arXiv preprint arXiv:2005.01577.


Funding

This work was supported in part by the National Natural Science Foundation under Grant Nos. 61972013, 61932007, and 62141209, and in part by the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information


Contributions

JL designed and implemented the research, and wrote and revised the manuscript. HS contributed to the revision of the manuscript and the analysis of the results.

Corresponding author

Correspondence to Hailong Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Editors: Fabio Vitale, Tania Cerquitelli, Marcello Restelli, Charalampos Tsourakakis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Proposition 1

Proof

Maximum likelihood estimation was originally employed to model clustering tasks. For unsupervised domain adaptation, the objective of noise-robust cross-domain contrastive learning can be seen as adapting the model parameters \(\theta\), trained on source data, to maximize the log-likelihood of the target domain data

$$\begin{aligned} \theta ^*=\underset{\theta }{\arg \max }\ \sum _{i=1}^{n_t} \log p\left( \varvec{x}_i \mid \theta \right) \end{aligned}$$
(9)

We assume that the observed target domain data \(\left\{ \varvec{x}_{i}\right\} _{i=1}^{n_t}\) are related to a latent variable \(\mathcal {C}=\left\{ y_{c}\right\} _{c=1}^{ \mid \mathcal {C} \mid }\), which denotes the true labels of the data within a label space of \(\mid \mathcal {C} \mid\) categories. We can re-write the log-likelihood function as

$$\begin{aligned} \theta ^{*}=\underset{\theta }{\arg \max }\ \sum _{i=1}^{n_t} \log \sum _{y_{c} \in \mathcal {C}} p\left( \varvec{x}_{i}, y_{c} \mid \theta \right) . \end{aligned}$$
(10)

It is difficult to optimize Eq.(10) directly, so we use Jensen's inequality to construct a surrogate function that lower-bounds the log-likelihood

$$\begin{aligned} \begin{aligned}&\sum _{i=1}^{n_t} \log \sum _{y_{c} \in \mathcal {C}} p\left( \varvec{x}_{i}, y_{c} \mid \theta \right) =\sum _{i=1}^{n_t} \log \sum _{y_{c} \in \mathcal {C}} \mathcal {Q}\left( y_{c}\right) \frac{p\left( \varvec{x}_i, y_{c} \mid \theta \right) }{\mathcal {Q}\left( y_{c}\right) } \\&\ge \sum _{i=1}^{n_t} \sum _{y_{c} \in \mathcal {C}} \mathcal {Q}\left( y_{c}\right) \log \frac{p\left( \varvec{x}_i, y_{c} \mid \theta \right) }{\mathcal {Q}\left( y_{c}\right) } \end{aligned} \end{aligned}$$
(11)

where \(\mathcal {Q}\left( y_{c}\right)\) denotes a probability distribution over \(y_{c}\). The inequality holds with equality when \(\frac{p\left( \varvec{x}_i, y_{c} \mid \theta \right) }{\mathcal {Q}\left( y_{c}\right) }\) is a constant independent of \(y_{c}\), from which we obtain

$$\begin{aligned} \begin{aligned} \mathcal {Q}\left( y_{c}\right) =\frac{p\left( \varvec{x}_{i}, y_{c} \mid \theta \right) }{\sum _{y_{c} \in \mathcal {C}} p\left( \varvec{x}_{i}, y_{c} \mid \theta \right) }=\frac{p\left( \varvec{x}_{i}, y_{c} \mid \theta \right) }{p\left( \varvec{x}_{i} \mid \theta \right) }=p\left( y_{c} \mid \varvec{x}_{i}, \theta \right) \end{aligned} \end{aligned}$$
(12)

Combining Eq.(11) and Eq.(12), and then ignoring the constant term, we should maximize the following objective, i.e., the expectation of the complete-data log-likelihood,

$$\begin{aligned} \sum _{i=1}^{n_t} \sum _{y_{c} \in \mathcal {C}} \mathcal {Q}\left( y_{c}\right) \log p\left( \varvec{x}_{i}, y_{c} \mid \theta \right) \end{aligned}$$
(13)

E-step. In this step, we use the current parameters \(\theta ^{\text {old}}\) to estimate \(p\left( y_{c} \mid \varvec{x}_{i}, \theta \right)\), i.e., the pseudo labels. To this end, we perform spherical k-means clustering on the features produced by the encoder parameterized by \(\theta ^{\text {old}}\) to obtain \(\mid \mathcal {C} \mid\) cluster assignments. As described in Section 3.4 of the manuscript, we compute \(p\left( y_{c} \mid \varvec{x}_{i}, \theta ^{\text {old}}\right) =\mathbb {I}[\varvec{x}_{i} \in \varvec{o}_{c}]\), where \(\mathbb {I}[\varvec{x}_{i} \in \varvec{o}_{c}]=1\) if \(\varvec{x}_{i}\) belongs to the cluster with center \(\varvec{o}_{c}\), and \(\mathbb {I}[\varvec{x}_{i} \in \varvec{o}_{c}]=0\) otherwise.
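To make the E-step concrete, here is a minimal NumPy sketch of spherical k-means with hard assignments; the function and variable names (`spherical_kmeans`, `features`) are our own illustrative choices, not the authors' released implementation:

```python
import numpy as np

def spherical_kmeans(features, n_clusters, n_iters=20, seed=0):
    """Cluster L2-normalized features by cosine similarity (spherical k-means)."""
    rng = np.random.default_rng(seed)
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    centers = x[rng.choice(len(x), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        assign = (x @ centers.T).argmax(axis=1)   # hard cluster assignment
        for c in range(n_clusters):
            members = x[assign == c]
            if len(members) > 0:
                center = members.mean(axis=0)
                centers[c] = center / np.linalg.norm(center)
    return assign, centers

# The posterior p(y_c | x_i, theta_old) is then the indicator I[x_i in o_c]:
features = np.random.randn(512, 128).astype(np.float32)   # toy target features
assign, centers = spherical_kmeans(features, n_clusters=31)
posterior = np.eye(31)[assign]                             # one-hot pseudo labels
```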

M-step. Based on the E-step, we maximize Eq.(13) as follows.

$$\begin{aligned} \begin{aligned} \theta ^{\text {new}}&= \underset{\theta }{\arg \max }\sum _{i=1}^{n_t} \sum _{y_{c} \in \mathcal {C}} p\left( y_{c} \mid \varvec{x}_{i}, \theta ^{\text {old}}\right) \log p\left( \varvec{x}_{i}, y_{c} \mid \theta \right) \\&= \underset{\theta }{\arg \max }\sum _{i=1}^{n_t} \sum _{y_{c} \in \mathcal {C}} \mathbb {I}[\varvec{x}_{i} \in \varvec{o}_{c}] \log p\left( \varvec{x}_{i}, y_{c} \mid \theta \right) . \end{aligned} \end{aligned}$$
(14)

Under the assumption that the class prior obeys a uniform distribution, we have

$$\begin{aligned} p\left( \varvec{x}_{i}, y_{c} \mid \theta \right) = p\left( \varvec{x}_{i} \mid y_{c} ,\theta \right) p\left( y_{c} \mid \theta \right) = \frac{1}{\mid \mathcal {C}\mid } p\left( \varvec{x}_{i} \mid y_{c} ,\theta \right) \end{aligned}$$
(15)

Likewise, we assume that the distribution around each class center is an isotropic Gaussian, which leads to

$$\begin{aligned} p\left( \varvec{x}_{i} \mid y_{c}, \theta \right) =\exp \left( \frac{-\left( \varvec{q}_{i}-\varvec{o}_{c}\right) ^{2}}{2 \sigma _{c}^{2}}\right) / \sum _{j=1}^{\mid \mathcal {C}\mid } \exp \left( \frac{-\left( \varvec{q}_{i}-\varvec{o}_{j}\right) ^{2}}{2 \sigma _{j}^{2}}\right) \end{aligned}$$
(16)

where the query \(\varvec{q}_{i}\) is the output of the projection head for instance \(\varvec{x}_{i}\), and the class center \(\varvec{o}_{c}\) can be regarded as the cluster center of instance \(\varvec{x}_{i}\). We apply \(\ell _{2}\)-normalization to the vectors \(\varvec{q}\) and \(\varvec{o}\), so that \((\varvec{q}-\varvec{o})^{2}=2-2 \varvec{q} \cdot \varvec{o}\). Combining this with Eqs.(10), (11), and (13)-(16), we can re-write the maximum log-likelihood estimation as

$$\begin{aligned} \theta ^{\text {new}}=\underset{\theta }{\arg \min } \sum _{i=1}^{n_t}-\log \frac{\exp \left( \varvec{q}_{i} \cdot \varvec{o}_{c} / \tau \right) }{\sum _{j=1}^{\mid \mathcal {C}\mid } \exp \left( \varvec{q}_{i} \cdot \varvec{o}_{j} / \tau \right) } \end{aligned}$$
(17)

where \(\tau \propto \sigma ^{2}\) reflects the density of the feature distribution around the class center. The goal of Eq.(17) is thus to pull the query \(\varvec{q}_{i}\) closer to its class center while pushing it away from the other class centers.
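As a worked illustration of Eq.(17), the following PyTorch sketch computes the prototype-pulling objective as a cross-entropy between query-to-center similarities and the E-step assignments; `prototype_loss` and its argument names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def prototype_loss(q, centers, assign, tau=0.07):
    """q: (N, d) L2-normalized queries; centers: (C, d) L2-normalized class
    centers; assign: (N,) cluster index of each query from the E-step."""
    logits = q @ centers.t() / tau          # cosine similarities / temperature
    return F.cross_entropy(logits, assign)  # -log softmax at the assigned center

q = F.normalize(torch.randn(8, 128), dim=1)
centers = F.normalize(torch.randn(31, 128), dim=1)
assign = torch.randint(0, 31, (8,))
loss = prototype_loss(q, centers, assign)   # minimizing pulls q_i toward o_c
```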

Next, we show that the contrastive loss in our method can be empirically interpreted as optimizing Eq.(17). Specifically, given a query \(\varvec{q}_i\), the contrastive loss in our method can be written as

$$\begin{aligned} \mathcal {L}_{\text{ NaCL }}=\min - \frac{1}{\mid P(i) \mid } \sum _{\varvec{k}_i^+ \in P(i)} \log \frac{\exp \left( \varvec{q}_i \cdot \varvec{k}_i^+/ \tau \right) }{\sum _{\varvec{k}_j \in A} \exp \left( \varvec{q}_i \cdot \varvec{k}_j / \tau \right) } \end{aligned}$$
(18)

According to the previous assumptions, the positive keys in the set P(i) should be distributed around the class center \(\varvec{o}_{c}\) of the query \(\varvec{q}_i\). Thus, the mean of the positive keys of query \(\varvec{q}_i\) approximates its class center,

$$\begin{aligned} \frac{1}{ \mid P(i) \mid }\left( \varvec{k}_{i_1}^{+}+\varvec{k}_{i_2}^{+}+\cdots +\varvec{k}_{i_{\mid P(i)\mid }}^{+}\right) \approx \varvec{o}_{c} \end{aligned}$$
(19)

Plugging Eq.(19) into Eq.(18), we can re-write Eq.(18) as

$$\begin{aligned} \mathcal {L}_{\text{ NaCL }} \approx \min -\log \frac{\exp \left( \varvec{q}_i \cdot \varvec{o}_{c}/ \tau \right) }{\sum _{\varvec{k}_j \in A} \exp \left( \varvec{q}_i \cdot \varvec{k}_j / \tau \right) } \end{aligned}$$
(20)

which has a similar form to the maximum log-likelihood estimation in Eq.(17); both aim to pull the query \(\varvec{q}_{i}\) closer to its class center while pushing it away from the other class centers.
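For concreteness, a minimal PyTorch sketch of the loss in Eq.(18) is given below. It assumes a boolean mask `pos_mask` marking the (topology-filtered) positive set P(i) for each query over the key set A; this is an illustrative re-implementation, not the released code:

```python
import torch
import torch.nn.functional as F

def nacl_loss(q, keys, pos_mask, tau=0.07):
    """q: (N, d) and keys: (M, d), both L2-normalized;
    pos_mask: (N, M) bool, True where key j is a positive for query i."""
    logits = q @ keys.t() / tau                                    # (N, M)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)                   # |P(i)|
    # Average -log p(k+ | q_i) over the positive set of each query.
    loss = -(log_prob * pos_mask.float()).sum(dim=1) / pos_count
    return loss.mean()

q = F.normalize(torch.randn(8, 128), dim=1)
keys = F.normalize(torch.randn(64, 128), dim=1)
pos_mask = torch.rand(8, 64) > 0.9                                 # toy positives
loss = nacl_loss(q, keys, pos_mask)
```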

In summary, the optimization of the contrastive loss in our method can be considered an instance of EM: at each training epoch, the E-step estimates the posterior probability of the latent true labels via clustering, and the M-step maximizes the lower bound of the log-likelihood. \(\square\)
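To tie the two steps together, here is a toy end-to-end EM loop under the interpretation above, reusing the `spherical_kmeans` and `nacl_loss` sketches; the linear "encoder" and random data are placeholders for illustration only:

```python
import torch
import torch.nn.functional as F

encoder = torch.nn.Linear(64, 128)          # stand-in for the real backbone
opt = torch.optim.SGD(encoder.parameters(), lr=0.1)
data = torch.randn(256, 64)                 # toy target-domain batch

for epoch in range(3):
    # E-step: cluster current features into |C| groups (pseudo labels).
    with torch.no_grad():
        feats = F.normalize(encoder(data), dim=1).numpy()
    assign, _ = spherical_kmeans(feats, n_clusters=10)
    assign = torch.from_numpy(assign)
    # M-step: same-cluster samples form the positive set P(i).
    q = F.normalize(encoder(data), dim=1)
    pos_mask = assign[:, None] == assign[None, :]
    loss = nacl_loss(q, q.detach(), pos_mask, tau=0.07)
    opt.zero_grad(); loss.backward(); opt.step()
```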

1.2 Proof of Lemma 1

Proof

Recall the triangle inequality for classification error (Ben-David et al., 2007). Let \(h \in \mathcal {H}\) be a hypothesis and \(\mathcal {D}\) be any distribution over the input space \(\mathcal {X}\). Then for all \(h_1, h_2, h_3 \in \mathcal {H}\), the following triangle inequality holds

$$\begin{aligned} \epsilon _{\mathcal {D}}\left( h_1, h_2\right) \le \epsilon _{\mathcal {D}}\left( h_1, h_3\right) +\epsilon _{\mathcal {D}}\left( h_3, h_2\right) \end{aligned}$$
(21)

Theorem 1 shows that

$$\begin{aligned} \epsilon _{T}\left( h\right) \le \epsilon _{S}\left( h\right) +\frac{1}{2} d_{\mathcal {H} \Delta \mathcal {H}}\left( P_{S}, P_{T}\right) +\lambda \end{aligned}$$
(22)

in which \(\lambda =\epsilon _{T}(h^{*})+\epsilon _{S}(h^{*})\) with \(h^{*}=\arg \min _{h \in \mathcal {H}} \epsilon _{T}(h)+\epsilon _{S}(h)\). According to the triangle inequality, we have

$$\begin{aligned} \epsilon _{T}(h^{*})=\epsilon _{T}\left( h^{*}, f_{T}\right) \le \epsilon _{T}\left( h^{*}, \hat{f}_{T}\right) +\epsilon _{T}\left( \hat{f}_{T}, f_{T}\right) \end{aligned}$$
(23)

Combining Eq.(22) and Eq.(23), i.e., applying the bound of Eq.(23) to the \(\epsilon _{T}(h^{*})\) term in \(\lambda\), we derive

$$\begin{aligned} \epsilon _{T}\left( h\right) \le \epsilon _{S}\left( h\right) +\frac{1}{2} d_{\mathcal {H} \Delta \mathcal {H}}\left( P_{S}, P_{T}\right) +\epsilon _{T}\left( \hat{f}_{T}, f_{T}\right) +\beta \end{aligned}$$
(24)

in which \(\beta = \epsilon _{T}(h^{*},\hat{f}_T)+\epsilon _{S}(h^{*})\) with \(h^{*}=\arg \min _{h \in \mathcal {H}} \epsilon _{T}(h,\hat{f}_T)+\epsilon _{S}(h)\). \(\square\)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, J., Sun, H. NaCL: noise-robust cross-domain contrastive learning for unsupervised domain adaptation. Mach Learn 112, 3473–3496 (2023). https://doi.org/10.1007/s10994-023-06343-8

