Abstract
Unsupervised domain adaptation (UDA) aims to extract domain-invariant features. Existing UDA methods mainly use a convolutional neural network (CNN) or a vision transformer (ViT) as the feature extractor and align the domains in a latent space that characterizes only a single view of the object, which can lead to matching errors: distributions aligned in the CNN space may still be confused in the ViT space. To address this, we introduce global–local bi-alignment (GLBA), built on the hybrid Conformer architecture, which enforces simultaneous alignment in both spaces under the space-independence assumption: if two domains have the same distribution, their distributions are aligned in any latent space. The framework can be easily combined with previous UDA methods, since it essentially adds only an alignment loss and requires no elaborate structures or large numbers of additional parameters. Experiments demonstrate the effectiveness of GLBA, which achieves state-of-the-art (SoTA) performance with comparable parameter complexity. Code is available at https://github.com/JSJ515-Group/GLBA.
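To make the bi-alignment idea concrete, the sketch below sums an alignment loss computed separately over CNN-branch (local) and ViT-branch (global) features, as a Conformer-style hybrid backbone would produce. It is a minimal illustration, not the authors' implementation: the linear-kernel MMD stand-in for the alignment loss, the feature dimensions, and the branch weights are assumptions made for the example.

```python
# Minimal sketch of global-local bi-alignment (illustrative, not the paper's exact loss).
# Assumes a hybrid backbone that exposes CNN-branch and ViT-branch features;
# a simple linear-kernel MMD stands in for the alignment criterion.
import torch
import torch.nn as nn


def mmd_linear(f_src: torch.Tensor, f_tgt: torch.Tensor) -> torch.Tensor:
    """Linear-kernel MMD between source and target feature batches."""
    delta = f_src.mean(dim=0) - f_tgt.mean(dim=0)
    return delta.dot(delta)


class BiAlignLoss(nn.Module):
    """Alignment enforced simultaneously in the CNN and ViT latent spaces."""

    def __init__(self, weight_cnn: float = 1.0, weight_vit: float = 1.0):
        super().__init__()
        self.weight_cnn = weight_cnn
        self.weight_vit = weight_vit

    def forward(self, cnn_src, vit_src, cnn_tgt, vit_tgt):
        loss_cnn = mmd_linear(cnn_src, cnn_tgt)  # local (convolutional) space
        loss_vit = mmd_linear(vit_src, vit_tgt)  # global (transformer) space
        return self.weight_cnn * loss_cnn + self.weight_vit * loss_vit


if __name__ == "__main__":
    # Hypothetical feature dimensions for the two branches of a hybrid backbone.
    cnn_src, vit_src = torch.randn(32, 256), torch.randn(32, 384)
    cnn_tgt, vit_tgt = torch.randn(32, 256), torch.randn(32, 384)
    align_loss = BiAlignLoss()(cnn_src, vit_src, cnn_tgt, vit_tgt)
    # The full objective would add the supervised source classification loss:
    # total = loss_cls_source + lambda_align * align_loss
    print(align_loss.item())
```

In this form the bi-alignment term can simply be added to the objective of an existing UDA method, which reflects the abstract's claim that GLBA introduces only an extra alignment loss rather than a new architecture.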
Data availability
The four publicly available datasets used in this work can be accessed directly through the cited references.
References
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1492–1500
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inform Process Syst 25:1097–1105
Liu Z, Mao H, Wu C-Y, Feichtenhofer C, Darrell T, Xie S (2022) A convnet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 11976–11986
Wang X, Girdhar R, Yu SX, Misra I (2023) Cut and learn for unsupervised object detection and instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3124–3134
Gao Y, Yang L, Huang Y, Xie S, Li S, Zheng W-S (2022) Acrofod: An adaptive method for cross-domain few-shot object detection. In: European Conference on Computer Vision, pp 673–690
Rajagopalan A et al. (2023) Improving robustness of semantic segmentation to motion-blur using class-centric augmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10470–10479
Ben-David S, Blitzer J, Crammer K, Pereira F (2006) Analysis of representations for domain adaptation. Adv Neural Inform Process Syst 19:137–144
Zhang Y, Deng B, Tang H, Zhang L, Jia K (2020) Unsupervised multi-class domain adaptation: theory, algorithms, and practice. IEEE Trans Pattern Anal Mach Intell 44(5):2775–2792
Zhu Y, Zhuang F, Wang J, Ke G, Chen J, Bian J, Xiong H, He Q (2020) Deep subdomain adaptation network for image classification. IEEE Trans Neural Netw Learn Syst 32(4):1713–1722
Kang G, Jiang L, Yang Y, Hauptmann AG (2019) Contrastive adaptation network for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4893–4902
Chen L, Chen H, Wei Z, Jin X, Tan X, Jin Y, Chen E (2022) Reusing the task-specific classifier as a discriminator: Discriminator-free adversarial domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7181–7190
Zhang Y, Gao Y, Li H, Yin A, Zhang D, Chen X (2023) Crucial semantic classifier-based adversarial learning for unsupervised domain adaptation. arXiv preprint arXiv:2302.01708
Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, March M, Lempitsky V (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17(59):1–35
Tzeng E, Hoffman J, Saenko K, Darrell T (2017) Adversarial discriminative domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7167–7176
Long M, Cao Z, Wang J, Jordan MI (2018) Conditional adversarial domain adaptation. Adv Neural Inform Process Syst 31:1647–1657
Cui S, Wang S, Zhuo J, Su C, Huang Q, Tian Q (2020) Gradually vanishing bridge for adversarial domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12455–12464
Xu R, Liu P, Wang L, Chen C, Wang J (2020) Reliable weighted optimal transport for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4394–4403
Yang J, Liu J, Xu N, Huang J (2023) Tvt: Transferable vision transformer for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 520–530
Xu T, Chen W, Wang P, Wang F, Li H, Jin R (2021) Cdtrans: Cross-domain transformer for unsupervised domain adaptation. arXiv preprint arXiv:2109.06165
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inform Process Syst 30:5998–6008
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141
Zhang H, Wu C, Zhang Z, Zhu Y, Lin H, Zhang Z, Sun Y, He T, Mueller J, Manmatha R et al (2022) Resnest: Split-attention networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 2736–2746
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European Conference on Computer Vision, pp 213–229. Springer
Shen Z, Bello I, Vemulapalli R, Jia X, Chen C-H (2020) Global self-attention networks for image recognition. arXiv preprint arXiv:2010.03019
Peng Z, Huang W, Gu S, Xie L, Wang Y, Jiao J, Ye Q (2021) Conformer: Local features coupling global representations for visual recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 367–376
Foret P, Kleiner A, Mobahi H, Neyshabur B (2020) Sharpness-aware minimization for efficiently improving generalization. arXiv preprint arXiv:2010.01412
Rangwani H, Aithal SK, Mishra M, Jain A, Radhakrishnan VB (2022) A closer look at smoothness in domain adversarial training. In: International Conference on Machine Learning, pp 18378–18399. PMLR
Wang M, Deng W (2018) Deep visual domain adaptation: a survey. Neurocomputing 312:135–153
Qiao F, Zhao L, Peng X (2020) Learning to learn single domain generalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12556–12565
Chen J, Gao Z, Wu X, Luo J (2023) Meta-causal learning for single domain generalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7683–7692
Xu Q, Zhang R, Zhang Y, Wang Y, Tian Q (2021) A fourier-based framework for domain generalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 14383–14392
Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T (2014) Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474
Long M, Cao Y, Wang J, Jordan M (2015) Learning transferable features with deep adaptation networks. In: International Conference on Machine Learning, pp 97–105. PMLR
Zhang Y, Liu T, Long M, Jordan M (2019) Bridging theory and algorithm for domain adaptation. In: International Conference on Machine Learning, pp 7404–7413. PMLR
Long M, Zhu H, Wang J, Jordan MI (2017) Deep transfer learning with joint adaptation networks. In: International Conference on Machine Learning, pp 2208–2217. PMLR
Du Z, Li J, Su H, Zhu L, Lu K (2021) Cross-domain gradient discrepancy minimization for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3937–3946
Pei Z, Cao Z, Long M, Wang J (2018) Multi-adversarial domain adaptation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 32
Gao Z, Zhang S, Huang K, Wang Q, Zhong C (2021) Gradient distribution alignment certificates better adversarial domain adaptation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 8937–8946
Hoyer L, Dai D, Wang H, Van Gool L (2023) Mic: Masked image consistency for context-enhanced domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 11721–11732
Borgwardt KM, Gretton A, Rasch MJ, Kriegel H-P, Schölkopf B, Smola AJ (2006) Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 22(14):49–57
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inform Process Syst 27:2672–2680
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784
Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7794–7803
Bello I, Zoph B, Vaswani A, Shlens J, Le QV (2019) Attention augmented convolutional networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 3286–3295
Hu H, Gu J, Zhang Z, Dai J, Wei Y (2018) Relation networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3588–3597
Acuna D, Zhang G, Law MT, Fidler S (2021) f-domain adversarial learning: Theory and algorithms. In: International Conference on Machine Learning, pp 66–75. PMLR
Ben-David S, Blitzer J, Crammer K, Kulesza A, Pereira F, Vaughan JW (2010) A theory of learning from different domains. Mach Learn 79:151–175
Nguyen X, Wainwright MJ, Jordan MI (2010) Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Trans Inf Theory 56(11):5847–5861
Jin Y, Wang X, Long M, Wang J (2020) Minimum class confusion for versatile domain adaptation. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI 16, pp 464–480. Springer
Saenko K, Kulis B, Fritz M, Darrell T (2010) Adapting visual category models to new domains. In: Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part IV 11, pp 213–226. Springer
Caputo B, Müller H, Martinez-Gomez J, Villegas M, Acar B, Patricia N, Marvasti N, Üsküdarlı S, Paredes R, Cazorla M et al (2014) Imageclef 2014: overview and analysis of the results. In: Information Access Evaluation. Multilinguality, Multimodality, and Interaction: 5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, UK, September 15-18, 2014. Proceedings 5, pp 192–211. Springer
Venkateswara H, Eusebio J, Chakraborty S, Panchanathan S (2017) Deep hashing network for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5018–5027
Peng X, Bai Q, Xia X, Huang Z, Saenko K, Wang B (2019) Moment matching for multi-source domain adaptation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 1406–1415
Jiang J, Chen B, Fu B, Long M (2020) Transfer-learning-library. GitHub
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) Pytorch: an imperative style, high-performance deep learning library. Adv Neural Inform Process Syst 32:8024–8035
Saito K, Watanabe K, Ushiku Y, Harada T (2018) Maximum classifier discrepancy for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3723–3732
Lee C-Y, Batra T, Baig MH, Ulbricht D (2019) Sliced wasserstein discrepancy for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10285–10295
Cui S, Wang S, Zhuo J, Li L, Huang Q, Tian Q (2020) Towards discriminability and diversity: Batch nuclear-norm maximization under label insufficient situations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3941–3950
Tang H, Jia K (2020) Discriminative adversarial domain adaptation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 34, pp 5940–5947
Xu X, He H, Zhang H, Xu Y, He S (2019) Unsupervised domain adaptation via importance sampling. IEEE Trans Circ Syst Video Technol 30(12):4688–4699
Wei G, Lan C, Zeng W, Chen Z (2021) Metaalign: Coordinating domain alignment and classification for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 16643–16653
Li S, Xie M, Gong K, Liu CH, Wang Y, Li W (2021) Transferable semantic augmentation for domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 11516–11525
Li S, Xie M, Lv F, Liu CH, Liang J, Qin C, Li W (2021) Semantic concentration for domain adaptation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 9102–9111
Zhou L, Xiao S, Ye M, Zhu X, Li S (2023) Adaptive mutual learning for unsupervised domain adaptation. IEEE Trans Circ Syst Video Technol
Zhang Y, Wang X, Liang J, Zhang Z, Wang L, Jin R, Tan T (2023) Free lunch for domain adversarial training: Environment label smoothing. arXiv preprint arXiv:2302.00194
Luo Y-W, Ren C-X (2021) Conditional bures metric for domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 13989–13998
Wang H, Wang Z, Du M, Yang F, Zhang Z, Ding S, Mardziel P, Hu X (2020) Score-cam: Score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp 24–25
Abnar S, Zuidema W (2020) Quantifying attention flow in transformers. arXiv preprint arXiv:2005.00928
Maaten L, Hinton G (2008) Visualizing data using t-sne. J Mach Learn Res 9(11):2579–2605
Acknowledgements
This research was supported by the Research Foundation of the Institute of Environment-friendly Materials and Occupational Health (Wuhu), Anhui University of Science and Technology (No. ALW2021YF04), and the Medical Special Cultivation Project of Anhui University of Science and Technology (No. YZ2023H2C005).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lin, Ye., Liu, E., Liang, X. et al. Global–local Bi-alignment for purer unsupervised domain adaptation. J Supercomput 80, 14925–14952 (2024). https://doi.org/10.1007/s11227-024-06038-4