
Learning cross-domain representations by vision transformer for unsupervised domain adaptation

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Unsupervised Domain Adaptation (UDA) is a popular machine learning technique for reducing the distribution discrepancy between domains. Most UDA methods use a deep Convolutional Neural Network (CNN) together with a domain discriminator to learn a domain-invariant representation, but a domain-invariant representation is not necessarily a discriminative domain-specific one. Transformers (TRANS), which have been shown to be more robust to domain shift than CNNs, have gradually become a powerful alternative to CNNs for feature representation. On the other hand, the domain shift between the labeled source data and the unlabeled target data produces a significant amount of label noise, which calls for a more robust connection between the source and target domains. This paper proposes a simple yet effective UDA method that learns cross-domain representations with a vision Transformer in a self-training manner. Unlike the conventional approach of dividing an image into multiple non-overlapping patches, we propose a novel method that aggregates labeled source-domain patches and pseudo-labeled target-domain patches. In addition, a cross-domain alignment loss is proposed to match the centroids of the labeled source patches and the pseudo-labeled target patches. Extensive experiments show that our proposed method achieves state-of-the-art (SOTA) results on several standard UDA benchmarks (90.5% on ImageCLEF-DA, Office-31) with a Transformer baseline model and without any extra assistant networks.
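As an illustration of the two ideas in the abstract, the sketch below shows (a) aggregating labeled source patches with pseudo-labeled target patches into a single cross-domain patch sequence and (b) a centroid-matching alignment loss. This is a minimal sketch in PyTorch (the framework the authors use, see Note 2); the function names, the random patch-replacement scheme, and the use of an MSE distance between centroids are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mix_patches(src_patches, tgt_patches, mix_ratio=0.5):
    # src_patches, tgt_patches: (B, N, D) patch-token embeddings from a ViT.
    # Randomly replace a subset of source patch positions with the
    # corresponding target patches, yielding one cross-domain sequence.
    # (Illustrative scheme; the paper's aggregation rule may differ.)
    B, N, _ = src_patches.shape
    mask = torch.rand(B, N, 1, device=src_patches.device) < mix_ratio
    return torch.where(mask, tgt_patches, src_patches)

def centroid_alignment_loss(src_feats, src_labels, tgt_feats, tgt_pseudo, num_classes):
    # Match the per-class centroid (mean feature) of labeled source samples
    # to that of pseudo-labeled target samples; classes absent from either
    # batch are skipped.
    loss, matched = 0.0, 0
    for c in range(num_classes):
        s = src_feats[src_labels == c]
        t = tgt_feats[tgt_pseudo == c]
        if len(s) > 0 and len(t) > 0:
            loss = loss + F.mse_loss(s.mean(dim=0), t.mean(dim=0))
            matched += 1
    return loss / max(matched, 1)
```

In a self-training loop, tgt_pseudo would come from the model's own confident predictions on the unlabeled target batch, refreshed as training progresses, so the centroid loss gradually pulls the two domains' class-conditional features together.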



Data availability

The data that support the findings of this study are openly available at the following websites:

The ImageCLEF-DA dataset is available at https://www.imageclef.org/2014/adaptation

The Office-31 dataset is available at https://www.amazon.com (as mentioned in the paper)

The Office-Home dataset is available at https://www.hemanthdv.org/officeHomeDataset.html.

Notes

  1. https://www.imageclef.org/2014/adaptation.

  2. https://pytorch.org.


Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jing Chen.

Ethics declarations

Conflict of interest

All authors disclosed no relevant relationships.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yifan Ye and Shuai Fu contributed equally to this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ye, Y., Fu, S. & Chen, J. Learning cross-domain representations by vision transformer for unsupervised domain adaptation. Neural Comput & Applic 35, 10847–10860 (2023). https://doi.org/10.1007/s00521-023-08269-7

