
CAD Models to Real-World Images: A Practical Approach to Unsupervised Domain Adaptation in Industrial Object Classification

Conference paper, published in: Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2023)

Abstract

In this paper, we systematically analyze unsupervised domain adaptation pipelines for object classification in a challenging industrial setting. In contrast to standard natural object benchmarks existing in the field, our results highlight the most important design choices when only category-labeled CAD models are available but classification needs to be done with real-world images. Our domain adaptation pipeline achieves SoTA performance on the VisDA benchmark, but more importantly, drastically improves recognition performance on our new open industrial dataset comprised of 102 mechanical parts. We conclude with a set of guidelines that are relevant for practitioners needing to apply state-of-the-art unsupervised domain adaptation in practice. Our code is available at https://github.com/dritter-bht/synthnet-transfer-learning.


Notes

  1. https://huggingface.co/models.

  2. https://pytorch.org/.

  3. https://huggingface.co/docs/transformers/main_classes/optimizer_schedules.

  4. https://pytorch.org/vision/main/generated/torchvision.transforms.AugMix.html.

References

  1. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)

  2. Dosovitskiy, A., et al.: An image is worth 16×16 words: transformers for image recognition at scale. In: ICLR (2021)

  3. Ganin, Y., et al.: Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17(1), 2096–2030 (2016)

  4. Goehring, D., Hoffman, J., Rodner, E., Saenko, K., Darrell, T.: Interactive adaptation of real-time object detectors. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1282–1289. IEEE (2014)

  5. Goodfellow, I.J., et al.: Generative adversarial nets. In: NeurIPS (2014)

  6. Hendrycks, D., Mu, N., Cubuk, E.D., Zoph, B., Gilmer, J., Lakshminarayanan, B.: AugMix: a simple data processing method to improve robustness and uncertainty. In: ICLR (2019)

  7. Hoffman, J., et al.: CyCADA: cycle-consistent adversarial domain adaptation. In: ICML (2017)

  8. Hoyer, L., Dai, D., Wang, H., Van Gool, L.: MIC: masked image consistency for context-enhanced domain adaptation. In: CVPR, pp. 11721–11732 (2023)

  9. Jiang, J., Chen, B., Fu, B., Long, M.: Transfer-Learning-Library (2020). https://github.com/thuml/Transfer-Learning-Library

  10. Jiang, J., Shu, Y., Wang, J., Long, M.: Transferability in deep learning: a survey. arXiv preprint arXiv:2201.05867 (2022)

  11. Jin, Y., Wang, X., Long, M., Wang, J.: Minimum class confusion for versatile domain adaptation. In: ECCV (2019)

  12. Kang, G., Jiang, L., Yang, Y., Hauptmann, A.: Contrastive adaptation network for unsupervised domain adaptation. In: CVPR, pp. 4888–4897 (2019)

  13. Kim, D., Wang, K., Sclaroff, S., Saenko, K.: A broad study of pre-training for domain generalization and adaptation. In: ECCV (2022)

  14. Kumar, A., Raghunathan, A., Jones, R.M., Ma, T., Liang, P.: Fine-tuning can distort pretrained features and underperform out-of-distribution. In: ICLR (2022)

  15. Lee, C.Y., Batra, T., Baig, M.H., Ulbricht, D.: Sliced Wasserstein discrepancy for unsupervised domain adaptation. In: CVPR, pp. 10277–10287 (2019)

  16. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  17. Liu, Z., et al.: Swin Transformer V2: scaling up capacity and resolution. In: CVPR, pp. 11999–12009 (2021)

  18. Long, M., Cao, Y., Wang, J., Jordan, M.: Learning transferable features with deep adaptation networks. In: ICML, pp. 97–105. PMLR (2015)

  19. Long, M., Cao, Z., Wang, J., Jordan, M.I.: Conditional adversarial domain adaptation. In: NeurIPS (2017)

  20. Long, M., Zhu, H., Wang, J., Jordan, M.I.: Deep transfer learning with joint adaptation networks. In: ICML (2016)

  21. Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)

  22. Peng, X., Usman, B., Kaushik, N., Wang, D., Hoffman, J., Saenko, K.: VisDA: a synthetic-to-real benchmark for visual domain adaptation. In: CVPR-W, pp. 2021–2026 (2018)

  23. Rangwani, H., Aithal, S.K., Mishra, M., Jain, A., Babu, R.V.: A closer look at smoothness in domain adversarial training. In: ICML (2022)

  24. Real, E., Shlens, J., Mazzocchi, S., Pan, X., Vanhoucke, V.: YouTube-BoundingBoxes: a large high-precision human-annotated data set for object detection in video. In: CVPR, pp. 7464–7473 (2017)

  25. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jegou, H.: Training data-efficient image transformers and distillation through attention. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning, vol. 139, pp. 10347–10357. PMLR (2021)

  26. Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain adaptation. In: CVPR, pp. 2962–2971 (2017)

  27. Woo, S., et al.: ConvNeXt V2: co-designing and scaling ConvNets with masked autoencoders. arXiv preprint arXiv:2301.00808 (2023)

  28. Xu, T., Chen, W., Pichao, W., Wang, F., Li, H., Jin, R.: CDTrans: cross-domain transformer for unsupervised domain adaptation. In: ICLR (2021)

  29. Yang, J., Liu, J., Xu, N., Huang, J.: TVT: transferable vision transformer for unsupervised domain adaptation. In: WACV, pp. 520–530 (2021)

  30. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: ICCV (2017)


Acknowledgements

This work was funded by the German Federal Ministry of Education and Research (BMBF) through its support of the project SynthNet, part of the KMU-Innovativ initiative (project code: 01IS21002C), the KI-Werkstatt project at the University of Applied Sciences Berlin, part of the Forschung an Fachhochschulen program (project code: 13FH028KI1), as well as the project TAHAI (funded by IFAF Berlin).

Author information

Correspondence to Dennis Ritter.

Appendices

A Implementation Details

A.1 Adapting Pretrained Models to Rendered Images: Implementation Details

We use the pretrained models "google/vit-base-patch16-224-in21k" (ViT) [2], "microsoft/swinv2-base-patch4-window12-192-22k" (SwinV2) [17], "facebook/convnextv2-base-22k-224" (ConvNextV2) [27], and "facebook/deit-base-distilled-patch16-224" (DeiT) [25] from Huggingface (Footnote 1) for experiments on the VisDA-2017 dataset, but only ViT and SwinV2 for our Topex-Printer dataset. ViT, SwinV2, and ConvNextV2 were pretrained on ImageNet22K, while DeiT was pretrained on ImageNet1K. We use three different training schemes: training the classification head only (CH), fine-tuning the full model (FT), and a combination of both, tuning the classification head first and then continuing with full fine-tuning (CH-FT), inspired by [14].
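
The three schemes differ only in which parameters receive gradients. The following minimal sketch shows how such checkpoints can be loaded with the Huggingface transformers library and frozen accordingly; the helper prepare_model and the assumption that the classification head is named classifier are illustrative and not taken from the released code.

```python
# Sketch only: loading a pretrained checkpoint and selecting a training
# scheme. `prepare_model` is a hypothetical helper; the head attribute is
# assumed to be called `classifier`, which holds for the listed checkpoints
# except possibly the distilled DeiT variant.
from transformers import AutoModelForImageClassification

CHECKPOINTS = {
    "vit": "google/vit-base-patch16-224-in21k",
    "swinv2": "microsoft/swinv2-base-patch4-window12-192-22k",
    "convnextv2": "facebook/convnextv2-base-22k-224",
    "deit": "facebook/deit-base-distilled-patch16-224",
}

def prepare_model(name: str, num_classes: int, scheme: str = "FT"):
    """scheme: "CH" (head only), "FT" (full), or "CH-FT" (CH first, then FT)."""
    model = AutoModelForImageClassification.from_pretrained(
        CHECKPOINTS[name],
        num_labels=num_classes,
        ignore_mismatched_sizes=True,  # replace the pretrained head for the new classes
    )
    if scheme in ("CH", "CH-FT"):
        # Freeze the backbone; only the freshly initialized head is trained.
        for p in model.parameters():
            p.requires_grad = False
        for p in model.classifier.parameters():
            p.requires_grad = True
    return model

# For CH-FT, all parameters are unfrozen again after the head-only stage:
# for p in model.parameters(): p.requires_grad = True
```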

  1. For CH we use the PyTorch (Footnote 2) SGD optimizer with learning rates [10.0, 0.1, 0.001], momentum 0.9, no weight decay, no learning rate scheduler, and no warmup.

  2. For FT we use the PyTorch implementation of the AdamW optimizer with learning rates [0.1, 0.001, 0.00001], weight decay 0.01, a cosine annealing learning rate scheduler (Footnote 3) [21] without restarts, and two warmup epochs (10% of total epochs); see the sketch after this list.
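
As an illustration, the optimizer and warmup schedule for these two schemes can be configured as in the sketch below, using the cosine schedule with warmup from the transformers library (Footnote 3); build_optimizer, the step counts, and the single learning rate argument are illustrative placeholders.

```python
# Sketch only: optimizer and scheduler setup for the CH and FT schemes.
# `build_optimizer`, `steps_per_epoch`, and `epochs` are illustrative names.
import torch
from transformers import get_cosine_schedule_with_warmup

def build_optimizer(model, scheme: str, lr: float, steps_per_epoch: int, epochs: int):
    params = [p for p in model.parameters() if p.requires_grad]
    if scheme == "CH":
        # SGD, momentum 0.9, no weight decay, no scheduler, no warmup.
        optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=0.0)
        scheduler = None
    else:  # "FT"
        # AdamW, weight decay 0.01, cosine annealing without restarts,
        # warmup over the first 10% of the total training steps.
        optimizer = torch.optim.AdamW(params, lr=lr, weight_decay=0.01)
        total_steps = steps_per_epoch * epochs
        scheduler = get_cosine_schedule_with_warmup(
            optimizer,
            num_warmup_steps=int(0.1 * total_steps),
            num_training_steps=total_steps,
        )
    return optimizer, scheduler
```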

For both datasets, the PyTorch 2.0.0 implementation of the data augmentations (Footnote 4) is used.
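
A possible torchvision composition of the augmentations used throughout (random resized crop, horizontal flip, AugMix; cf. Sect. A.2) is sketched below; the crop size and normalization statistics are assumptions, not values reported in this paper.

```python
# Sketch only: source-domain training augmentations (random resized crop,
# horizontal flip, AugMix). Crop size and normalization statistics are
# assumptions, not values reported in the paper.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.AugMix(),  # expects a PIL image or a uint8 tensor
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```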

A.2 Adapting to Real-World Images with Unsupervised Domain Adaptation: Implementation Details

For the UDA experiments we start from the best source-domain-only trained CH checkpoint for the respective model architecture and continue training with the same parameters as the best FT run for each model, as described in the paper. We use PyTorch 2.0.0 implementations of the image augmentations random resized crop, horizontal flip, and AugMix [6] with the same parameters as described in the last paragraph of Sect. A.1. We use the Transfer Learning Library (tllib) [9, 10] implementations of the CDAN [19] (hidden size 1024) and MCC [11] (temperature 1.0) domain adaptation methods, and also combine both. For each model architecture we use two different initial checkpoints: one from Huggingface, pretrained on ImageNet22K [1] ("google/vit-base-patch16-224-in21k" (ViT) and "microsoft/swinv2-base-patch4-window12-192-22k" (SwinV2)), and the best-performing checkpoint after training only the classification head in our source-domain-only experiments. Again, we use the global random seed 42 for all experiments, and training is performed on a single Nvidia Tesla V100 PCIE 32GB GPU.
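
For reference, the MCC objective [11] can be written in a few lines of plain PyTorch. The sketch below follows the formulation of the original paper (temperature-scaled softmax, entropy-based sample weighting, row-normalized class-confusion matrix); it is illustrative only, since the experiments use the tllib implementation.

```python
# Sketch only: Minimum Class Confusion (MCC) loss [11] in plain PyTorch,
# following the original formulation; the experiments use the tllib version.
import torch
import torch.nn.functional as F

def mcc_loss(target_logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """target_logits: (batch, num_classes) logits on unlabeled target-domain images."""
    batch_size, num_classes = target_logits.shape
    probs = F.softmax(target_logits / temperature, dim=1)        # (B, C)

    # Entropy-based sample weights: confident (low-entropy) samples count more.
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1)      # (B,)
    weights = 1.0 + torch.exp(-entropy)
    weights = batch_size * weights / weights.sum()               # (B,)

    # Weighted class-confusion matrix and per-class (row) normalization.
    confusion = (probs * weights.unsqueeze(1)).t() @ probs        # (C, C)
    confusion = confusion / confusion.sum(dim=1, keepdim=True)

    # Minimize the off-diagonal mass, i.e. confusion between different classes.
    return (confusion.sum() - confusion.trace()) / num_classes
```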

In contrast to other methods, we perform considerably better at correctly identifying the truck class but underperform on the motorcycle and person classes. The confusion matrix in Fig. 6 shows that our trained model often confuses motorcycle samples with bicycles (7%) and skateboards (10%), while the person class is confused rather uniformly (3%–4%) with skateboards, plants, motorcycles, and horses.

B Dataset Samples

(See Figs. 3, 4 and 5).

Fig. 3.

80 random samples of rendered images from the Topex-Printer dataset. Each 512×512 image, featuring machine parts marked with bounding boxes, is trimmed according to these boxes, extended to form a rectangle, and padded with black if needed. Finally, all images are resized to a resolution of 256×256 pixels.
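
One plausible implementation of this crop-pad-resize step is sketched below; the function name and the choice to pad the crop to a square are assumptions, not taken from the released preprocessing code.

```python
# Sketch only: bounding-box crop, black padding, and resize to 256x256 as
# described in the Fig. 3 caption. Names and the square padding are
# illustrative assumptions, not the released preprocessing code.
from PIL import Image

def crop_pad_resize(image: Image.Image, box, out_size: int = 256) -> Image.Image:
    """box = (left, top, right, bottom) bounding box of the machine part."""
    part = image.crop(box)
    w, h = part.size
    side = max(w, h)
    # Paste the crop centered onto a black square so no distortion occurs.
    square = Image.new("RGB", (side, side), (0, 0, 0))
    square.paste(part, ((side - w) // 2, (side - h) // 2))
    return square.resize((out_size, out_size))
```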

Fig. 4.

80 random samples of real images from the Topex-Printer dataset.

Fig. 5.

(Best viewed in color) Left (a): HDRI of the warehouse environment map used in our rendering scene. Image by Sergej Majboroda [CC0], via Polyhaven. Right (b): The handcrafted Blender material collection used for the Topex-Printer dataset.

C Evaluation Results

(See Tables 3, 5 and 8).

Table 2. Acc@1 in % on the target domain (real images) for all source-domain-only training experiments on the VisDA-2017 classification dataset. Note that base transform means that random color jitter and random grayscale transforms are applied. Faded-out rows represent numerically unstable runs that were canceled, e.g., due to NaN loss.
Table 3. Acc@1 in % on the target domain (real images) for all source-domain-only training experiments on the Topex-Printer dataset. Note that base transform means that random color jitter and random grayscale transforms are applied. Faded-out rows represent numerically unstable runs that were canceled, e.g., due to NaN loss.
Table 4. Acc@1 in % on target domain (real images) for best results per model and training scheme in our source domain training experiments on VisDA-2017 classification dataset. Note that base transform means that random color jitter and random grayscale transforms are applied instead of AugMix (other augmentations stay the same as explained in Sect. A.1).
Table 5. Acc@1 in % on target domain (real images) for best results per model and training scheme in our source-domain-only training experiments on Topex-Printer dataset. Note that base transform means that random color jitter and random grayscale transforms are applied instead of AugMix (other augmentations stay the same as explained in Sect. A.1).
Table 6. Acc@1 in % on the target domain (real images) for all UDA experiments on the VisDA-2017 classification dataset. Note that init checkpoint describes the model checkpoint used for the UDA experiments. CH refers to the best-performing CH training scheme from our DG experiments for the respective model architecture, and IN22K refers to the respective Huggingface model checkpoints described in Sect. A.2.
Table 7. Acc@1 in % on the target domain (real images) for all UDA experiments on the Topex-Printer dataset. Note that init checkpoint describes the model checkpoint used for the UDA experiments. CH refers to the best-performing CH training scheme from our source-domain-only training experiments for the respective model architecture, and IN22K refers to the respective Huggingface model checkpoints described in Sect. A.2.
Fig. 6.

Confusion matrix for our best-performing model on VisDA-2017: SwinV2-CH-CDAN-MCC

Table 8. Image classification top-1 accuracy in % on VisDA-2017 target domain (real images) across all classes compared to literature. We report our best source-domain-only and UDA runs for the ViT and SwinV2 architecture.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ritter, D., Hemberger, M., Hönig, M., Stopp, V., Rodner, E., Hildebrand, K. (2025). CAD Models to Real-World Images: A Practical Approach to Unsupervised Domain Adaptation in Industrial Object Classification. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2136. Springer, Cham. https://doi.org/10.1007/978-3-031-74640-6_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-74640-6_33


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-74639-0

  • Online ISBN: 978-3-031-74640-6

  • eBook Packages: Artificial Intelligence (R0)
