Abstract
Unsupervised sim-to-real adaptation (USRA) for semantic segmentation aims to adapt a model trained on simulated data to real-world environments. In practical applications such as robotic vision and autonomous driving, this can remove the cost of manually annotating real data. Standard USRA methods assume that a large sample of unlabeled real-world data is available for training. This assumption often fails in practice: real data can be difficult to collect, and in many deployments very little of it exists. We therefore aim to reduce the need for real data, studying unsupervised sim-to-real domain adaptation (USDA) and domain generalization (USDG) in the setting where only a single real-world image is available. To compensate for the limited real data, we first construct a pseudo-target domain by transferring the style of that one real image onto the simulated data. Building on this, we propose a class-aware cross-domain randomization method that extracts domain-invariant knowledge from the simulated and pseudo-target domains. We demonstrate the effectiveness of our approach on USDA and USDG benchmarks, including Cityscapes and Foggy Cityscapes, where it clearly outperforms existing methods.
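To make the two steps of this pipeline concrete, the sketch below shows one plausible realization; it is our own illustrative approximation, not the paper's released code. It assumes an AdaIN-style statistics transfer (Huang and Belongie, ICCV 2017) to stylize simulated content toward the single real image, and a ClassMix/DACS-style paste for the class-aware cross-domain mixing; all function names and the class ids in the usage comment are hypothetical.

import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Re-normalize per-channel statistics of the simulated (content) features
    # to match those of the single real (style) image, yielding pseudo-target features.
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

def class_aware_mix(sim_img, sim_label, pseudo_img, pseudo_label, classes):
    # Paste the pixels of the sampled classes from the simulated image onto the
    # stylized pseudo-target image, and mix the segmentation labels the same way,
    # so the model sees class-consistent content under both styles.
    mask = torch.zeros_like(sim_label, dtype=torch.bool)  # sim_label: (H, W) int tensor
    for c in classes:
        mask |= sim_label == c
    mixed_img = torch.where(mask.unsqueeze(0), sim_img, pseudo_img)  # (C, H, W)
    mixed_label = torch.where(mask, sim_label, pseudo_label)         # (H, W)
    return mixed_img, mixed_label

# Hypothetical usage: stylize simulated features toward the one real image, then
# randomize by mixing a random subset of classes across the two domains, e.g.:
#   pseudo_feat = adain(sim_feat, real_feat)
#   mixed_img, mixed_label = class_aware_mix(sim_img, sim_label,
#                                            pseudo_img, pseudo_label,
#                                            classes=[11, 13])  # illustrative ids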
Data availability
Inquiries about related datasets can be made through the author's email address 226151109@mail.sit.edu.cn.
Acknowledgements
We thank the editors and reviewers for their contributions to this article.
Funding
This work received no external funding.
Author information
Contributions
Conceptualization and methodology, LG; software and validation, HL; resources and supervision, AC; funding acquisition and investigation, XX; project administration, XZ. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gan, L., Liu, H., Chen, A. et al. Class-aware cross-domain target detection based on cityscape in fog. Machine Vision and Applications 34, 114 (2023). https://doi.org/10.1007/s00138-023-01463-6