
Early Feature Distributions Alignment in Visible-to-Thermal Unsupervised Domain Adaptation for Object Detection

  • Conference paper
Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15317))


Abstract

Infrared or thermal images are used in many civilian and military applications to detect objects from the heat they emit, especially when environmental conditions such as nighttime or adverse weather prevent the use of visible images. Training an object detector based on a deep neural network requires a significant amount of annotated data to achieve good detection performance. However, annotations for infrared images are often unavailable and costly to obtain. Moreover, a trained model may lack robustness to a change of thermal sensor. Therefore, unsupervised domain adaptation (UDA) methods have been proposed to train an object detector with annotated visible images, which are easily available, and unannotated infrared images. This paper presents a new visible-to-thermal UDA approach for object detection based on Deformable-DETR with hybrid matching. Our approach aims to establish common features between visible and thermal images at the earliest stages of the backbone network. The feature distributions extracted from visible and thermal images are aligned through adversarial learning with discriminator networks. Gradient images are also used as a domain translation of input images to ease the alignment. Detection performance is further improved by randomly masking tokens at the input of the transformer. Experiments on public datasets demonstrate that our method consistently outperforms previous works.
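Two of the ingredients named in the abstract, gradient images and random token masking, can be illustrated with a minimal sketch. It is not the authors' implementation: the Sobel operator, the edge-replicating border handling, the zero mask value, and the function names `gradient_image` and `mask_tokens` are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import random

# Sobel kernels (assumed operator; the abstract only says "gradient images").
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_image(image):
    """Return the gradient magnitude of a 2D grayscale image (list of rows).

    Border pixels are handled by replicating the nearest edge pixel
    (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so the 3x3 window never leaves the image.
        r = min(max(r, 0), height - 1)
        c = min(max(c, 0), width - 1)
        return image[r][c]

    result = []
    for r in range(height):
        row = []
        for c in range(width):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    value = pixel(r + dr, c + dc)
                    gx += SOBEL_X[dr + 1][dc + 1] * value
                    gy += SOBEL_Y[dr + 1][dc + 1] * value
            row.append(math.hypot(gx, gy))
        result.append(row)
    return result


def mask_tokens(tokens, mask_ratio, rng, mask_value=0.0):
    """Replace a random subset of tokens by a constant vector.

    `tokens` is a list of feature vectors; `mask_value` is a hypothetical
    choice, the abstract does not say what masked tokens are replaced with.
    """
    n_masked = int(round(mask_ratio * len(tokens)))
    masked_ids = set(rng.sample(range(len(tokens)), n_masked))
    return [
        [mask_value] * len(tok) if i in masked_ids else list(tok)
        for i, tok in enumerate(tokens)
    ]


if __name__ == "__main__":
    # A vertical step edge, and the same scene with inverted contrast,
    # mimicking an object that is dark in one modality and bright in the other.
    visible = [[0, 0, 10, 10] for _ in range(3)]
    thermal = [[255 - v for v in row] for row in visible]
    print(gradient_image(visible) == gradient_image(thermal))

    tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(mask_tokens(tokens, 0.5, random.Random(0)))
```

Because the Sobel kernels sum to zero, inverting the contrast of an image only flips the sign of each gradient component, so the gradient magnitude is unchanged. This is one plausible reason why gradient images could ease alignment between modalities whose intensity polarity differs.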



Acknowledgements

This work benefited from a government grant managed by the French National Research Agency (ANR-22-ASTR-0010-02) and the FactoryIA supercomputer financially supported by the Ile-de-France Regional Council.

Author information

Corresponding author

Correspondence to Adrien Maglo.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Maglo, A., Audigier, R. (2025). Early Feature Distributions Alignment in Visible-to-Thermal Unsupervised Domain Adaptation for Object Detection. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15317. Springer, Cham. https://doi.org/10.1007/978-3-031-78447-7_8

  • DOI: https://doi.org/10.1007/978-3-031-78447-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78446-0

  • Online ISBN: 978-3-031-78447-7

  • eBook Packages: Computer Science, Computer Science (R0)
