Abstract
This paper studies the challenging task of cross-domain few-shot object detection (CD-FSOD), which aims to develop an accurate object detector for novel domains with minimal labeled examples. While transformer-based open-set detectors such as DE-ViT show promise in traditional few-shot object detection, their generalization to CD-FSOD remains unclear: (1) can such open-set detection methods easily generalize to CD-FSOD? (2) if not, how can models be enhanced when facing huge domain gaps? To answer the first question, we employ measures including style, inter-class variance (ICV), and indefinable boundaries (IB) to understand the domain gap. Based on these measures, we establish a new CD-FSOD benchmark to evaluate object detection methods, revealing that most current approaches fail to generalize across domains. Technically, we observe that the performance decline is associated with our proposed measures: style, ICV, and IB. Consequently, we propose several novel modules to address these issues. First, the learnable instance features align initially fixed instances with target categories, enhancing feature distinctiveness. Second, the instance reweighting module assigns higher importance to high-quality instances with slight IB. Third, the domain prompter encourages features resilient to different styles by synthesizing imaginary domains without altering semantic content. These techniques collectively contribute to the proposed Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO), which significantly improves upon the base DE-ViT. Experimental results validate the efficacy of our model. Datasets and code are available at http://yuqianfu.com/CDFSOD-benchmark.
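To make the three modules above concrete, here is a minimal PyTorch-style sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the class names (`LearnableInstances`, `InstanceReweighting`, `apply_domain_prompts`), tensor shapes, the two-layer MLP scorer, and the additive style perturbation are our own guesses at one plausible realization; the actual design is in the code release linked above.

```python
# Hedged sketch of the three CD-ViTO modules described in the abstract.
# All names, shapes, and formulations are illustrative assumptions.
import torch
import torch.nn as nn


class LearnableInstances(nn.Module):
    """Support-instance features initialized from frozen backbone embeddings
    but kept trainable, so finetuning can align them with target categories."""
    def __init__(self, init_feats: torch.Tensor):
        super().__init__()
        # init_feats: (num_classes, num_instances, dim)
        self.feats = nn.Parameter(init_feats.clone())

    def forward(self) -> torch.Tensor:
        return self.feats


class InstanceReweighting(nn.Module):
    """Scores each support instance so high-quality ones (slight IB)
    dominate the weighted class prototype."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_classes, num_instances, dim)
        weights = self.scorer(feats).softmax(dim=1)  # weights over instances
        return (weights * feats).sum(dim=1)          # (num_classes, dim) prototypes


def apply_domain_prompts(feats: torch.Tensor, prompts: torch.Tensor) -> torch.Tensor:
    """Additively perturb features with learnable 'imaginary domain' vectors;
    a consistency objective (not shown) would keep semantic content unchanged."""
    idx = torch.randint(prompts.size(0), (feats.size(0),))  # sample one domain per class
    return feats + prompts[idx].unsqueeze(1)


# Toy usage: 5 classes, 10 support instances each, 256-d features, 8 imaginary domains.
instances = LearnableInstances(torch.randn(5, 10, 256))
reweight = InstanceReweighting(256)
domain_prompts = nn.Parameter(0.1 * torch.randn(8, 256))
protos = reweight(apply_domain_prompts(instances(), domain_prompts))
print(protos.shape)  # torch.Size([5, 256])
```

The sketch mirrors the abstract's division of labor: trainable instance features supply adaptable category representations, the reweighting MLP downweights instances suffering from severe IB, and the domain prompts simulate style shifts during finetuning.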
Y. Wang and Y. Pan—Equal contributions.
Part of this work commenced during Dr. Yuqian Fu’s PhD at Fudan University.
Notes
- 1. Despite varying definitions, we adopt that of [52]: "both open-vocabulary and few-shot belong to open-set except their category representations".
References
Dosovitskiy, A., et al.: An image is worth \(16 \times 16\) words: transformers for image recognition at scale. In: ICLR (2021)
Drange, G.: Arthropod taxonomy orders object detection dataset (2019). https://doi.org/10.34740/kaggle/dsv/1240192
Fan, D.P., Ji, G.P., Cheng, M.M., Shao, L.: Concealed object detection. TPAMI (2021)
Fan, Z., Ma, Y., Li, Z., Sun, J.: Generalized few-shot object detection without forgetting. In: CVPR (2021)
Fu, Y., Fu, Y., Jiang, Y.G.: Meta-FDMixup: cross-domain few-shot learning guided by labeled target data. In: ACM MM (2021)
Fu, Y., Xie, Y., Fu, Y., Jiang, Y.G.: StyleAdv: meta style adversarial training for cross-domain few-shot learning. In: CVPR (2023)
Fu, Y., Zhang, L., Wang, J., Fu, Y., Jiang, Y.G.: Depth guided adaptive meta-fusion network for few-shot video recognition. In: ACM MM (2020)
Gao, Y., Lin, K.Y., Yan, J., Wang, Y., Zheng, W.S.: AsyFOD: an asymmetric adaptation paradigm for few-shot domain adaptive object detection. In: CVPR (2023)
Gao, Y., Yang, L., Huang, Y., Xie, S., Li, S., Zheng, W.S.: AcroFOD: an adaptive method for cross-domain few-shot object detection. In: ECCV (2022)
Girshick, R.: Fast R-CNN. In: ICCV (2015)
Guirguis, K., Meier, J., Eskandar, G., Kayser, M., Yang, B., Beyerer, J.: NIFF: alleviating forgetting in generalized few-shot object detection via neural instance feature forging. In: CVPR (2023)
Guo, Y., et al.: A broader study of cross-domain few-shot learning. In: ECCV (2020)
Han, G., He, Y., Huang, S., Ma, J., Chang, S.F.: Query adaptive few-shot object detection with heterogeneous graph convolutional networks. In: ICCV (2021)
Han, G., Huang, S., Ma, J., He, Y., Chang, S.F.: Meta faster R-CNN: towards accurate few-shot object detection with attentive feature alignment. In: AAAI (2022)
Han, G., Ma, J., Huang, S., Chen, L., Chang, S.F.: Few-shot object detection with fully cross-transformer. In: CVPR (2022)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
Hu, S.X., Li, D., Stühmer, J., Kim, M., Hospedales, T.M.: Pushing the limits of simple pipelines for few-shot learning: external data and fine-tuning make a difference. In: CVPR (2022)
Inoue, N., Furuta, R., Yamasaki, T., Aizawa, K.: Cross-domain weakly-supervised object detection through progressive domain adaptation. In: CVPR (2018)
Jia, M., et al.: Visual prompt tuning. In: ECCV (2022)
Jiang, L., et al.: Underwater species detection using channel sharpening attention. In: ACM MM (2021)
Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., Darrell, T.: Few-shot object detection via feature reweighting. In: ICCV (2019)
Kaul, P., Xie, W., Zisserman, A.: Label, verify, correct: a simple few shot object detection method. In: CVPR (2022)
Köhler, M., Eisenbach, M., Gross, H.M.: Few-shot object detection: a comprehensive survey. arXiv preprint arXiv:2112.11699 (2021)
Lee, K., et al.: Rethinking few-shot object detection on a multi-domain benchmark. In: ECCV (2022)
Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691 (2021)
Li, K., Wan, G., Cheng, G., Meng, L., Han, J.: Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. (2020)
Li, Y., Mao, H., Girshick, R., He, K.: Exploring plain vision transformer backbones for object detection. In: ECCV (2022)
Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: ECCV (2014)
Luo, X., Wu, H., Zhang, J., Gao, L., Xu, J., Song, J.: A closer look at few-shot classification again. In: ICML (2023)
Luo, Y., Liu, P., Guan, T., Yu, J., Yang, Y.: Adversarial style mining for one-shot unsupervised domain adaptation. In: NeurIPS (2020)
Ma, J., Niu, Y., Xu, J., Huang, S., Han, G., Chang, S.F.: DiGeo: discriminative geometry-aware learning for generalized few-shot object detection. In: CVPR (2023)
Oquab, M., et al.: DINOv2: learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)
Qiao, L., Zhao, Y., Li, Z., Qiu, X., Wu, J., Zhang, C.: DeFRCN: decoupled faster R-CNN for few-shot object detection. In: ICCV (2021)
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: ICML (2021)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR (2016)
Saleh, A., Laradji, I.H., Konovalov, D.A., Bradley, M., Vazquez, D., Sheaves, M.: A realistic fish-habitat dataset to evaluate algorithms for underwater visual analysis. Sci. Rep. (2020)
Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: NeurIPS (2017)
Song, K., Yan, Y.: A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surface Sci. (2013)
Sun, B., Li, B., Cai, S., Yuan, Y., Zhang, C.: FSCE: few-shot object detection via contrastive proposal encoding. In: CVPR (2021)
Tang, H., Yuan, C., Li, Z., Tang, J.: Learning attention-guided pyramidal features for few-shot fine-grained recognition. Pattern Recogn. (2022)
Tseng, H.Y., Lee, H.Y., Huang, J.B., Yang, M.H.: Cross-domain few-shot classification via learned feature-wise transformation. In: ICLR (2020)
Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. In: NeurIPS (2016)
Wang, H., Deng, Z.H.: Cross-domain few-shot classification via adversarial task augmentation. arXiv preprint (2021)
Wang, K., Liew, J.H., Zou, Y., Zhou, D., Feng, J.: PANet: few-shot image semantic segmentation with prototype alignment. In: ICCV (2019)
Wang, X., Huang, T.E., Darrell, T., Gonzalez, J.E., Yu, F.: Frustratingly simple few-shot object detection. arXiv preprint arXiv:2003.06957 (2020)
Xie, G.S., Xiong, H., Liu, J., Yao, Y., Shao, L.: Few-shot semantic segmentation with cyclic memory network. In: ICCV (2021)
Xiong, W.: CD-FSOD: a benchmark for cross-domain few-shot object detection. In: ICASSP (2023)
Yan, X., Chen, Z., Xu, A., Wang, X., Liang, X., Lin, L.: Meta R-CNN: towards general solver for instance-level low-shot learning. In: ICCV (2019)
Zareian, A., Rosa, K.D., Hu, D.H., Chang, S.F.: Open-vocabulary object detection using captions. In: CVPR (2021)
Zhang, H., Zhang, L., Qi, X., Li, H., Torr, P.H., Koniusz, P.: Few-shot action recognition with permutation-invariant attention. In: ECCV (2020)
Zhang, J., Gao, L., Luo, X., Shen, H., Song, J.: DETA: denoised task adaptation for few-shot learning. In: ICCV (2023)
Zhang, X., Wang, Y., Boularias, A.: Detect every thing with few examples. arXiv preprint arXiv:2309.12969 (2023)
Zhao, S., et al.: Exploiting unlabeled data with vision and language models for object detection. In: ECCV (2022)
Zhong, Y., et al.: RegionCLIP: region-based language-image pretraining. In: CVPR (2022)
Zhou, K., Yang, Y., Qiao, Y., Xiang, T.: Domain generalization with MixStyle. In: ICLR (2021)
Zhou, X., Girdhar, R., Joulin, A., Krähenbühl, P., Misra, I.: Detecting twenty-thousand classes using image-level supervision. In: ECCV (2022)
Zhuo, L., Fu, Y., Chen, J., Cao, Y., Jiang, Y.G.: TGDM: target guided dynamic mixup for cross-domain few-shot learning. In: ACM MM (2022)
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fu, Y. et al. (2025). Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15116. Springer, Cham. https://doi.org/10.1007/978-3-031-73636-0_15
DOI: https://doi.org/10.1007/978-3-031-73636-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73635-3
Online ISBN: 978-3-031-73636-0