Abstract
In few-shot object detection (FSOD), many approaches retrain the detector at inference time, which is impractical in real applications. Moreover, high-quality region proposals are difficult to generate for novel classes from a limited support set. Inspired by recent developments in visual prompt learning (VPL) and detection with transformers (DETR), an approach is proposed in which 1) class-agnostic training is designed to extend the detector to novel classes and 2) visual prompts are combined with pseudo-class embeddings to improve query generation. The proposed approach is evaluated on multiple traffic datasets, where it outperforms other mainstream approaches by a margin of 1.1% in mean average precision (mAP). In summary, an effective FSOD approach based on VPL and DETR is proposed that requires no retraining at inference time and accurately localizes novel objects through an improved query-generation mechanism.
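To make the query-generation idea concrete, the following is a minimal PyTorch sketch of how visual prompts derived from a support set might be fused with pseudo-class embeddings to form object queries for a DETR-style decoder. All module names, dimensions, and the fusion-by-addition step are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (assumed design, not the paper's code): pool k-shot
# support features into a visual prompt and add it to learnable
# pseudo-class embeddings to produce DETR-style object queries.
import torch
import torch.nn as nn

class PromptQueryGenerator(nn.Module):
    def __init__(self, embed_dim: int = 256, num_queries: int = 100):
        super().__init__()
        # Learnable pseudo-class embeddings, one per query slot.
        self.pseudo_class_embed = nn.Embedding(num_queries, embed_dim)
        # Projects pooled support-image features into the prompt space.
        self.prompt_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, support_feats: torch.Tensor) -> torch.Tensor:
        """support_feats: (k_shot, embed_dim) pooled features of the
        support set for one novel class."""
        # Average the k-shot support features into one visual prompt.
        prompt = self.prompt_proj(support_feats.mean(dim=0))  # (embed_dim,)
        # Broadcast-add the prompt to every pseudo-class embedding so each
        # decoder query carries class evidence from the support set.
        queries = self.pseudo_class_embed.weight + prompt  # (num_queries, embed_dim)
        return queries

# Usage: the generated queries would initialize the object queries of a
# DETR-style transformer decoder (hypothetical 5-shot example).
gen = PromptQueryGenerator()
support = torch.randn(5, 256)   # assumed 5-shot pooled support features
object_queries = gen(support)   # shape: (100, 256)
print(object_queries.shape)
```

Under this reading, each decoder query attends to image features with class-specific support evidence already injected, which is one plausible way to improve proposal quality for novel classes without retraining.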
Availability of data and materials
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
The authors would like to thank AJE (www.aje.com) for its language editing assistance during the preparation of this manuscript.
Funding
This work was supported in part by the National Natural Science Foundation of China (61976188), the Special Project for Basic Business Expenses of Zhejiang Provincial Colleges and Universities (No. JRK22003), and Opening Foundation of State Key Laboratory of Virtual Reality Technology and System of Beihang University (No. VRLAB2023B02).
Author information
Contributions
Erjun Sun: Formal analysis, Writing - original draft preparation. Di Zhou: Conceptualization, Methodology, Writing - review & editing. Yan Tian: Software, Data curation, Writing - review & editing. Zhaocheng Xu: Writing - review & editing. Xun Wang: Writing - review & editing.
Ethics declarations
Ethics and informed consent for data used
The research does not involve human participants or animals. Informed consent has been obtained for all data used.
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, E., Zhou, D., Tian, Y. et al. Transformer-based few-shot object detection in traffic scenarios. Appl Intell 54, 947–958 (2024). https://doi.org/10.1007/s10489-023-05245-5