Transformer-based few-shot object detection in traffic scenarios

Abstract

In few-shot object detection (FSOD), many approaches retrain the detector at the inference stage, which is impractical in real applications. Moreover, high-quality region proposals are difficult to generate for novel classes from a limited support set. Inspired by recent developments in visual prompt learning (VPL) and detection with transformers (DETR), an approach is proposed in which 1) class-agnostic training extends the detector to novel classes and 2) visual prompts are combined with pseudo-class embeddings to improve query generation. The proposed approach is evaluated on multiple traffic datasets, where it outperforms other mainstream approaches by a margin of 1.1% in mean average precision (mAP). In summary, an effective VPL- and DETR-based FSOD approach is proposed that requires no retraining at inference and accurately localizes novel objects through an improved query generation mechanism.
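The query-generation idea in the abstract (fusing visual prompts from the support set with pseudo-class embeddings to form the object queries of a DETR-style decoder) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the module name, the cross-attention fusion, and the use of pooled support features are assumptions made for illustration.

import torch
import torch.nn as nn

# Hypothetical sketch of prompt-conditioned query generation for a
# DETR-style few-shot detector; names and design details are
# illustrative, not the paper's actual API.
class VisualPromptQueryGenerator(nn.Module):
    def __init__(self, d_model=256, num_queries=100):
        super().__init__()
        # Learned pseudo-class embeddings: a class-agnostic prior over
        # object queries, shared by base and novel classes.
        self.pseudo_class_embed = nn.Embedding(num_queries, d_model)
        # Projects pooled support features (the visual prompts) into
        # the decoder's query space.
        self.prompt_proj = nn.Linear(d_model, d_model)
        self.fuse = nn.MultiheadAttention(d_model, num_heads=8,
                                          batch_first=True)

    def forward(self, support_feats):
        # support_feats: (batch, num_shots, d_model) pooled features
        # from the few support images of a novel class.
        b = support_feats.size(0)
        queries = self.pseudo_class_embed.weight.unsqueeze(0).expand(b, -1, -1)
        prompts = self.prompt_proj(support_feats)
        # Cross-attend the queries to the visual prompts so each object
        # query is conditioned on the novel class's appearance.
        fused, _ = self.fuse(queries, prompts, prompts)
        return queries + fused  # residual keeps the class-agnostic prior

# Usage: replace a DETR decoder's fixed learned object queries with the
# fused queries produced here.
generator = VisualPromptQueryGenerator()
support = torch.randn(2, 5, 256)     # batch of 2 tasks, 5-shot support set
object_queries = generator(support)  # shape: (2, 100, 256)

Under this reading, extending the detector to a new class at inference reduces to encoding its support images and regenerating the queries, which is consistent with the abstract's claim that no retraining is required.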

Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

The authors would like to thank AJE (www.aje.com) for its language editing assistance during the preparation of this manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (61976188), the Special Project for Basic Business Expenses of Zhejiang Provincial Colleges and Universities (No. JRK22003), and Opening Foundation of State Key Laboratory of Virtual Reality Technology and System of Beihang University (No. VRLAB2023B02).

Author information

Contributions

Erjun Sun: Formal analysis, Writing - original draft preparation. Di Zhou: Conceptualization, Methodology, Writing - review & editing. Yan Tian: Software, Data curation, Writing - review & editing. Zhaocheng Xu: Writing - review & editing. Xun Wang: Writing - review & editing.

Corresponding author

Correspondence to Di Zhou.

Ethics declarations

Ethics and informed consent for data used

This research does not involve human participants or animals. Informed consent was obtained for all data used.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, E., Zhou, D., Tian, Y. et al. Transformer-based few-shot object detection in traffic scenarios. Appl Intell 54, 947–958 (2024). https://doi.org/10.1007/s10489-023-05245-5
