
Contrastive label assignment in vehicle detection

Published in: Applied Intelligence

Abstract

Vehicle detection is a critical task that involves identifying and localizing vehicles in traffic scenes. However, the traditional one-to-one set matching used for label assignment, in which each ground-truth bounding box is assigned to exactly one query, leads to sparse positive samples. To address this issue, we draw inspiration from contrastive learning and generate contrastive samples through feature augmentation, rather than resorting to complex one-to-many matching during label assignment. We evaluated the proposed approach on the publicly available GM traffic dataset and the Hangzhou traffic dataset, where it outperforms other state-of-the-art methods, with average precision (AP) improvements of 1.0% and 1.1%, respectively. Overall, our approach effectively handles the sparsity of positive samples in vehicle detection and achieves better performance than existing methods.
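The paper itself publishes no code, but the sparsity problem the abstract describes can be illustrated with a toy sketch. Below, a DETR-style one-to-one assignment is solved with SciPy's Hungarian solver over random stand-in features; note that with 100 queries and 3 ground-truth boxes, only 3 queries ever receive positive supervision. The final feature-augmentation step is a hypothetical stand-in for the contrastive samples the abstract mentions, not the authors' actual method.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Toy setup: a query-based detector with many queries but few ground-truth
# boxes per image (all features here are random placeholders).
num_queries, num_gt, dim = 100, 3, 32
queries = rng.normal(size=(num_queries, dim))
gt_feats = rng.normal(size=(num_gt, dim))

# One-to-one set matching: solve the assignment problem on a cost matrix
# so that each ground truth claims exactly one query.
cost = np.linalg.norm(queries[:, None, :] - gt_feats[None, :, :], axis=-1)
g_idx, q_idx = linear_sum_assignment(cost.T)  # rows: ground truths, cols: queries

positives = np.zeros(num_queries, dtype=bool)
positives[q_idx] = True
# Only num_gt of the num_queries queries become positives, so 97 of the
# 100 queries contribute no positive supervision in this image.

# Hypothetical contrastive step: jitter each matched query feature to create
# extra positive views; the unmatched queries would act as negatives in a
# contrastive loss. The noise scale 0.05 is an arbitrary illustration.
aug_pos = queries[q_idx] + 0.05 * rng.normal(size=(num_gt, dim))
```

The point of the sketch is only the imbalance: one-to-one matching keeps the positive set as small as the ground-truth set, whereas augmented views of the matched features supply additional positive training signal without a one-to-many matching scheme.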


Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

The authors would like to thank AJE (www.aje.com) for its language editing assistance during the preparation of this manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (61976188, 61972351 and 62111530300), the Special Project for Basic Business Expenses of Zhejiang Provincial Colleges and Universities (JRK22003), and the Opening Foundation of State Key Laboratory of Virtual Reality Technology and System of Beihang University (VRLAB2023B02).

Author information

Authors and Affiliations

Authors

Contributions

Erjun Sun: Formal analysis, Writing - original draft preparation. Di Zhou: Conceptualization, Methodology, Writing - review & editing. Zhaocheng Xu: Software, Data curation, Writing - review & editing. Jie Sun: Writing - review & editing. Xun Wang: Writing - review & editing.

Corresponding author

Correspondence to Di Zhou.

Ethics declarations

Conflict of interest/Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, E., Zhou, D., Xu, Z. et al. Contrastive label assignment in vehicle detection. Appl Intell 53, 29713–29722 (2023). https://doi.org/10.1007/s10489-023-05023-3

