
Contrastive label assignment in vehicle detection

Published in: Applied Intelligence

Abstract

Vehicle detection is a critical task that involves identifying and localizing vehicles in traffic scenes. However, the traditional one-to-one set matching used for label assignment, in which each ground-truth bounding box is assigned to exactly one query, leads to sparse positive samples. To address this issue, we draw inspiration from contrastive learning and generate contrastive samples through feature augmentation, rather than resorting to complex one-to-many matching during label assignment. We evaluated the proposed approach on the publicly available GM traffic dataset and the Hangzhou traffic dataset, where it outperforms other state-of-the-art methods, with average precision (AP) improvements of 1.0% and 1.1%, respectively. Overall, our approach effectively handles the sparsity of positive samples in vehicle detection and achieves better performance than existing methods.
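The paper itself publishes no code, but the sparsity problem the abstract describes can be illustrated with a toy sketch. Below, a DETR-style one-to-one assignment is solved with SciPy's Hungarian solver over random stand-in features; note that with 100 queries and 3 ground-truth boxes, only 3 queries ever receive positive supervision. The final feature-augmentation step is a hypothetical stand-in for the contrastive samples the abstract mentions, not the authors' actual method.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Toy setup: a query-based detector with many queries but few ground-truth
# boxes per image (all features here are random placeholders).
num_queries, num_gt, dim = 100, 3, 32
queries = rng.normal(size=(num_queries, dim))
gt_feats = rng.normal(size=(num_gt, dim))

# One-to-one set matching: solve the assignment problem on a cost matrix
# so that each ground truth claims exactly one query.
cost = np.linalg.norm(queries[:, None, :] - gt_feats[None, :, :], axis=-1)
g_idx, q_idx = linear_sum_assignment(cost.T)  # rows: ground truths, cols: queries

positives = np.zeros(num_queries, dtype=bool)
positives[q_idx] = True
# Only num_gt of the num_queries queries become positives, so 97 of the
# 100 queries contribute no positive supervision in this image.

# Hypothetical contrastive step: jitter each matched query feature to create
# extra positive views; the unmatched queries would act as negatives in a
# contrastive loss. The noise scale 0.05 is an arbitrary illustration.
aug_pos = queries[q_idx] + 0.05 * rng.normal(size=(num_gt, dim))
```

The point of the sketch is only the imbalance: one-to-one matching keeps the positive set as small as the ground-truth set, whereas augmented views of the matched features supply additional positive training signal without a one-to-many matching scheme.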


Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

The authors would like to thank AJE (www.aje.com) for its language editing assistance during the preparation of this manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (61976188, 61972351 and 62111530300), the Special Project for Basic Business Expenses of Zhejiang Provincial Colleges and Universities (JRK22003), and the Opening Foundation of State Key Laboratory of Virtual Reality Technology and System of Beihang University (VRLAB2023B02).

Author information

Authors and Affiliations

Authors

Contributions

Erjun Sun: Formal analysis, Writing - original draft preparation. Di Zhou: Conceptualization, Methodology, Writing - review & editing. Zhaocheng Xu: Software, Data curation, Writing - review & editing. Jie Sun: Writing - review & editing. Xun Wang: Writing - review & editing.

Corresponding author

Correspondence to Di Zhou.

Ethics declarations

Conflict of interest/Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, E., Zhou, D., Xu, Z. et al. Contrastive label assignment in vehicle detection. Appl Intell 53, 29713–29722 (2023). https://doi.org/10.1007/s10489-023-05023-3

