Densely packed object detection with transformer-based head and EM-merger

  • Special Issue Paper
Service Oriented Computing and Applications

Abstract

In densely packed scenes, objects are numerous, tightly spaced, and varied in size, which makes detecting each one accurately and without duplication harder than in conventional object detection settings. In this paper, we propose a YOLOv5-based object detection approach equipped with a Transformer-based Head and an EM-Merger unit, designed specifically for densely packed scenes. We incorporate the transformer architecture into the prediction heads so that a self-attention mechanism can capture long-range dependencies between the densely packed objects. Additionally, we introduce an EM-Merger unit to resolve redundant detections of the same object. Experimental results on the RebarDSC and SKU110K datasets demonstrate that our method significantly outperforms the baseline approach and achieves new state-of-the-art detection performance.
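The self-attention step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention head, a tiny hand-written feature map, and identity projection matrices in place of learned weights, and the helper names (flatten_feature_map, self_attention) are hypothetical. It only shows how every spatial location of a feature map can attend to every other location, which is what lets a prediction head relate objects that sit far apart in a crowded image.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def flatten_feature_map(fmap):
    """Turn an H x W x D feature map into a list of H*W tokens of size D."""
    return [cell for row in fmap for cell in row]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: N x D list of feature vectors, one per spatial location.
    w_q, w_k, w_v: D x D projections (assumed here; normally learned).
    Returns (output, weights): N x D attended features and the
    N x N attention weights, where each row sums to 1.
    """
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = math.sqrt(len(w_k[0]))
    # Score every query against every key, so distant locations interact.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
              for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v), weights


# Toy 2 x 2 feature map with 2 channels (illustrative values only).
fmap = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.5, 0.5]]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections

tokens = flatten_feature_map(fmap)
out, weights = self_attention(tokens, identity, identity, identity)
```

Each row of weights describes how strongly one location draws on every other location, regardless of distance. In a real detector the projections would be learned, several heads would run in parallel, and the attended features would feed the box and class predictions.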

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62272172, the Guangdong Basic and Applied Basic Research Foundation under Grant 2023A1515012920, the Tip-top Scientific and Technical Innovative Youth Talents of Guangdong Special Support Program under Grant 2019TQ05X200, the 2022 Tencent WeChat Rhino-Bird Focused Research Program (Tencent WeChat RBFR2022008), and the Major Key Project of PCL under Grant PCL2021A09.

Author information

Corresponding author

Correspondence to Qingyao Wu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhong, X., Zhang, N., Hu, H. et al. Densely packed object detection with transformer-based head and EM-merger. SOCA 17, 109–117 (2023). https://doi.org/10.1007/s11761-023-00361-z

  • DOI: https://doi.org/10.1007/s11761-023-00361-z
