Abstract
In densely packed scenes, the high density of objects and their varying sizes make accurate, duplicate-free detection more challenging than in conventional object detection settings. In this paper, we propose a YOLOv5-based object detection approach equipped with a Transformer-based head and an EM-Merger unit, designed specifically for densely packed scenes. We incorporate the Transformer architecture into the prediction heads so that self-attention can capture long-range dependencies among the densely packed objects. Additionally, we introduce an EM-Merger unit to resolve redundant detections. Experimental results on the RebarDSC and SKU110K datasets demonstrate that our method significantly outperforms the baseline approach, achieving new state-of-the-art detection performance.
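To make the self-attention idea in the abstract concrete, the following is a minimal sketch of single-head scaled dot-product self-attention over a flattened feature map. It is not the paper's prediction head: the function names, the toy feature map, and the use of identity projections (queries, keys, and values all equal to the input tokens) are assumptions made for illustration, whereas a trained head would learn separate projection matrices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real head would learn these projections.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    out = []
    for q in tokens:
        # Similarity of this token to every token, however far apart
        # the two positions are in the original feature map.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        # Each output token is an attention-weighted sum of all tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


# Hypothetical 2x2 feature map with 2 channels, flattened to 4 tokens.
feature_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
attended = self_attention(feature_tokens)
```

In this toy input, tokens 0 and 2 are identical, so they attend to each other as strongly as to themselves even though they are not adjacent, which is the kind of non-local interaction the abstract refers to.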
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62272172, the Guangdong Basic and Applied Basic Research Foundation under Grant 2023A1515012920, the Tip-top Scientific and Technical Innovative Youth Talents of Guangdong Special Support Program under Grant 2019TQ05X200, the 2022 Tencent WeChat Rhino-Bird Focused Research Program (Tencent WeChat RBFR2022008), and the Major Key Project of PCL under Grant PCL2021A09.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Zhong, X., Zhang, N., Hu, H. et al. Densely packed object detection with transformer-based head and EM-merger. SOCA 17, 109–117 (2023). https://doi.org/10.1007/s11761-023-00361-z
DOI: https://doi.org/10.1007/s11761-023-00361-z