Densely packed object detection with transformer-based head and EM-merger

Zhong, Xiaojing; Zhang, Ni; Hu, Hao; Li, Li; Cen, Junhua; Wu, Qingyao

doi:10.1007/s11761-023-00361-z


Special Issue Paper
Published: 21 April 2023

Volume 17, pages 109–117 (2023)

Service Oriented Computing and Applications

Xiaojing Zhong (1, 2; equal contribution),
Ni Zhang (1, 2; equal contribution),
Hao Hu (5),
Li Li (6),
Junhua Cen (6) &
Qingyao Wu (1, 3, 4; ORCID: orcid.org/0000-0002-8564-7289)


Abstract

In densely packed scenes, the high density of objects and their varying sizes make accurate detection without duplicates more challenging than conventional object detection. In this paper, we propose a YOLOv5-based object detection approach equipped with a Transformer-based Head and an EM-Merger unit, specifically designed for densely packed scenes. We incorporate the transformer architecture into the prediction heads so that a self-attention mechanism can capture long-range dependencies between densely packed objects. Additionally, we introduce an EM-Merger unit to resolve redundant (duplicate) object detections. Experimental results on the RebarDSC and SKU110K datasets demonstrate that our method significantly outperforms the baseline approach and achieves new state-of-the-art detection performance.
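The abstract says that placing transformer blocks in the prediction heads lets self-attention capture long-range dependencies between densely packed objects. The minimal Python sketch below illustrates only that generic mechanism: single-head scaled dot-product self-attention over the flattened cells of a feature map, followed by a residual connection. The helper names, the single head, the omission of layer normalization and the feed-forward sublayer, and the toy identity projections are all assumptions for illustration; the abstract does not give the paper's actual head configuration.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention logits."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_feature_map(fmap: List[Matrix]) -> Matrix:
    """Turn a C x H x W feature map into H*W tokens, each a C-dimensional vector."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[ch][y][x] for ch in range(c)] for y in range(h) for x in range(w)]


def self_attention(tokens: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention plus a residual connection.

    Each output token mixes the value vectors of *all* tokens, so a cell of the
    feature map can use information from distant cells, not just its neighbours.
    wv must be C x C so the residual addition keeps the token dimension.
    """
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    d = len(q[0])
    logits = [[sum(qi[t] * kj[t] for t in range(d)) / math.sqrt(d) for kj in k] for qi in q]
    weights = [softmax(row) for row in logits]   # each row sums to 1
    attended = matmul(weights, v)
    return [[x + a for x, a in zip(tok, att)] for tok, att in zip(tokens, attended)]


# Toy 2-channel, 2 x 2 feature map with identity projections (illustrative only).
fmap = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 1.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = flatten_feature_map(fmap)   # 4 tokens, each 2-dimensional
out = self_attention(tokens, identity, identity, identity)
print(len(out), len(out[0]))         # 4 2
```

With identity projections the arithmetic stays easy to follow: the all-zero token at grid cell (0, 1) comes out as [0.25, 0.25] because its attention logits are all zero, so it attends uniformly to all four cells. A practical head would use learned projections, several heads, normalization, and a feed-forward sublayer.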
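The abstract does not describe how the EM-Merger unit works internally. The name follows the EM-Merger of Goldman et al. (CVPR 2019), introduced together with the SKU110K benchmark, which treats detections as Gaussians and groups overlapping ones with expectation-maximization (EM). The Python sketch below is a simplified illustration of that idea under our own assumptions, not the paper's algorithm: box centres are weighted by confidence score, a k-component diagonal Gaussian mixture is fitted with EM, and the highest-scoring box per component is kept. The function `em_merge`, the known cluster count `k`, the box-size-based initial variances, and `var_floor` are all hypothetical choices.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score), score > 0


def _log_gauss(p: Tuple[float, float], mu: List[float], var: List[float]) -> float:
    """Log-density of an axis-aligned 2-D Gaussian evaluated at point p."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(p, mu, var))


def em_merge(boxes: List[Box], k: int, iters: int = 20, var_floor: float = 1.0) -> List[Box]:
    """Collapse duplicate detections into at most k boxes with score-weighted EM.

    Hypothetical simplification: box centres are data points weighted by score,
    a k-component diagonal Gaussian mixture is fitted by EM, and the
    highest-scoring box assigned to each component is kept.
    """
    if not boxes or k < 1:
        return []
    n, k = len(boxes), min(k, len(boxes))
    pts = [((b[0] + b[2]) / 2, (b[1] + b[3]) / 2) for b in boxes]
    score = [b[4] for b in boxes]

    # Seed components at the k highest-scoring boxes; initial spread from box size.
    seeds = sorted(range(n), key=lambda i: -score[i])[:k]
    mu = [list(pts[i]) for i in seeds]
    var = [[max(((boxes[i][2] - boxes[i][0]) / 4) ** 2, var_floor),
            max(((boxes[i][3] - boxes[i][1]) / 4) ** 2, var_floor)] for i in seeds]
    pi = [1.0 / k] * k

    def e_step() -> List[List[float]]:
        """Responsibility of each component for each point, computed in log space."""
        resp = []
        for p in pts:
            logs = [math.log(pi[j]) + _log_gauss(p, mu[j], var[j]) for j in range(k)]
            top = max(logs)
            e = [math.exp(lg - top) for lg in logs]
            total = sum(e)
            resp.append([x / total for x in e])
        return resp

    for _ in range(iters):
        resp = e_step()
        # M-step: score-weighted updates of mixing weights, means and variances.
        nk = [sum(score[i] * resp[i][j] for i in range(n)) for j in range(k)]
        total = sum(nk)
        for j in range(k):
            if nk[j] < 1e-12:
                continue  # component received no mass; keep its previous parameters
            pi[j] = nk[j] / total
            for d in range(2):
                mu[j][d] = sum(score[i] * resp[i][j] * pts[i][d] for i in range(n)) / nk[j]
                var[j][d] = max(sum(score[i] * resp[i][j] * (pts[i][d] - mu[j][d]) ** 2
                                    for i in range(n)) / nk[j], var_floor)

    # Hard-assign each box to its most responsible component; keep the best box per component.
    best: Dict[int, int] = {}
    for i, r in enumerate(e_step()):
        j = max(range(k), key=lambda c: r[c])
        if j not in best or score[i] > score[best[j]]:
            best[j] = i
    return [boxes[i] for i in sorted(best.values(), key=lambda i: -score[i])]


boxes = [
    (0, 0, 10, 10, 0.90), (1, 0, 11, 10, 0.80), (0, 1, 10, 11, 0.70),        # object A, 3 hits
    (50, 50, 60, 60, 0.95), (51, 50, 61, 60, 0.60), (50, 51, 60, 61, 0.85),  # object B, 3 hits
]
print(em_merge(boxes, k=2))  # [(50, 50, 60, 60, 0.95), (0, 0, 10, 10, 0.9)]
```

In this toy example each object is detected three times with slightly shifted boxes; with k = 2 the mixture separates the two groups and one box survives per object. Unlike greedy non-maximum suppression, no IoU threshold is involved, but the sketch assumes k is supplied, whereas a practical merger would have to estimate it.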



Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) 62272172, the Guangdong Basic and Applied Basic Research Foundation 2023A1515012920, the Tip-top Scientific and Technical Innovative Youth Talents of Guangdong Special Support Program 2019TQ05X200, the 2022 Tencent WeChat Rhino-Bird Focused Research Program (Tencent WeChat RBFR2022008), and the Major Key Project of PCL under Grant PCL2021A09.

Author information

Xiaojing Zhong and Ni Zhang have contributed equally to this work.

Authors and Affiliations

School of Software Engineering, South China University of Technology, Guangzhou, China
Xiaojing Zhong, Ni Zhang & Qingyao Wu
Key Laboratory of Big Data and Intelligent Robot, Ministry of Education, Guangzhou, China
Xiaojing Zhong & Ni Zhang
Pazhou Lab, Guangzhou, China
Qingyao Wu
Peng Cheng Laboratory, Guangzhou, China
Qingyao Wu
Zhongnan Building Materials Group Co. Ltd., Guangdong GW, Guangzhou, China
Hao Hu
Internet Technology Co. Ltd., Guangdong GW, Guangzhou, China
Li Li & Junhua Cen


Corresponding author

Correspondence to Qingyao Wu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Zhong, X., Zhang, N., Hu, H. et al. Densely packed object detection with transformer-based head and EM-merger. SOCA 17, 109–117 (2023). https://doi.org/10.1007/s11761-023-00361-z


Received: 30 November 2022
Revised: 24 February 2023
Accepted: 21 March 2023
Published: 21 April 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11761-023-00361-z
