
Higher efficient YOLOv7: a one-stage method for non-salient object detection


Abstract

Despite the remarkable progress made in object detection in recent years, real-time detection of non-salient objects remains a challenging research task. Most existing detection methods fail to adequately extract the global features of targets, leading to suboptimal performance on non-salient objects. In this paper, we propose a unified framework called Higher efficient (He)-YOLOv7 that enhances the detection capability of YOLOv7 for non-salient objects. Firstly, we introduce a refined Squeeze-and-Excitation Network (SENet) that dynamically adjusts the weights of feature channels, improving the model's perception of non-salient objects. Secondly, we design an Angle Intersection over Union (AIoU) loss function that incorporates relative positional information, refining the Complete Intersection over Union (CIoU) loss used in YOLOv7 and significantly accelerating the model's convergence. Moreover, He-YOLOv7 adopts a blended data augmentation strategy that simulates occlusion among objects, further improving the model's ability to filter out noise and enhancing its robustness. Experiments show an improvement of 2.4% mean Average Precision (mAP) on the Microsoft Common Objects in Context (MS COCO) dataset and of 1.2% mAP on the PASCAL VOC dataset, while He-YOLOv7 remains comparable to state-of-the-art real-time object detection methods.
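The three components named above lend themselves to short illustrative sketches. First, the channel attention: the sketch below shows a standard Squeeze-and-Excitation block in PyTorch. The abstract does not specify how the authors refine SENet, so this is only the baseline mechanism the paper builds on; the class name `SEBlock` and the reduction ratio of 16 are our own assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Baseline Squeeze-and-Excitation channel attention.

    A minimal sketch of the standard SE block; the paper's refined
    variant is not described in the abstract.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global average pool
        self.fc = nn.Sequential(              # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)           # (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)       # per-channel weights in (0, 1)
        return x * w                          # reweight feature channels
```

For a feature map `x` of shape (B, C, H, W), `SEBlock(C)(x)` returns a reweighted map of the same shape, so the block can be dropped into a backbone without changing tensor shapes.

Second, the bounding-box loss. The exact AIoU formulation appears in the full paper; as a hedged sketch, the function below computes the standard CIoU loss and adds a center-offset angle penalty in the style of SIoU. The function name, the sin(2α) form of the angle term, and the 0.5 weight are illustrative assumptions, not the authors' definition.

```python
import math
import torch

def ciou_with_angle_penalty(pred: torch.Tensor, target: torch.Tensor,
                            eps: float = 1e-7) -> torch.Tensor:
    """CIoU loss plus an assumed center-angle penalty; boxes are (N, 4)
    tensors in (x1, y1, x2, y2) format."""
    # Intersection over union
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Enclosing-box diagonal and center distance (DIoU term)
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    dx = (pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) / 2
    dy = (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) / 2
    rho2 = dx ** 2 + dy ** 2

    # Aspect-ratio consistency term (CIoU)
    wp = pred[:, 2] - pred[:, 0]
    hp = (pred[:, 3] - pred[:, 1]) + eps
    wt = target[:, 2] - target[:, 0]
    ht = (target[:, 3] - target[:, 1]) + eps
    v = (4 / math.pi ** 2) * (torch.atan(wt / ht) - torch.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + eps)

    # Assumed angle penalty on the center offset: zero when the centers
    # are axis-aligned, maximal at 45 degrees (SIoU-style sin(2*alpha)).
    sin_a = (torch.abs(dy) / (torch.sqrt(rho2) + eps)).clamp(0, 1 - eps)
    angle = torch.sin(2 * torch.asin(sin_a))

    return 1 - iou + rho2 / c2 + alpha * v + 0.5 * angle  # 0.5: assumed weight
```

Finally, the blended augmentation. The abstract states only that blending simulates occlusion among objects; a mixup-style combination of two training samples that keeps both label sets is one common realization and is assumed here, with both images assumed to share one shape.

```python
import numpy as np

def blend_with_occlusion(img_a, img_b, boxes_a, boxes_b, alpha: float = 0.5):
    """Assumed mixup-style blend of two samples (same-shaped images):
    each object partially 'occludes' the other image's content, and both
    box sets are kept as labels."""
    lam = np.random.beta(alpha, alpha)  # blending ratio in (0, 1)
    mixed = lam * img_a.astype(np.float32) + (1 - lam) * img_b.astype(np.float32)
    boxes = np.concatenate([boxes_a, boxes_b], axis=0)  # keep all boxes
    return mixed.astype(img_a.dtype), boxes, lam
```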




Data Availability

Data sharing is not applicable to this article, as no datasets were generated or analysed during the current study.


Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62172212, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20230031.

Author information

Corresponding author

Correspondence to Liyan Zhang.

Ethics declarations

Conflict of Interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dong, C., Tang, Y. & Zhang, L. Higher efficient YOLOv7: a one-stage method for non-salient object detection. Multimed Tools Appl 83, 42257–42283 (2024). https://doi.org/10.1007/s11042-023-17185-w

