Residual attention mechanism and weighted feature fusion for multi-scale object detection

Multimedia Tools and Applications

Abstract

Object detection is a critical problem in computer vision research and an essential basis for understanding the high-level semantic information of images. To improve detection performance, this article proposes an improved YOLOv3 multi-scale object detection method. First, a residual attention module, comprising a channel attention module, a spatial attention module, and a skip connection, is introduced into the neck of YOLOv3. The module is applied to the three feature levels obtained from the backbone, so that the output features focus on the channels and regions related to the object. Second, an additional weight is assigned to each input feature in the top-down feature fusion stage of YOLOv3; the magnitude of each weight is determined by how much that input feature contributes to the output feature. Experimental results on the KITTI, PASCAL VOC, and bird’s nest datasets verify the effectiveness of the proposed method. The method is of practical value for electric power inspection and self-driving automobiles.
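
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the layer details, so the channel gate below assumes a squeeze-and-excitation style bottleneck, the spatial gate replaces the learned convolution of a real module with a plain average of pooled maps, and the fusion weights are normalised in the style of BiFPN's fast normalized fusion before the inputs are concatenated as in YOLOv3. All function names, the reduction ratio `r`, and the weight values are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Per-channel gates of shape (C, 1, 1) for a (C, H, W) feature map.
    Assumption: SE-style squeeze (global average pool) and two dense layers."""
    squeezed = feat.mean(axis=(1, 2))            # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)      # ReLU bottleneck, (C // r,)
    return sigmoid(w2 @ hidden)[:, None, None]

def spatial_attention(feat):
    """Per-location gates of shape (1, H, W).
    Assumption: a real module would apply a learned convolution to the
    pooled maps; here they are simply averaged."""
    pooled = 0.5 * (feat.mean(axis=0) + feat.max(axis=0))
    return sigmoid(pooled)[None, :, :]

def residual_attention(feat, w1, w2):
    """Channel gate, then spatial gate, then a skip connection."""
    attended = feat * channel_attention(feat, w1, w2)
    attended = attended * spatial_attention(attended)
    return feat + attended                       # skip connection

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def weighted_topdown_fusion(deep, shallow, raw_weights, eps=1e-4):
    """Scale each input by a normalised non-negative weight, then concatenate.
    Assumption: the normalisation w_i / (eps + sum_j w_j) is borrowed from
    BiFPN; the paper's exact weighting scheme is not given in the abstract."""
    w = np.maximum(np.asarray(raw_weights, dtype=float), 0.0)
    w = w / (w.sum() + eps)
    return np.concatenate([w[0] * upsample2x(deep), w[1] * shallow], axis=0)

# Usage with random stand-ins for two backbone feature levels.
rng = np.random.default_rng(0)
C, r = 8, 4                                      # channels, reduction ratio
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
deep = residual_attention(rng.standard_normal((C, 4, 4)), w1, w2)
shallow = residual_attention(rng.standard_normal((C, 8, 8)), w1, w2)
fused = weighted_topdown_fusion(deep, shallow, raw_weights=[1.0, 3.0])
print(fused.shape)  # (16, 8, 8)
```

In a trained network the raw fusion weights would be learned parameters, so an input that contributes more to the output ends up with a larger normalised weight; here they are fixed by hand only to keep the sketch runnable.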

Data Availability

The datasets generated and/or analysed during the current study are not publicly available because they form a special dataset established by power enterprises based on actual projects, but they are available from the corresponding author on reasonable request.

Acknowledgments

This work was supported by grants from the National Science Foundation of China (Nos. 62102373, 61873246, 62006213), the Science and Technology Research Project of Henan Province (No. 212102310053), and the Henan University Science and Technology Innovation Talents Program (No. 21HASTIT028).

Author information

Corresponding author

Correspondence to Qiye Qi.

Ethics declarations

Conflict of Interests

The authors declare no competing interests.

We declare that we have no financial or personal relationships with other people or organisations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, or company that could be construed as influencing the position presented in, or the review of, the manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, J., Qi, Q., Zhang, H. et al. Residual attention mechanism and weighted feature fusion for multi-scale object detection. Multimed Tools Appl 82, 40873–40889 (2023). https://doi.org/10.1007/s11042-023-14997-8

  • DOI: https://doi.org/10.1007/s11042-023-14997-8
