Abstract
Object detection is one of the critical problems in computer vision research and an essential basis for understanding the high-level semantic information of images. To improve object detection performance, this article proposes an improved YOLOv3 multi-scale object detection method. First, a residual attention module, consisting of a channel attention module, a spatial attention module, and a skip connection, is introduced into the neck of YOLOv3. The module is applied to the three feature levels obtained from the backbone, so that the output features focus on the channels and regions related to the objects. Second, in the top-down feature fusion stage of YOLOv3, each input feature is assigned an additional weight whose magnitude reflects how much that feature contributes to the output features. Experimental results on the KITTI, PASCAL VOC, and bird’s nest datasets verify the effectiveness of the proposed method for object detection. The proposed method has practical value for electric power inspection and self-driving automobiles.
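The abstract names two components but does not give their equations here. The pure-Python sketch below illustrates one plausible reading of each, under explicit assumptions that are not the authors' published formulation. The residual attention module is assumed to follow a CBAM-style order (channel attention, then spatial attention) with the skip connection implemented as F + A(F). CBAM's learned MLP and 7×7 convolution are replaced by parameter-free stand-ins for brevity. The per-input fusion weights are modelled with the fast normalized fusion used in EfficientDet's BiFPN (ReLU, then normalize by the weight sum). All function names are hypothetical, and the inputs to the fusion are assumed to be already resized to a common shape.

```python
import math
from typing import List

# A feature map is a list of C channels, each an H x W grid of floats.
Feature = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(f: Feature) -> List[float]:
    """One weight per channel from global average and max pooling.

    Assumption: CBAM passes both pooled values through a shared learned MLP;
    this stand-in omits the MLP and simply sums the pooled values.
    """
    weights = []
    for ch in f:
        values = [v for row in ch for v in row]
        avg = sum(values) / len(values)
        weights.append(sigmoid(avg + max(values)))
    return weights


def spatial_attention(f: Feature) -> List[List[float]]:
    """One weight per (y, x) position from channel-wise average and max.

    Assumption: CBAM applies a learned 7x7 convolution to the two maps;
    this stand-in simply sums them.
    """
    h, w = len(f[0]), len(f[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            column = [ch[y][x] for ch in f]
            row.append(sigmoid(sum(column) / len(column) + max(column)))
        out.append(row)
    return out


def residual_attention(f: Feature) -> Feature:
    """Channel attention, then spatial attention, plus a skip connection."""
    ca = channel_attention(f)
    # Reweight every channel by its channel-attention score.
    refined = [[[v * ca[c] for v in row] for row in ch] for c, ch in enumerate(f)]
    sa = spatial_attention(refined)
    # Skip connection (assumed form): output = F + spatially reweighted refined F.
    return [
        [[f[c][y][x] + refined[c][y][x] * sa[y][x] for x in range(len(f[c][y]))]
         for y in range(len(f[c]))]
        for c in range(len(f))
    ]


def weighted_fusion(inputs: List[Feature], raw_weights: List[float],
                    eps: float = 1e-4) -> Feature:
    """BiFPN-style fast normalized fusion: sum_i w_i * I_i / (eps + sum_j w_j)."""
    w = [max(0.0, r) for r in raw_weights]  # ReLU keeps weights non-negative
    norm = eps + sum(w)
    c, h, wd = len(inputs[0]), len(inputs[0][0]), len(inputs[0][0][0])
    return [
        [[sum(wi * feat[ci][y][x] for wi, feat in zip(w, inputs)) / norm
          for x in range(wd)] for y in range(h)]
        for ci in range(c)
    ]
```

For example, fusing two inputs with raw weights `[1.0, -3.0]` clips the second weight to zero, so the output is just the first input scaled by 1 / (1 + eps). In a trained network the raw weights would be learnable parameters, which lets the model decide how much each input contributes.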
Data Availability
The datasets generated and/or analysed during the current study are not publicly available because they constitute a special dataset established by power enterprises for actual projects, but they are available from the corresponding author on reasonable request.
Acknowledgments
This work is supported by grants from the National Science Foundation of China (Nos. 62102373, 61873246, and 62006213), the Science and Technology Research Project of Henan Province (No. 212102310053), and the Henan University Science and Technology Innovation Talents Program (No. 21HASTIT028).
Ethics declarations
Conflict of Interests
The authors declare no competing interests.
We declare that we have no financial or personal relationships with other people or organisations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service, or company that could be construed as influencing the position presented in, or the review of, this manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, J., Qi, Q., Zhang, H. et al. Residual attention mechanism and weighted feature fusion for multi-scale object detection. Multimed Tools Appl 82, 40873–40889 (2023). https://doi.org/10.1007/s11042-023-14997-8
DOI: https://doi.org/10.1007/s11042-023-14997-8