Residual attention mechanism and weighted feature fusion for multi-scale object detection

Zhang, Jie; Qi, Qiye; Zhang, Huanlong; Du, Qifan; Wang, Fengxian; Shi, Xiaoping

doi:10.1007/s11042-023-14997-8

Published: 04 April 2023

Volume 82, pages 40873–40889, (2023)
Multimedia Tools and Applications

Jie Zhang¹,
Qiye Qi ORCID: orcid.org/0000-0002-6325-5364¹,
Huanlong Zhang¹,
Qifan Du¹,
Fengxian Wang¹ &
…
Xiaoping Shi²

Abstract

Object detection is one of the central problems in computer vision research and an essential basis for understanding the high-level semantic information of images. To improve detection performance, this article proposes an improved multi-scale object detection method based on YOLOv3. First, a residual attention module, consisting of a channel attention module, a spatial attention module, and a skip connection, is introduced into the neck of YOLOv3. The module is applied to the three feature layers obtained from the backbone, so that the output features focus on the channels and regions related to the object. Second, in the top-down feature fusion stage of YOLOv3, an additional weight is assigned to each input feature; its magnitude is determined by how much that input feature contributes to the output features. Experimental results on the KITTI, PASCAL VOC, and bird’s nest datasets verify the effectiveness of the proposed method in object detection. The proposed method has significant value for electric power inspection and self-driving automobiles.
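The two ideas named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, whose details are not given on this page. The function names `residual_attention` and `weighted_fusion` are hypothetical. The gates here are plain sigmoids of pooled values rather than learned layers. The fusion is assumed to be a weighted sum with clipped, normalized weights, one common normalization scheme, which may differ from the paper's. Feature maps are nested lists indexed as [channel][row][column].

```python
import math


def sigmoid(v):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def residual_attention(x):
    """Return x + x * channel_gate * spatial_gate for a [C][H][W] feature map.

    Assumption: gates are sigmoids of average-pooled activations; a trained
    module would compute them with learned layers instead.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Channel gate: one scalar per channel from global average pooling.
    channel_gate = [sigmoid(sum(sum(row) for row in ch) / (H * W)) for ch in x]
    # Spatial gate: one scalar per location from the mean over channels.
    spatial_gate = [
        [sigmoid(sum(x[c][i][j] for c in range(C)) / C) for j in range(W)]
        for i in range(H)
    ]
    # Skip connection: the attended features are added back to the input.
    return [
        [
            [x[c][i][j] * (1.0 + channel_gate[c] * spatial_gate[i][j])
             for j in range(W)]
            for i in range(H)
        ]
        for c in range(C)
    ]


def weighted_fusion(features, raw_weights, eps=1e-4):
    """Fuse same-shaped [C][H][W] maps with non-negative, normalized weights.

    Assumption: negative weights are clipped to zero and the rest are divided
    by their sum (plus eps for stability), so a larger weight means a larger
    contribution of that input to the fused output.
    """
    w = [max(0.0, r) for r in raw_weights]
    total = sum(w) + eps
    w = [wi / total for wi in w]
    C, H, W = len(features[0]), len(features[0][0]), len(features[0][0][0])
    return [
        [
            [sum(wi * f[c][i][j] for wi, f in zip(w, features))
             for j in range(W)]
            for i in range(H)
        ]
        for c in range(C)
    ]
```

In a detector, the inputs to `weighted_fusion` would be feature maps already brought to the same resolution (for example, an upsampled deeper layer and a shallower layer), and the raw weights would be trained together with the rest of the network.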

Data Availability

The datasets generated and/or analysed during the current study are not publicly available because they form a special dataset established by power enterprises based on actual projects, but they are available from the corresponding author on reasonable request.

Acknowledgments

This work is supported by grants from the National Science Foundation of China (No. 62102373, 61873246, 62006213), the Science and Technology Research Project of Henan Province (No. 212102310053), and the Henan University Science and Technology Innovation Talents Program (No. 21HASTIT028).

Author information

Authors and Affiliations

College of Electrical and Information Engineering, Zhengzhou University of Light Industry, Dongfeng Road, Zhengzhou, 450002, Henan Province, People’s Republic of China
Jie Zhang, Qiye Qi, Huanlong Zhang, Qifan Du & Fengxian Wang
Harbin Institute of Technology, Harbin, People’s Republic of China
Xiaoping Shi

Corresponding author

Correspondence to Qiye Qi.

Ethics declarations

Conflict of Interests

The authors declare no competing interests.

We declare that we have no financial or personal relationships with other people or organisations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, J., Qi, Q., Zhang, H. et al. Residual attention mechanism and weighted feature fusion for multi-scale object detection. Multimed Tools Appl 82, 40873–40889 (2023). https://doi.org/10.1007/s11042-023-14997-8

Received: 05 April 2022
Revised: 03 October 2022
Accepted: 22 February 2023
Published: 04 April 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s11042-023-14997-8
