A recursive attention-enhanced bidirectional feature pyramid network for small object detection

Zhang, Huanlong; Du, Qifan; Qi, Qiye; Zhang, Jie; Wang, Fengxian; Gao, Miao

doi:10.1007/s11042-022-13951-4

Published: 27 September 2022

Volume 82, pages 13999–14018, (2023)

Multimedia Tools and Applications

Huanlong Zhang¹, Qifan Du¹, Qiye Qi¹, Jie Zhang¹, Fengxian Wang¹ & Miao Gao²

Abstract

The Single Shot MultiBox Detector (SSD) performs well in object detection by using multiscale feature maps, but its accuracy on small objects is low. In this paper, a Recursive Attention-Enhanced Bidirectional Feature Pyramid Network (RA-BiFPN) is proposed. First, we design an attention-enhanced bidirectional feature pyramid network (A-BiFPN) to improve detection accuracy for small objects. The A-BiFPN is composed of a bidirectional feature pyramid network (BiFPN) and coordinate attention. The BiFPN employs top-down and bottom-up paths to aggregate features at different scales, so that features at all scales contain rich semantic and detailed information. These features benefit the coordinate attention, which embeds positional information into channel attention so that the network can more easily focus on the channels and locations related to the object in the feature map. Second, to enhance the ability of the A-BiFPN to characterize small targets, we adopt a recursive structure that feeds the output features of the A-BiFPN back into the backbone network. In this way, the features pass through the bottom-up backbone repeatedly, enriching the representation power of the A-BiFPN. Experimental results show that, compared with the original SSD, our method improves detection accuracy on the PASCAL VOC, NWPU VHR-10, KITTI and RSOD datasets by 2.65%, 7.98%, 7.02% and 5.63%, respectively.
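
The top-down and bottom-up aggregation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`upsample2x`, `downsample2x`, `bidirectional_fuse`), the toy pyramid sizes, and the use of plain unweighted sums, nearest-neighbour upsampling and 2x2 max pooling are all assumptions made for illustration. The learned fusion weights, convolutions, coordinate attention and recursive feedback of RA-BiFPN are omitted.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x):
    """2x2 max pooling of a (C, H, W) array with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def bidirectional_fuse(features):
    """Fuse a pyramid ordered from finest (largest) to coarsest (smallest).

    Assumption: unweighted sums stand in for the learned fusion weights
    and convolutions that a real BiFPN layer would apply.
    """
    n = len(features)
    # Top-down path: carry coarse, semantically strong features to finer levels.
    td = [None] * n
    td[-1] = features[-1]
    for i in range(n - 2, -1, -1):
        td[i] = features[i] + upsample2x(td[i + 1])
    # Bottom-up path: carry fine, detail-rich features back to coarser levels.
    out = [None] * n
    out[0] = td[0]
    for i in range(1, n):
        # The coarsest level has no separate top-down node, so use its input only.
        lateral = features[i] if i == n - 1 else features[i] + td[i]
        out[i] = lateral + downsample2x(out[i - 1])
    return out


# Toy three-level pyramid: 8 channels at 32x32, 16x16 and 8x8.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((8, s, s)) for s in (32, 16, 8)]
fused = bidirectional_fuse(pyramid)
print([f.shape for f in fused])  # [(8, 32, 32), (8, 16, 16), (8, 8, 8)]
```

Each output level keeps its input resolution but now mixes information from every other level, which is the property the abstract relies on when it says that features at all scales contain both semantic and detailed information.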

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants 61873246, 62072416, 62006213 and 62102373, the Program for Science & Technology Innovation Talents in Universities of Henan Province (21HASTIT028), the Natural Science Foundation of Henan (202300410495), and the Key Scientific Research Projects of Colleges and Universities in Henan Province (21A120010).

Author information

Authors and Affiliations

College of Electrical and Information Engineering, Zhengzhou University of Light Industry, Dongfeng Road, Zhengzhou, 450002, Henan Province, People’s Republic of China
Huanlong Zhang, Qifan Du, Qiye Qi, Jie Zhang & Fengxian Wang
China Tobacco Henan Industrial Co., Ltd., Henan Province, People’s Republic of China
Miao Gao

Corresponding author

Correspondence to Huanlong Zhang.

Ethics declarations

Conflict of Interests

We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, H., Du, Q., Qi, Q. et al. A recursive attention-enhanced bidirectional feature pyramid network for small object detection. Multimed Tools Appl 82, 13999–14018 (2023). https://doi.org/10.1007/s11042-022-13951-4

Received: 31 January 2022
Revised: 12 April 2022
Accepted: 12 September 2022
Published: 27 September 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s11042-022-13951-4
