An attention-based feature pyramid network for single-stage small object detection

Jiao, Lin; Kang, Chenrui; Dong, Shifeng; Chen, Peng; Li, Gaoqiang; Wang, Rujing

doi:10.1007/s11042-022-14159-2

Published: 18 November 2022

Multimedia Tools and Applications, volume 82, pages 18529–18544 (2023)

Lin Jiao (ORCID: orcid.org/0000-0003-0403-0365)^1,2,
Chenrui Kang^3,
Shifeng Dong^2,4,
Peng Chen^1,
Gaoqiang Li^1 &
Rujing Wang^2,4

Abstract

Single-stage detection methods have recently made great progress, achieving accuracy comparable to two-stage methods. However, they still perform poorly on small objects. In this work, we improve the performance of a single-stage detector on small objects. The proposed method makes two main contributions. The first is an attention-based feature pyramid network (aFPN), which introduces a learnable fusion factor to control how much feature information deep layers deliver to shallow layers. Making the fusion factor learnable allows the feature pyramid network to adapt to small object detection. The second is a soft-weighted loss function that reduces false attention during network training. Specifically, we reweight the contribution of each training sample to the network loss according to its distance from the boundaries of the ground-truth box, which leads to fewer false-positive detections. To verify the performance of the proposed method, we conduct extensive experiments on different datasets, comparing against detectors including RetinaNet, ATSS, FCOS, and FreeAnchor. Experimental results show that our method achieves 44.2% AP on the MS COCO dataset and 23.0% AP on the VisDrone dataset, a significant improvement obtained with almost no additional computational overhead.
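
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract gives neither the attention module that produces the fusion factor nor the exact weighting formula. Assumptions: the fusion factor is a plain scalar per pyramid level (`alphas`, a hypothetical name) that scales the upsampled deep feature before it is added to the shallower lateral feature, and the sample weight is an FCOS-style centerness score used as a stand-in that down-weights locations near the box boundary. All function names are hypothetical.

```python
import math


def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse_top_down(laterals, alphas):
    """Top-down FPN fusion with one scalar fusion factor per level.

    laterals: feature maps ordered shallow to deep, each half the size of
              the previous one.
    alphas:   alphas[i] scales what level i+1 passes down to level i
              (assumed here to be a given scalar, not a learned value).
    """
    outputs = [None] * len(laterals)
    outputs[-1] = laterals[-1]                     # deepest level is unchanged
    for i in range(len(laterals) - 2, -1, -1):
        up = upsample2x(outputs[i + 1])
        outputs[i] = [
            [lat + alphas[i] * deep for lat, deep in zip(lat_row, up_row)]
            for lat_row, up_row in zip(laterals[i], up)
        ]
    return outputs


def boundary_weight(x, y, box):
    """Stand-in soft weight in [0, 1] for a sample at (x, y) inside `box`.

    Uses FCOS-style centerness (an assumption, not the paper's formula):
    1.0 at the box centre, falling to 0.0 at or outside the boundary.
    """
    x1, y1, x2, y2 = box
    left, right, top, bottom = x - x1, x2 - x, y - y1, y2 - y
    if min(left, right, top, bottom) <= 0:
        return 0.0
    ratio_x = min(left, right) / max(left, right)
    ratio_y = min(top, bottom) / max(top, bottom)
    return math.sqrt(ratio_x * ratio_y)


def soft_weighted_loss(per_sample_losses, weights):
    """Weighted mean of per-sample losses; returns 0.0 if all weights are 0."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(l * w for l, w in zip(per_sample_losses, weights)) / total
```

With `alphas[i] = 1` the fusion reduces to the standard FPN top-down sum, and with `alphas[i] = 0` the shallow level receives nothing from the deeper one, so a factor between those values controls how much deep-layer information reaches the shallow layers used for small objects.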

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

School of Internet, Anhui University, Hefei, 230039, China
Lin Jiao, Peng Chen & Gaoqiang Li
Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, 230031, China
Lin Jiao, Shifeng Dong & Rujing Wang
Southwest University of Science and Technology, Mianyang, 621010, China
Chenrui Kang
University of Science and Technology of China, Hefei, 230031, China
Shifeng Dong & Rujing Wang

Corresponding author

Correspondence to Lin Jiao.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiao, L., Kang, C., Dong, S. et al. An attention-based feature pyramid network for single-stage small object detection. Multimed Tools Appl 82, 18529–18544 (2023). https://doi.org/10.1007/s11042-022-14159-2

Received: 01 March 2022
Revised: 08 July 2022
Accepted: 27 October 2022
Published: 18 November 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s11042-022-14159-2
