
DAF-Net: dense attention feature pyramid network for multiscale object detection

  • Regular Paper
  • Published: 2024
International Journal of Multimedia Information Retrieval

Abstract

In recent years, object detection has become one of the most prominent components of computer vision. State-of-the-art object detectors now employ convolutional neural network (CNN) techniques alongside other deep neural network techniques to improve detection performance and accuracy. Most recent object detectors employ the feature pyramid network (FPN) and its variants, while others use combinations of attention mechanisms to achieve better performance. An open question is the inconsistency between lower-layer and upper-layer features, in terms of resolution, receptive field, and semantic information, when detecting objects. Although some researchers have attempted to address this issue, we exploit ideas from this line of work and propose a more capable architecture called the dense attention feature pyramid network (DAF-Net) for multiscale object detection. DAF-Net consists of two attention models: a spatial attention model and a channel attention model. Unlike other attention models, ours are lightweight and fully data-driven, and we implement a densely connected attention FPN to reduce the model's complexity and to avoid learning redundant feature maps. We first develop the two attention models, then use only the spatial attention model in the backbone of our network, and finally use both attention models in the FPN to filter and maintain a steady flow of semantic information from the lower layers, improving the model's accuracy and efficiency. Experimental results on underwater images from the National Natural Science Foundation of China (NSFC) (Underwater Image Dataset, retrieved from http://www.cnurpc.org/index.html), the MS COCO dataset, and the PASCAL VOC dataset show higher accuracy and better detection results for the proposed model compared to the benchmark model YOLOX-Darknet53 (Ge et al., YOLOX: Exceeding YOLO series in 2021, arXiv preprint arXiv:2107.08430). Our model achieved 70.2 mAP, 48.9 mAP, and 83.9 mAP on the NSFC, MS COCO, and PASCAL VOC datasets, respectively, compared with the benchmark model's 68.9 mAP on NSFC, 47.7 mAP on MS COCO, and 82.4 mAP on PASCAL VOC.
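To make the architecture described above concrete, the following is a minimal PyTorch sketch of a lightweight channel attention module and spatial attention module in the spirit of the abstract (and of the CBAM-style attention the paper cites [32]). The module names, reduction ratio, and kernel size are illustrative assumptions, not the authors' exact DAF-Net implementation.

# Minimal PyTorch sketch (an assumption-based illustration, not the
# authors' released code) of lightweight channel and spatial attention.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Pool global context per channel, then reweight the channels."""

    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is assumed
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        weights = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * weights                   # channel-wise reweighting


class SpatialAttention(nn.Module):
    """Pool across channels, then learn a per-pixel attention map."""

    def __init__(self, kernel_size: int = 7):  # kernel size is assumed
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)          # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)         # (B, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                            # pixel-wise reweighting


if __name__ == "__main__":
    feat = torch.randn(2, 256, 32, 32)     # a stand-in backbone feature map
    feat = SpatialAttention()(feat)        # backbone stage: spatial attention only
    feat = ChannelAttention(256)(feat)     # FPN stage: channel attention as well
    print(feat.shape)                      # torch.Size([2, 256, 32, 32])

In the arrangement the abstract describes, the spatial module alone would sit in the backbone, while both modules would gate the densely connected FPN features; the toy usage at the bottom mirrors that split.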


Data availability

The datasets that support the findings of this research are openly available from the National Natural Science Foundation of China (NSFC) at http://www.cnurpc.org/index.html (reference [50]), from COCO: Common Objects in Context at https://cocodataset.org/ (reference [44]), and from the Visual Object Classes Challenge 2012 (VOC2012) at http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html (reference [45]). They are also available upon request; interested parties may contact Divine Njengwie Achinek at achinekdivine002@dlmu.edu.cn to obtain access.

References

  1. Li Y, Zhou S, Chen H (2022) Attention-based fusion factor in FPN for object detection. Appl Intell 1–10.

  2. Cao J, Chen Q, Guo J, Shi R (2020) Attention-guided context feature pyramid network for object detection. arXiv preprint arXiv:2005.11475

  3. Feichtenhofer C, Pinz A, Zisserman A (2017) Detect to track and track to detect. In: Proceedings of the IEEE International Conference on Computer Vision (pp. 3038–3046).

  4. Yuan Y, Xiong Z, Wang Q (2017) An incremental framework for video-based traffic sign detection, tracking, and recognition. IEEE Trans Intell Transp Syst 18(7):1918–1929

  5. Chen K, Tao W (2018) Learning linear regression via single convolutional layer for visual object tracking. IEEE Trans Multimedia 21(1):86–97

  6. Hu H, Ma B, Shen J, Sun H, Shao L, Porikli F (2018) Robust object tracking using manifold regularized convolutional neural networks. IEEE Trans Multimedia 21(2):510–521

  7. Jiang H, Learned-Miller E (2017) Face detection with the Faster R-CNN. In: IEEE International Conference on Automatic Face & Gesture Recognition (pp. 650–657).

  8. Yang S, Luo P, Loy C-C, Tang X (2016) WIDER FACE: A face detection benchmark. In: IEEE Conference on Computer Vision and Pattern Recognition (pp. 5525–5533).

  9. Yang S, Luo P, Loy C-C, Tang X (2018) Faceness-Net: Face detection through deep facial part responses. IEEE Trans Pattern Anal Mach Intell 40(8):1845–1859

  10. Njengwie Achinek D, Shehi Shehu I, Mohamed Athuman A, Fu X (2021) Enhanced single shot multiple detection for real-time object detection in multiple scenes. In: The 5th International Conference on Computer Science and Application Engineering (pp. 1–9).

  11. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2117–2125).

  12. Ge Z, Liu S, Wang F, Li Z, Sun J (2021) Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430.

  13. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779–788).

  14. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: Single shot multibox detector. In: European Conference on Computer Vision (pp. 21–37). Springer, Cham.

  15. Wang CY, Yeh IH, Liao HYM (2021) You only learn one representation: Unified network for multiple tasks. arXiv preprint arXiv:2105.04206.

  16. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Adv Neural Inf Proc Syst 28.

  17. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision (pp. 2961–2969).

  18. Pramanik A, Pal SK, Maiti J, Mitra P (2021) Granulated RCNN and multi-class deep sort for multi-object detection and tracking. IEEE Trans Emerg Top Comput Intell 6(1):171–181

  19. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8759–8768).

  20. Kim SW, Kook HK, Sun JY, Kang MC, Ko SJ (2018) Parallel feature pyramid network for object detection. In: Proceedings of the European Conference on Computer Vision (ECCV) (pp. 234–250).

  21. Guo C, Fan B, Zhang Q, Xiang S, Pan C (2020) Augfpn: Improving multi-scale feature learning for object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12595–12604).

  22. Kong T, Sun F, Tan C, Liu H, Huang W (2018) Deep feature pyramid reconfiguration for object detection. In: Proceedings of the European conference on computer vision (ECCV) (pp. 169–185).

  23. Nie J, Anwer RM, Cholakkal H, Khan FS, Pang Y, Shao L (2019) Enriched feature guided refinement network for object detection. In: Proceedings of the IEEE/CVF international conference on computer vision (pp. 9537–9546).

  24. Ghiasi G, Lin TY, Le QV (2019) Nas-fpn: Learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7036–7045).

  25. Singh B, Davis LS (2018) An analysis of scale invariance in object detection snip. In: Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3578–3587).

  26. Wu Y, Jiang X, Fang Z, Gao Y, Fujita H (2021) Multi-modal 3d object detection by 2d-guided precision anchor proposal and multi-layer fusion. Appl Soft Comput 108:107405

  27. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7132–7141).

  28. Zhu Y, Zhao C, Guo H, Wang J, Zhao X, Lu H (2018) Attention CoupleNet: Fully convolutional attention coupling network for object detection. IEEE Trans Image Process 28(1):113–126

  29. Pirinen A, Sminchisescu C (2018) Deep reinforcement learning of region proposal networks for object detection. In: proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6945–6954).

  30. Sukhbaatar S, Grave E, Lample G, Jegou H, Joulin A (2019) Augmenting self-attention with persistent memory. arXiv preprint arXiv:1907.01470.

  31. Srinivas A, Lin TY, Parmar N, Shlens J, Abbeel P, Vaswani A (2021) Bottleneck transformers for visual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 16519–16529).

  32. Woo S, Park J, Lee JY, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV) (pp. 3–19).

  33. Li W, Liu K, Zhang L, Cheng F (2020) Object detection based on an adaptive attention mechanism. Sci Rep 10(1):1–13

  34. Cao J, Chen Q, Guo J, Shi R (2020) Attention-guided context feature pyramid network for object detection. arXiv preprint arXiv:2005.11475.

  35. Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, He K (2017) Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677.

  36. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV) (pp. 801–818).

  37. Birodkar V, Lu Z, Li S, Rathod V, Huang J (2021) The surprising impact of mask-head architecture on novel class segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 7015–7025).

  38. Cheng B, Misra I, Schwing AG, Kirillov A, Girdhar R (2022) Masked-attention mask transformer for universal image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1290–1299).

  39. Han C, Zhao Q, Zhang S, Chen Y, Zhang Z, Yuan J (2022) YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception. arXiv preprint arXiv:2208.11434.

  40. Shafiee MJ, Chywl B, Li F, Wong A (2017) Fast YOLO: A fast you only look once system for real-time embedded object detection in video. arXiv preprint arXiv:1709.05943.

  41. Wang CY, Bochkovskiy A, Liao HYM (2022) YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696.

  42. Zheng W, Tang W, Jiang L, Fu CW (2021) SE-SSD: Self-ensembling single-stage object detector from point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 14494–14503).

  43. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916

  44. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Zitnick CL (2014) Microsoft coco: Common objects in context. In: European conference on computer vision (pp. 740–755). Springer, Cham.

  45. Everingham M, Eslami SM, Van Gool L, Williams CK, Winn J, Zisserman A (2015) The pascal visual object classes challenge: a retrospective. Int J Comput Vision 111(1):98–136

  46. Liu S, Huang D, Wang Y (2019) Learning spatial fusion for single-shot object detection. arXiv preprint arXiv:1911.09516.

  47. Tan M, Pang R, Le QV (2020) Efficientdet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10781–10790).

  48. Huang X, Wang X, Lv W, Bai X, Long X, Deng K, Yoshie O (2021) PP-YOLOv2: A practical object detector. arXiv preprint arXiv:2104.10419.

  49. Jocher G (2021) Ultralytics YOLOv5, GitHub discussion #3181. https://github.com/ultralytics/yolov5/discussions/3181

  50. Underwater Image Dataset, National Natural Science Foundation of China (NSFC). Online, retrieved from http://www.cnurpc.org/index.html.

  51. Priyadarshni D, Kolekar MH (2020) Underwater object detection and tracking. In: Soft Computing: Theories and Applications (pp. 837–846). Springer, Singapore.

  52. Han F, Yao J, Zhu H, Wang C (2020) Underwater image processing and object detection based on deep CNN method. J Sensors

  53. Castrillón M, Déniz O, Hernández D, Lorenzo J (2011) A comparison of face and facial feature detectors based on the Viola-Jones general object detection framework. Mach Vis Appl 22(3):481–494

  54. Wang CY, Bochkovskiy A, Liao HYM (2023) YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 7464–7475).

  55. Glenn J (2023) Ultralytics YOLOv8. https://github.com/ultralytics/ultralytics.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62176037 and 61802043, by the Liaoning Revitalization Talents Program under Grant XLYC1908007, by the Liaoning Key Research and Development Program under Grant 201801728, by the Fundamental Research Funds for the Central Universities under Grants 3132016352 and 3132020215, and by the Dalian Science and Technology Innovation Fund under Grants 2018J12GX037 and 2019J11CY001.

Author information

Authors and Affiliations

Authors

Contributions

Divine Njengwie Achinek and Xianping Fu conceived the presented ideas, developed the theory, and performed the computations. Divine Njengwie Achinek developed and designed the methodology. Divine Njengwie Achinek, Ibrahim Shehi Shehu, and Mohamed Athuman carried out the experiments and validation with the help of Xianping Fu. Divine Njengwie Achinek and Ibrahim Shehi Shehu wrote the main manuscript with the help of Mohamed Athuman. Divine Njengwie Achinek prepared the figures and diagrams, and all authors reviewed the manuscript.

Corresponding author

Correspondence to Divine Njengwie Achinek.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest with regard to the content and publication of the findings documented in this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Achinek, D.N., Shehu, I.S., Athuman, A.M. et al. DAF-Net: dense attention feature pyramid network for multiscale object detection. Int J Multimed Info Retr 13, 18 (2024). https://doi.org/10.1007/s13735-024-00323-x

