Abstract
The feature pyramid network (FPN) improves object detection by fusing multilevel features along a top-down pathway. However, current FPN-based methods do not effectively exploit interlayer features to suppress the aliasing effects that arise during top-down feature fusion. We propose an interlayer attention feature pyramid network that integrates attention gates into the FPN through interlayer enhancement to establish correlations between context and the model, thereby highlighting the salient regions of each layer and suppressing aliasing effects. Moreover, to avoid feature dilution during top-down fusion and to allow features from different layers to exploit one another, a simplified non-local operation is applied in the multilayer fusion module to fuse and enhance multiscale features. Comprehensive experiments on the MS COCO and PASCAL VOC benchmarks demonstrate that our network achieves precise object localization and outperforms current FPN-based object detection algorithms.
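The abstract describes two components: attention gates applied during top-down feature fusion and a simplified non-local operation in the multilayer fusion module. The paper's exact formulations are not reproduced here; the following is a minimal PyTorch sketch assuming an additive attention gate in the spirit of Attention U-Net and a GCNet-style simplified non-local (global context) block. All module names, channel counts, and the fusion rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimplifiedNonLocal(nn.Module):
    """GCNet-style global context block: a lightweight stand-in for the
    simplified non-local operation mentioned in the abstract (assumption)."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)  # per-position attention logits
        self.transform = nn.Sequential(                            # channel transform of pooled context
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Global context pooling: softmax-weighted sum over all spatial positions.
        mask = self.context_mask(x).view(b, 1, h * w).softmax(dim=-1)    # (B, 1, HW)
        context = torch.bmm(x.view(b, c, h * w), mask.transpose(1, 2))   # (B, C, 1)
        context = context.view(b, c, 1, 1)
        # Broadcast the transformed global context back onto every position.
        return x + self.transform(context)


class AttentionGate(nn.Module):
    """Additive attention gate that lets the top-down feature re-weight the
    lateral feature before fusion (hypothetical gating form)."""

    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels, kernel_size=1)  # lateral branch
        self.phi = nn.Conv2d(channels, channels, kernel_size=1)    # top-down branch
        self.psi = nn.Conv2d(channels, 1, kernel_size=1)           # gating coefficients

    def forward(self, lateral, top_down):
        top_down = F.interpolate(top_down, size=lateral.shape[-2:], mode="nearest")
        gate = torch.sigmoid(self.psi(F.relu(self.theta(lateral) + self.phi(top_down))))
        return lateral * gate + top_down
```

In such a sketch, AttentionGate would replace the plain element-wise addition in the FPN top-down pathway, and SimplifiedNonLocal would be applied to the fused multiscale feature; both choices are assumptions made for illustration.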





Acknowledgements
This research was supported by the National Natural Science Foundation of China (Nos. 61871124 and 61876037), the China Ship Development and Design Center (No. JJ-2021-702-05), and the National Key Laboratory of Science and Technology on Underwater Acoustic Antagonizing (No. 2021-JCJQ-LB-033-09).
Author information
Contributions
Zhicheng Li performed the experiments and wrote the manuscript; Chao Yang produced the figures and revised the manuscript; Lonyu Jiang reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Z., Yang, C. & Jiang, L. IAFPN: interlayer enhancement and multilayer fusion network for object detection. Machine Vision and Applications 35, 93 (2024). https://doi.org/10.1007/s00138-024-01577-5