Abstract
The feature pyramid network (FPN) improves object detection by fusing multilevel features along a top-down pathway. However, current FPN-based methods do not effectively exploit interlayer features to suppress the aliasing effects that arise during top-down feature fusion. We propose an interlayer attention feature pyramid network that integrates attention gates into the FPN through interlayer enhancement to establish correlations between context and the model, thereby highlighting the salient regions of each layer and suppressing aliasing effects. Moreover, to avoid feature dilution during top-down fusion and to allow features from different layers to exploit one another, a simplified non-local operation is applied in the multilayer fusion module to fuse and enhance multiscale features. Comprehensive experiments on the MS COCO and PASCAL VOC benchmarks demonstrate that our network achieves precise object localization and outperforms current FPN-based object detection algorithms.
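The abstract describes two components: attention gates applied during top-down feature fusion and a simplified non-local operation in the multilayer fusion module. The paper's exact formulations are not reproduced here; the following is a minimal PyTorch sketch assuming an additive attention gate in the spirit of Attention U-Net and a GCNet-style simplified non-local (global context) block. All module names, channel counts, and the fusion rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimplifiedNonLocal(nn.Module):
    """GCNet-style global context block: a lightweight stand-in for the
    simplified non-local operation mentioned in the abstract (assumption)."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)  # per-position attention logits
        self.transform = nn.Sequential(                            # channel transform of pooled context
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Global context pooling: softmax-weighted sum over all spatial positions.
        mask = self.context_mask(x).view(b, 1, h * w).softmax(dim=-1)    # (B, 1, HW)
        context = torch.bmm(x.view(b, c, h * w), mask.transpose(1, 2))   # (B, C, 1)
        context = context.view(b, c, 1, 1)
        # Broadcast the transformed global context back onto every position.
        return x + self.transform(context)


class AttentionGate(nn.Module):
    """Additive attention gate that lets the top-down feature re-weight the
    lateral feature before fusion (hypothetical gating form)."""

    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels, kernel_size=1)  # lateral branch
        self.phi = nn.Conv2d(channels, channels, kernel_size=1)    # top-down branch
        self.psi = nn.Conv2d(channels, 1, kernel_size=1)           # gating coefficients

    def forward(self, lateral, top_down):
        top_down = F.interpolate(top_down, size=lateral.shape[-2:], mode="nearest")
        gate = torch.sigmoid(self.psi(F.relu(self.theta(lateral) + self.phi(top_down))))
        return lateral * gate + top_down
```

In such a sketch, AttentionGate would replace the plain element-wise addition in the FPN top-down pathway, and SimplifiedNonLocal would be applied to the fused multiscale feature; both choices are assumptions made for illustration.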





Acknowledgements
This research was supported by the National Natural Science Foundation of China (Nos. 61871124 and 61876037), the China Ship Development and Design Center (No. JJ-2021-702-05), and the National Key Laboratory of Science and Technology on Underwater Acoustic Antagonizing (No. 2021-JCJQ-LB-033-09).
Author information
Contributions
Zhicheng Li performed the experiments and wrote the manuscript; Chao Yang produced the figures and revised the manuscript; Lonyu Jiang reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Z., Yang, C. & Jiang, L. IAFPN: interlayer enhancement and multilayer fusion network for object detection. Machine Vision and Applications 35, 93 (2024). https://doi.org/10.1007/s00138-024-01577-5