Abstract
An object detection task comprises classification and localization, which demand a large receptive field and high-resolution input, respectively. Striking a balance between these two conflicting needs remains a difficult problem in this field. Fortunately, the feature pyramid network (FPN) fuses low-level and high-level features, which alleviates this dilemma to some extent. However, existing FPN-based networks overlook the differing importance of features at different levels during the fusion process. Their simple fusion strategies can easily overwrite important information, leading to serious aliasing effects. In this paper, we propose an improved object detector based on a context- and level-aware feature pyramid network. Experiments on mainstream datasets validate the effectiveness of our network, which outperforms other state-of-the-art works.
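The simple sum-based fusion that the abstract critiques can be sketched as follows. This is a minimal NumPy illustration of the standard FPN top-down pathway, not the authors' context- and level-aware method; the 1x1 lateral projections and 3x3 smoothing convolutions of a real FPN are omitted, and single-channel maps are assumed for brevity:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling over the spatial dims (H, W).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_fuse(features):
    # features: backbone maps ordered finest (high-res, low-level) to
    # coarsest (low-res, high-level), each of shape (H, W).
    # Standard FPN top-down pathway: start from the coarsest map and
    # add its upsampled result into each finer map by element-wise sum.
    fused = [features[-1]]
    for feat in reversed(features[:-1]):
        fused.append(feat + upsample2x(fused[-1]))
    return list(reversed(fused))  # back to finest-to-coarsest order
```

Because every level is merged with an unweighted element-wise sum, a strong response at one level can overwrite a weaker but semantically important response at another, which is exactly the aliasing problem the proposed context- and level-aware fusion is designed to address.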
Data availability
The raw/processed data required to reproduce these findings will be shared once this paper has been accepted.
Ethics declarations
Conflict of interest
We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Yang, H., Zhang, Y. A context- and level-aware feature pyramid network for object detection with attention mechanism. Vis Comput 39, 6711–6722 (2023). https://doi.org/10.1007/s00371-022-02758-x