Abstract
An object detection task comprises classification and localization, which demand a large receptive field and high-resolution input, respectively. Striking a balance between these two conflicting needs remains a difficult problem in this field. Fortunately, the feature pyramid network (FPN) fuses low-level and high-level features, which alleviates this dilemma to some extent. However, existing FPN-based networks overlook the differing importance of features at different levels during the fusion process. Their simple fusion strategies can easily overwrite important information, leading to serious aliasing effects. In this paper, we propose an improved object detector based on a context- and level-aware feature pyramid network. Experiments on mainstream datasets validate the effectiveness of our network, which outperforms other state-of-the-art works.
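The simple sum-based fusion that the abstract critiques can be sketched as follows. This is a minimal NumPy illustration of the standard FPN top-down pathway, not the authors' context- and level-aware method; the 1x1 lateral projections and 3x3 smoothing convolutions of a real FPN are omitted, and single-channel maps are assumed for brevity:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling over the spatial dims (H, W).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_fuse(features):
    # features: backbone maps ordered finest (high-res, low-level) to
    # coarsest (low-res, high-level), each of shape (H, W).
    # Standard FPN top-down pathway: start from the coarsest map and
    # add its upsampled result into each finer map by element-wise sum.
    fused = [features[-1]]
    for feat in reversed(features[:-1]):
        fused.append(feat + upsample2x(fused[-1]))
    return list(reversed(fused))  # back to finest-to-coarsest order
```

Because every level is merged with an unweighted element-wise sum, a strong response at one level can overwrite a weaker but semantically important response at another, which is exactly the aliasing problem the proposed context- and level-aware fusion is designed to address.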
Data availability
The raw/processed data required to reproduce these findings will be shared once this paper has been accepted.
Ethics declarations
Conflict of interest
We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Yang, H., Zhang, Y. A context- and level-aware feature pyramid network for object detection with attention mechanism. Vis Comput 39, 6711–6722 (2023). https://doi.org/10.1007/s00371-022-02758-x