Abstract
To capture more contextual information, various attention mechanisms have been applied to object detectors. However, the spatial interaction in commonly used attention mechanisms is single-scale: it cannot capture contextual information about objects from feature maps at different scales, so the available context is underutilized. In addition, because a predicted bounding box does not completely fit the shape and pose of the object, there is room for further improvement in detection performance. In this paper, we propose a multi-scale global context feature pyramid network, a two-layer lightweight neck structure that produces a feature pyramid with richer contextual information. Moreover, we extend the regression branch with an additional prediction head that predicts corner offsets of the bounding boxes, further refining them and effectively improving their accuracy. Extensive experiments are conducted on the MS COCO 2017 detection dataset. Without bells and whistles, the proposed methods show an average 2% improvement over the RetinaNet baseline.
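The corner-offset refinement described above can be illustrated with a minimal sketch. The abstract does not specify the exact parameterization of the extra head, so the (dx1, dy1, dx2, dy2) layout and the `refine_box` helper below are assumptions for illustration only, not the paper's implementation:

```python
def refine_box(box, corner_offsets):
    """Refine a coarse (x1, y1, x2, y2) box by adding predicted
    per-corner offsets (dx1, dy1, dx2, dy2).

    In the paper, an auxiliary prediction head on the regression
    branch outputs these offsets so the final box better fits the
    object's shape and pose; this offset layout is a hypothetical
    choice for the sketch.
    """
    x1, y1, x2, y2 = box
    dx1, dy1, dx2, dy2 = corner_offsets
    return (x1 + dx1, y1 + dy1, x2 + dx2, y2 + dy2)


# Example: shift the top-left corner slightly and expand the
# bottom-right corner of a coarse prediction.
coarse = (10.0, 10.0, 50.0, 50.0)
offsets = (-1.0, 2.0, 3.0, -4.0)
refined = refine_box(coarse, offsets)  # -> (9.0, 12.0, 53.0, 46.0)
```

In practice such offsets would be predicted per anchor by a small convolutional head and added to the decoded box before non-maximum suppression.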
Acknowledgements
This work was supported by grants from the National Natural Science Foundation of China (Nos. 61673396 and 61976245).
Cite this article
Li, Y., Shao, M., Fan, B. et al. Multi-scale global context feature pyramid network for object detector. SIViP 16, 705–713 (2022). https://doi.org/10.1007/s11760-021-02010-4