
Multi-scale global context feature pyramid network for object detector

Original Paper
Published in Signal, Image and Video Processing

Abstract

To capture more contextual information, various attention mechanisms have been applied to object detectors. However, the spatial interaction in commonly used attention mechanisms is single-scale: it cannot capture object context from feature maps of different scales, which leaves contextual information underexploited. In addition, since predicted bounding boxes do not fully fit the shape and pose of objects, there is room for further improvement in localization accuracy. In this paper, we propose a multi-scale global context feature pyramid network, a two-layer lightweight neck structure that produces a feature pyramid with richer contextual information. Moreover, we extend the regression branch with an additional prediction head that predicts corner offsets for the bounding boxes; this further refines the boxes and effectively improves their accuracy. Extensive experiments are conducted on the MS COCO 2017 detection dataset. Without bells and whistles, the proposed methods show an average 2% improvement over the RetinaNet baseline.
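The abstract describes two components: a global-context neck applied across pyramid scales, and a corner-offset head that refines regressed boxes. The paper's exact formulation is not reproduced on this page, so the following is only a minimal NumPy sketch of two plausible ingredients, a GCNet-style global context block (softmax spatial attention pooled into a single context vector, then broadcast back) and an additive corner-offset refinement. All weight shapes, function names, and the refinement scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def global_context_block(x, w_k, w_v1, w_v2):
    """GCNet-style global context (illustrative sketch): pool the feature map
    into one context vector via softmax spatial attention, transform it through
    a bottleneck, and broadcast-add it back to every spatial position.
    x: (C, H, W) feature map; w_k: (1, C); w_v1: (C//r, C); w_v2: (C, C//r)."""
    C, H, W = x.shape
    flat = x.reshape(C, H * W)                  # (C, HW)
    logits = w_k @ flat                         # (1, HW) spatial attention logits
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                          # softmax over spatial positions
    context = flat @ attn.T                     # (C, 1) global context vector
    t = w_v2 @ np.maximum(w_v1 @ context, 0.0)  # bottleneck transform with ReLU
    return x + t.reshape(C, 1, 1)               # fuse context into all positions

def refine_boxes(boxes, corner_offsets):
    """Hypothetical corner refinement: shift each predicted corner
    (x1, y1, x2, y2) by a per-corner offset from the extra regression head."""
    return boxes + corner_offsets
```

Applying such a block at every pyramid level, and letting the pooled context vectors interact across levels, is one plausible way to make the context multi-scale; the actual cross-scale interaction is defined in the paper itself.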



Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (Nos. 61673396 and 61976245).


Corresponding author

Correspondence to Mingwen Shao.


About this article


Cite this article

Li, Y., Shao, M., Fan, B. et al. Multi-scale global context feature pyramid network for object detector. SIViP 16, 705–713 (2022). https://doi.org/10.1007/s11760-021-02010-4
