Abstract
To capture more contextual information, various attention mechanisms have been applied to object detectors. However, the spatial interaction in commonly used attention mechanisms is single-scale: it cannot capture contextual information about objects from feature maps at different scales, so the available context is underutilized. In addition, because a predicted bounding box does not completely fit the shape and pose of the object, there is room for further improvement in detection performance. In this paper, we propose a multi-scale global context feature pyramid network, a two-layer lightweight neck structure that produces a feature pyramid with richer contextual information. Moreover, we extend the regression branch with an additional prediction head that predicts corner offsets of the bounding boxes, further refining them and effectively improving their accuracy. Extensive experiments are conducted on the MS COCO 2017 detection dataset. Without bells and whistles, the proposed methods show an average 2% improvement over the RetinaNet baseline.
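The corner-offset refinement described above can be illustrated with a minimal sketch. The abstract does not specify the exact parameterization of the extra head, so the (dx1, dy1, dx2, dy2) layout and the `refine_box` helper below are assumptions for illustration only, not the paper's implementation:

```python
def refine_box(box, corner_offsets):
    """Refine a coarse (x1, y1, x2, y2) box by adding predicted
    per-corner offsets (dx1, dy1, dx2, dy2).

    In the paper, an auxiliary prediction head on the regression
    branch outputs these offsets so the final box better fits the
    object's shape and pose; this offset layout is a hypothetical
    choice for the sketch.
    """
    x1, y1, x2, y2 = box
    dx1, dy1, dx2, dy2 = corner_offsets
    return (x1 + dx1, y1 + dy1, x2 + dx2, y2 + dy2)


# Example: shift the top-left corner slightly and expand the
# bottom-right corner of a coarse prediction.
coarse = (10.0, 10.0, 50.0, 50.0)
offsets = (-1.0, 2.0, 3.0, -4.0)
refined = refine_box(coarse, offsets)  # -> (9.0, 12.0, 53.0, 46.0)
```

In practice such offsets would be predicted per anchor by a small convolutional head and added to the decoded box before non-maximum suppression.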
Acknowledgements
This work was supported by grants from the National Natural Science Foundation of China (Nos. 61673396 and 61976245).
Cite this article
Li, Y., Shao, M., Fan, B. et al. Multi-scale global context feature pyramid network for object detector. SIViP 16, 705–713 (2022). https://doi.org/10.1007/s11760-021-02010-4