
Pyramid context learning for object detection

The Journal of Supercomputing

Abstract

Contextual information in complex scenes is critical for accurate object detection. State-of-the-art detectors have greatly improved detection performance by exploiting the context around objects. However, these detectors consider local and global contexts separately, which limits further gains in detection accuracy. In this paper, we propose a pyramid context learning (PCL) module for object detection, which makes full use of feature context at different levels. Specifically, two operators, named aggregation and distribution, are designed to assemble and synthesize contextual information across levels. In addition, a channel context learning operator is used to capture channel-wise context. Because PCL is a general-purpose module, it can be easily integrated into most detection frameworks. To evaluate PCL, we apply it to several popular detectors, e.g., SSD, Faster R-CNN, and RetinaNet, and conduct extensive experiments on the PASCAL VOC and MS COCO datasets. Experimental results show that PCL yields competitive performance gains and significantly improves over the baselines.
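The abstract does not say how the aggregation, distribution, and channel context operators are computed, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that all pyramid levels share the same channel count (as in an FPN), that aggregation resizes every level to a reference resolution and averages them, that distribution resizes the aggregated context back to each level and adds it residually, and that the channel context operator is a squeeze-and-excitation style gate. The function names (`resize_nearest`, `aggregate`, `distribute`, `channel_context`, `pyramid_context`) and the random weights are hypothetical.

```python
from typing import List, Optional

import numpy as np


def resize_nearest(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of a (C, H, W) feature map."""
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return x[:, rows[:, None], cols[None, :]]


def aggregate(levels: List[np.ndarray], ref_index: int) -> np.ndarray:
    """Hypothetical aggregation: resize all levels to the reference level and average."""
    _, h, w = levels[ref_index].shape
    return np.mean([resize_nearest(f, h, w) for f in levels], axis=0)


def distribute(context: np.ndarray, levels: List[np.ndarray]) -> List[np.ndarray]:
    """Hypothetical distribution: resize the shared context to each level and add it."""
    return [f + resize_nearest(context, f.shape[1], f.shape[2]) for f in levels]


def channel_context(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation style channel gating (an assumed stand-in)."""
    s = x.mean(axis=(1, 2))                # squeeze: (C,)
    z = np.maximum(w1 @ s, 0.0)            # bottleneck + ReLU: (C // r,)
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # sigmoid gate in (0, 1): (C,)
    return x * g[:, None, None]            # rescale each channel


def pyramid_context(levels: List[np.ndarray], w1: np.ndarray, w2: np.ndarray,
                    ref_index: Optional[int] = None) -> List[np.ndarray]:
    """Aggregate -> channel gating -> distribute; output shapes match the inputs."""
    if ref_index is None:
        ref_index = len(levels) // 2       # assumed: use the middle level as reference
    ctx = aggregate(levels, ref_index)
    ctx = channel_context(ctx, w1, w2)
    return distribute(ctx, levels)


rng = np.random.default_rng(0)
C, r = 8, 4                                # channels and SE reduction ratio (assumed)
levels = [rng.standard_normal((C, s, s)) for s in (32, 16, 8)]
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
out = pyramid_context(levels, w1, w2)
print([o.shape for o in out])              # [(8, 32, 32), (8, 16, 16), (8, 8, 8)]
```

Because the module returns one map per input level with the same shape, a sketch like this could sit between a feature pyramid and the detection heads without changing either, which is consistent with the abstract's claim that PCL plugs into SSD, Faster R-CNN, and RetinaNet. Nearest-neighbour resizing and averaging are the simplest choices here; learned convolutions or pooling would be equally valid assumptions.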




Author information

Correspondence to Minghui Wang.


About this article


Cite this article

Ding, P., Zhang, J., Zhou, H. et al. Pyramid context learning for object detection. J Supercomput 76, 9374–9387 (2020). https://doi.org/10.1007/s11227-020-03168-3
