Object detector with enriched global context information

Multimedia Tools and Applications

Abstract

How to incorporate more context information to improve detection accuracy is an important problem in object detection. In this paper, we propose EGCI-Net, a new object detector with enriched global context information obtained through a pyramid feature pool module and several global activation blocks. Like DSOD, it is a one-stage object detector trained from scratch. The global activation blocks are added to the backbone subnetwork of the detector to weaken the local information in the feature maps of detected objects and strengthen their global context. The pyramid feature pool module produces multi-scale global context features by multi-scale global average pooling, and these features supervise the pyramid features. The features obtained by the main structure are then fused with the pyramid pooling features and fed into the final multibox detector. We have evaluated our detector on the Pascal VOC and MS COCO datasets. The experimental results show that the proposed detector achieves better results than DSOD and outperforms most existing detectors; in particular, it detects partially occluded objects and small objects well.
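As a rough illustration of the multi-scale global average pooling mentioned above, the following NumPy sketch pools one feature map at several grid sizes and fuses the results back with the original features. It is not the EGCI-Net implementation: the function names are hypothetical, and the bin sizes (1, 2, 4), the nearest-neighbour upsampling, and the channel-wise concatenation are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def adaptive_avg_pool(feature_map, bins):
    """Average-pool a (C, H, W) feature map down to (C, bins, bins).

    Assumes 1 <= bins <= min(H, W) so that every pooling region is non-empty.
    """
    channels, height, width = feature_map.shape
    # Integer bin edges that cover the whole map even when H or W
    # is not divisible by the number of bins.
    row_edges = [(i * height) // bins for i in range(bins + 1)]
    col_edges = [(j * width) // bins for j in range(bins + 1)]
    pooled = np.zeros((channels, bins, bins), dtype=np.float64)
    for i in range(bins):
        for j in range(bins):
            region = feature_map[:, row_edges[i]:row_edges[i + 1],
                                 col_edges[j]:col_edges[j + 1]]
            pooled[:, i, j] = region.mean(axis=(1, 2))
    return pooled


def pyramid_global_context(feature_map, bin_sizes=(1, 2, 4)):
    """Return one pooled context map per bin size (assumed sizes)."""
    return [adaptive_avg_pool(feature_map, b) for b in bin_sizes]


def upsample_nearest(pooled, height, width):
    """Nearest-neighbour upsampling of a (C, b, b) map to (C, H, W)."""
    bins = pooled.shape[1]
    rows = (np.arange(height) * bins) // height
    cols = (np.arange(width) * bins) // width
    return pooled[:, rows[:, None], cols[None, :]]


def fuse_with_context(feature_map, bin_sizes=(1, 2, 4)):
    """Concatenate the original features with upsampled context maps.

    Concatenation is an assumed fusion rule, used here only as an example.
    """
    _, height, width = feature_map.shape
    contexts = pyramid_global_context(feature_map, bin_sizes)
    upsampled = [upsample_nearest(c, height, width) for c in contexts]
    return np.concatenate([feature_map.astype(np.float64)] + upsampled, axis=0)


if __name__ == "__main__":
    features = np.arange(32, dtype=np.float64).reshape(2, 4, 4)
    fused = fuse_with_context(features)
    print(fused.shape)
```

With a bin size of 1 the pooled map reduces to plain global average pooling, one value per channel; larger bin sizes keep coarse spatial layout while still summarising context beyond a single receptive field.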

Acknowledgment

This work was supported by the Natural Science Foundation of China (Grants 61572214 and U1536203).

Author information

Corresponding authors

Correspondence to Jingjuan Guo or Tianjiang Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Guo, J., Yuan, C., Zhao, Z. et al. Object detector with enriched global context information. Multimed Tools Appl 79, 29551–29571 (2020). https://doi.org/10.1007/s11042-020-09500-6

  • DOI: https://doi.org/10.1007/s11042-020-09500-6
