Abstract
How to incorporate more context information to achieve more accurate detection is an important problem in object detection. In this paper, we propose a new object detector, named EGCI-Net, that enriches global context information through a pyramid feature pool module and several global activation blocks. Like DSOD, EGCI-Net is a one-stage object detector trained from scratch. The global activation blocks are added to the backbone sub-network of the detector to weaken the local information in the feature maps of detected objects and strengthen their global context. The pyramid feature pool module applies multi-scale global average pooling to produce multi-scale global context features that supervise the pyramid features. The features obtained by the main structure are then fused with the pyramid pooling features and fed into the final multibox detector. We evaluated our detector on the Pascal VOC and MS COCO datasets. The experimental results show that the proposed detector achieves better results than DSOD and outperforms most existing detectors, and that it detects partially occluded objects and small objects particularly well.
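To make the idea of multi-scale average pooling concrete, the following is a minimal sketch and not the authors' implementation. It pools a single-channel feature map into grids of several sizes, where a 1 x 1 grid corresponds to plain global average pooling. The function names `adaptive_avg_pool` and `pyramid_pool`, the bin sizes (1, 2, 4), and the toy 4 x 4 input are all assumptions chosen for illustration; the abstract does not specify the scales or how the pooled features are fused.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D feature map (list of rows) into a bins x bins grid.

    Bin boundaries use floor for the start and ceil for the end, so the
    bins always cover the whole map even when its size is not divisible.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, ceil_div((i + 1) * h, bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, ceil_div((j + 1) * w, bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(fmap, bin_sizes=(1, 2, 4)):
    """Return one pooled grid per scale (bin sizes are an assumption)."""
    return {b: adaptive_avg_pool(fmap, b) for b in bin_sizes}


# Toy 4 x 4 feature map with values 0..15 (hypothetical data).
fmap = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
context = pyramid_pool(fmap)
```

In this sketch, `context[1]` holds the single global average, while the finer grids retain coarse spatial layout. A real detector would apply the same pooling per channel on learned feature maps before fusing the results with the main-branch features.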
Acknowledgment
This work is supported by the Natural Science Foundation of China (Grant 61572214 and U1536203).
About this article
Cite this article
Guo, J., Yuan, C., Zhao, Z. et al. Object detector with enriched global context information. Multimed Tools Appl 79, 29551–29571 (2020). https://doi.org/10.1007/s11042-020-09500-6
DOI: https://doi.org/10.1007/s11042-020-09500-6