Abstract
Achieving an appropriate trade-off between high identification accuracy and fast detection speed is one of the most challenging problems in the study of pedestrian detection algorithms based on convolutional neural networks. In this paper, we present a one-stage pedestrian detection algorithm that optimises this trade-off through an improved scheme that exploits deep network features. First, a novel branch is attached to the ResNet-50 backbone network; instead of conventional convolution, this branch uses dilated convolution to extract richer context features. Second, a classification and regression sub-network with stacked predictors is proposed to locate objects and recognise whether they are pedestrians. Finally, a novel loss function is introduced to improve network training by learning more detailed information about pedestrian locations. On the challenging CityPersons pedestrian detection benchmark, the proposed scheme achieves a competitive miss rate of 12.90 under ideal conditions while retaining high detection speed.
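The first step of the scheme relies on dilated convolution to enlarge the context that each output position sees. As a minimal sketch, assuming a single-channel input, stride 1, no padding and a hypothetical 3 x 3 all-ones kernel (none of which are taken from the paper's actual branch), the Python below implements a dilated 2-D cross-correlation. It shows that raising the dilation rate widens the span of a 3 x 3 kernel from 3 to 5 to 7 pixels while the kernel keeps the same nine weights. The helper names `effective_kernel_size` and `dilated_conv2d` are illustrative, not from the paper.

```python
"""Minimal sketch of a 2-D dilated convolution (cross-correlation, as used in
CNN libraries) in pure Python. Illustrative only: the kernel, input and
dilation rates are hypothetical, not the paper's branch configuration."""

from typing import List

Matrix = List[List[float]]


def effective_kernel_size(k: int, dilation: int) -> int:
    """Span, in pixels, covered by a k-tap kernel axis with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv2d(x: Matrix, w: Matrix, dilation: int = 1) -> Matrix:
    """'Valid' 2-D cross-correlation of x with kernel w, stride 1, no padding.

    Kernel taps are spaced `dilation` pixels apart, so the receptive field
    grows while the number of weights (len(w) * len(w[0])) stays the same.
    """
    kh, kw = len(w), len(w[0])
    out_h = len(x) - effective_kernel_size(kh, dilation) + 1
    out_w = len(x[0]) - effective_kernel_size(kw, dilation) + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("input is smaller than the dilated kernel")
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Sample the input on a grid with spacing `dilation`.
            for u in range(kh):
                for v in range(kw):
                    acc += w[u][v] * x[i + u * dilation][j + v * dilation]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical 7 x 7 "feature map" and a 3 x 3 all-ones kernel (9 weights).
    x = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
    ones = [[1.0] * 3 for _ in range(3)]
    for d in (1, 2, 3):
        y = dilated_conv2d(x, ones, dilation=d)
        # Prints: dilation, effective kernel span, output height, output width.
        print(d, effective_kernel_size(3, d), len(y), len(y[0]))
```

Running the script prints one row per dilation rate: `1 3 5 5`, `2 5 3 3` and `3 7 1 1`. In practice, a 3 x 3 dilated convolution is usually given padding equal to its dilation rate so that the feature map keeps its resolution; that choice is an assumption here, not something the abstract specifies.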
Acknowledgements
This study was sponsored by the China Shandong Key R&D Plan (2018GGX106008) and was supported by the China Shandong Key Laboratory of Medical Physical Image Processing Technology. The authors are very grateful for the fruitful discussion with Hui Shi.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ma, J., Wan, H., Wang, J. et al. An improved scheme of deep dilated feature extraction on pedestrian detection. SIViP 15, 231–239 (2021). https://doi.org/10.1007/s11760-020-01742-z
DOI: https://doi.org/10.1007/s11760-020-01742-z