Abstract
Crowd counting aims to estimate the number of pedestrians in a single image. Most current crowd counting methods obtain the count by integrating a predicted density map. However, the label density maps generated with Gaussian kernels cannot accurately represent the ground truth in the corresponding crowd images, which degrades the final counting results. In this paper, we propose a ground truth fitting network, GTFNet, which generates estimated density maps that fit the ground truth more closely. First, a VGG network combined with dilated convolutional layers serves as the backbone of GTFNet to extract hierarchical features. The multi-level features are concatenated to compensate for the information loss caused by pooling operations, helping the network capture both texture and spatial information. Second, a regional consistency loss function is designed to compare the estimated density map and the label density map at different region levels. During training, region-level dynamic weights assign a suitable region fitting range to the network, which effectively reduces the impact of label errors on the estimated density maps. Finally, GTFNet is evaluated on three crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF). The experimental results demonstrate that GTFNet achieves excellent overall performance on all of them.
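The regional consistency idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the choice of region sizes, and the fixed per-level weights (standing in for the paper's dynamic region-level weights) are all assumptions made for the example. Each level partitions both density maps into an s × s grid, sums the counts per cell, and penalizes the mean absolute difference between the two maps' cell counts.

```python
import numpy as np

def region_consistency_loss(est, gt, region_sizes=(1, 2, 4), weights=(0.5, 0.3, 0.2)):
    """Illustrative sketch of a region-level consistency loss.

    est, gt: 2D density maps of shape (H, W), with H and W divisible
    by every region size. For each region size s, the maps are split
    into an s x s grid, counts are summed per cell, and the mean
    absolute cell-count difference is accumulated with a fixed weight
    (a stand-in for the dynamic region-level weights in the paper).
    """
    h, w = est.shape
    total = 0.0
    for s, wgt in zip(region_sizes, weights):
        # Reshape so axis 1 and axis 3 index the pixels inside each
        # grid cell, then sum them to get per-cell counts of shape (s, s).
        est_cells = est.reshape(s, h // s, s, w // s).sum(axis=(1, 3))
        gt_cells = gt.reshape(s, h // s, s, w // s).sum(axis=(1, 3))
        total += wgt * np.abs(est_cells - gt_cells).mean()
    return total
```

With region size 1 the loss reduces to the global count error, while larger sizes force the estimated density to match the label map locally, region by region.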
Acknowledgements
This work was supported by National Natural Science Foundation of China (No. 61971073).
Copyright information
© 2020 Springer Nature Switzerland AG
Tan, J., Sang, J., Xiang, Z., Shi, Y., Xia, X. (2020). GTFNet: Ground Truth Fitting Network for Crowd Counting. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science(), vol 12396. Springer, Cham. https://doi.org/10.1007/978-3-030-61609-0_19
Print ISBN: 978-3-030-61608-3
Online ISBN: 978-3-030-61609-0