Abstract
Crowd counting aims to estimate the number of pedestrians in a single image. Most current crowd counting methods obtain the count by integrating a predicted density map. However, the label density maps generated with Gaussian kernels cannot accurately represent the ground truth in the corresponding crowd images, which degrades the final counting results. In this paper, we propose a ground truth fitting network, GTFNet, which generates estimated density maps that fit the ground truth more closely. First, a VGG network combined with dilated convolutional layers serves as the backbone of GTFNet to extract hierarchical features. The multi-level features are concatenated to compensate for the information loss caused by pooling operations, helping the network capture both texture and spatial information. Second, a regional consistency loss function is designed to compare the estimated density map and the label density map at different region levels. During training, region-level dynamic weights assign a suitable region fitting range to the network, which effectively reduces the impact of label errors on the estimated density maps. Finally, GTFNet is evaluated on three crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF). The experimental results demonstrate that GTFNet achieves excellent overall performance on all of them.
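The regional consistency idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the choice of region sizes, and the fixed per-level weights (standing in for the paper's dynamic region-level weights) are all assumptions made for the example. Each level partitions both density maps into an s × s grid, sums the counts per cell, and penalizes the mean absolute difference between the two maps' cell counts.

```python
import numpy as np

def region_consistency_loss(est, gt, region_sizes=(1, 2, 4), weights=(0.5, 0.3, 0.2)):
    """Illustrative sketch of a region-level consistency loss.

    est, gt: 2D density maps of shape (H, W), with H and W divisible
    by every region size. For each region size s, the maps are split
    into an s x s grid, counts are summed per cell, and the mean
    absolute cell-count difference is accumulated with a fixed weight
    (a stand-in for the dynamic region-level weights in the paper).
    """
    h, w = est.shape
    total = 0.0
    for s, wgt in zip(region_sizes, weights):
        # Reshape so axis 1 and axis 3 index the pixels inside each
        # grid cell, then sum them to get per-cell counts of shape (s, s).
        est_cells = est.reshape(s, h // s, s, w // s).sum(axis=(1, 3))
        gt_cells = gt.reshape(s, h // s, s, w // s).sum(axis=(1, 3))
        total += wgt * np.abs(est_cells - gt_cells).mean()
    return total
```

With region size 1 the loss reduces to the global count error, while larger sizes force the estimated density to match the label map locally, region by region.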
Acknowledgements
This work was supported by National Natural Science Foundation of China (No. 61971073).
Copyright information
© 2020 Springer Nature Switzerland AG
Tan, J., Sang, J., Xiang, Z., Shi, Y., Xia, X. (2020). GTFNet: Ground Truth Fitting Network for Crowd Counting. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science(), vol 12396. Springer, Cham. https://doi.org/10.1007/978-3-030-61609-0_19
Print ISBN: 978-3-030-61608-3
Online ISBN: 978-3-030-61609-0