Abstract
Crowd counting has received increasing attention in recent years, but it still faces many challenges, such as extremely dense scenes, scale variation, and background clutter. The quality of the generated density map plays an important role in counting performance. In this paper, we propose an encoder-decoder network, called Average Up-sample Convolution Neural Network (AU-CNN), for high-quality density map generation and accurate count estimation. The encoder extracts features from the input image, while the decoder gradually restores the feature map to the original size of the input image using a simple but effective average up-sample module. This module averages the interpolation results of three different up-sample methods and adds no extra parameters. Moreover, whereas most existing counting algorithms use only the Euclidean loss, we optimize the network with a combined loss function of Euclidean loss and count loss, which is shown to improve performance. Experiments on the ShanghaiTech, UCF_CC_50, and UCF_QNRF datasets demonstrate the strong counting performance and robustness of the proposed method.
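The two ideas named in the abstract, parameter-free averaging of several up-sampled maps and a loss that adds a count term to the Euclidean term, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not say which three up-sample methods are averaged or how the two loss terms are weighted, so the nearest-neighbour and bilinear up-samplers, the squared count error, the weight `lam`, and all function names below are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]  # a 2-D feature or density map stored as nested lists


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Repeat every value factor x factor times."""
    h, w = len(grid), len(grid[0])
    return [[grid[r // factor][c // factor] for c in range(w * factor)]
            for r in range(h * factor)]


def upsample_bilinear(grid: Grid, factor: int) -> Grid:
    """Bilinear interpolation with half-pixel centres; borders are clamped."""
    h, w = len(grid), len(grid[0])
    out: Grid = []
    for r in range(h * factor):
        y = min(max((r + 0.5) / factor - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for c in range(w * factor):
            x = min(max((c + 0.5) / factor - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def average_upsample(grid: Grid, factor: int,
                     methods: Sequence[Callable[[Grid, int], Grid]]) -> Grid:
    """Element-wise mean of the up-sampled maps; no learned parameters."""
    outputs = [method(grid, factor) for method in methods]
    n = len(outputs)
    rows, cols = len(outputs[0]), len(outputs[0][0])
    return [[sum(o[r][c] for o in outputs) / n for c in range(cols)]
            for r in range(rows)]


def combined_loss(pred: Grid, target: Grid, lam: float = 1.0) -> float:
    """Euclidean term plus lam times a count term (assumed form)."""
    # Euclidean term: sum of squared per-pixel differences.
    euclid = sum((p - t) ** 2
                 for p_row, t_row in zip(pred, target)
                 for p, t in zip(p_row, t_row))
    # Count term: squared difference between the summed maps (the head counts).
    count_error = sum(map(sum, pred)) - sum(map(sum, target))
    return euclid + lam * count_error ** 2
```

Calling `average_upsample(feature_map, 2, [upsample_nearest, upsample_bilinear])` doubles the spatial size of `feature_map`. A third method (the abstract mentions three) would be added by passing one more function in the list, which leaves the averaging step unchanged.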
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 61701029, the National Natural Science Foundation of Beijing under Grant L192036, and the Innovation Fund for Industry, Education and Research, Science and Technology Development Center, Ministry of Education, under Grant 201920548040.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wu, D., Fan, Z. & Cui, M. Average up-sample network for crowd counting. Appl Intell 52, 1376–1388 (2022). https://doi.org/10.1007/s10489-021-02470-8
DOI: https://doi.org/10.1007/s10489-021-02470-8