Abstract
Crowd counting is a challenging and relevant computer vision task. Most existing methods are image-based, i.e., they exploit only the spatial information of a single image to estimate the corresponding people count. Recently, video-based methods have been proposed to improve counting accuracy by also exploiting the temporal information that comes from the correlation between adjacent frames. In this work, we point out the need to properly evaluate the specific contribution of the temporal information over that of the spatial information. This issue has not been discussed in existing work, and in some cases the evaluation has been carried out in a way that may lead to overestimating the contribution of the temporal information. To address this issue, we propose a categorisation of existing video-based models, discuss how existing work has evaluated the contribution of the temporal information, and propose an approach aimed at a more complete evaluation for two different categories of video-based methods. Finally, we illustrate our approach, for one specific category, through experiments on several benchmark video data sets.
Acknowledgement
This work was supported by the projects “Law Enforcement agencies human factor methods and Toolkit for the Security and protection of CROWDs in mass gatherings” (LETSCROWD), EU Horizon 2020 programme, grant agreement No. 740466, and “IMaging MAnagement Guidelines and Informatics Network for law enforcement Agencies” (IMMAGINA), European Space Agency, ARTES Integrated Applications Promotion Programme, contract No. 4000133110/20/NL/AF.
Emanuele Ledda is affiliated with the Italian National PhD in Artificial Intelligence, Sapienza University of Rome. He also acknowledges the cooperation with and the support from the Pattern Recognition and Applications Lab of the University of Cagliari.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ledda, E., Putzu, L., Delussu, R., Fumera, G., Roli, F. (2022). On the Evaluation of Video-Based Crowd Counting Models. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13233. Springer, Cham. https://doi.org/10.1007/978-3-031-06433-3_26
DOI: https://doi.org/10.1007/978-3-031-06433-3_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06432-6
Online ISBN: 978-3-031-06433-3
eBook Packages: Computer Science, Computer Science (R0)