On the Evaluation of Video-Based Crowd Counting Models

Ledda, Emanuele; Putzu, Lorenzo; Delussu, Rita; Fumera, Giorgio; Roli, Fabio

doi:10.1007/978-3-031-06433-3_26

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13233)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

Crowd counting is a challenging and relevant computer vision task. Most of the existing methods are image-based, i.e., they only exploit the spatial information of a single image to estimate the corresponding people count. Recently, video-based methods have been proposed to improve counting accuracy by also exploiting temporal information coming from the correlation between adjacent frames. In this work, we point out the need to properly evaluate the temporal information’s specific contribution over the spatial one. This issue has not been discussed by existing work, and in some cases such evaluation has been carried out in a way that may lead to overestimating the contribution of the temporal information. To address this issue we propose a categorisation of existing video-based models, discuss how the contribution of the temporal information has been evaluated by existing work, and propose an evaluation approach aimed at providing a more complete evaluation for two different categories of video-based methods. We finally illustrate our approach, for a specific category, through experiments on several benchmark video data sets.

Acknowledgement

This work was supported by the projects “Law Enforcement agencies human factor methods and Toolkit for the Security and protection of CROWDs in mass gatherings” (LETSCROWD), EU Horizon 2020 programme, grant agreement No. 740466, and “IMaging MAnagement Guidelines and Informatics Network for law enforcement Agencies” (IMMAGINA), European Space Agency, ARTES Integrated Applications Promotion Programme, contract No. 4000133110/20/NL/AF.

Emanuele Ledda is affiliated with the Italian National PhD in Artificial Intelligence, Sapienza University of Rome. He also acknowledges the cooperation with and the support from the Pattern Recognition and Applications Lab of the University of Cagliari.

Author information

Authors and Affiliations

Department of Informatics, Bioengineering, Robotics, and Systems Engineering, University of Genova, Genoa, Italy
Emanuele Ledda & Fabio Roli
Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
Lorenzo Putzu, Rita Delussu, Giorgio Fumera & Fabio Roli

Corresponding author

Correspondence to Emanuele Ledda.

Editor information

Editors and Affiliations

Boston University, Boston, MA, USA
Stan Sclaroff
National Research Council, Lecce, Italy
Cosimo Distante
National Research Council, Lecce, Italy
Marco Leo
University of Catania, Catania, Italy
Giovanni M. Farinella
Technische Universität München, Garching, Germany
Federico Tombari

Cite this paper

Ledda, E., Putzu, L., Delussu, R., Fumera, G., Roli, F. (2022). On the Evaluation of Video-Based Crowd Counting Models. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13233. Springer, Cham. https://doi.org/10.1007/978-3-031-06433-3_26

DOI: https://doi.org/10.1007/978-3-031-06433-3_26
Published: 15 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06432-6
Online ISBN: 978-3-031-06433-3
eBook Packages: Computer Science, Computer Science (R0)
