On the Evaluation of Video-Based Crowd Counting Models

  • Conference paper
Image Analysis and Processing – ICIAP 2022 (ICIAP 2022)

Abstract

Crowd counting is a challenging and relevant computer vision task. Most existing methods are image-based, i.e., they exploit only the spatial information of a single image to estimate the corresponding people count. Recently, video-based methods have been proposed to improve counting accuracy by also exploiting the temporal information that comes from the correlation between adjacent frames. In this work, we point out the need to properly evaluate the specific contribution of temporal information over that of spatial information. This issue has not been discussed in existing work, and in some cases the evaluation has been carried out in a way that may lead to overestimating the contribution of temporal information. To address this issue, we propose a categorisation of existing video-based models, discuss how the contribution of temporal information has been evaluated in existing work, and propose an evaluation approach aimed at providing a more complete assessment for two different categories of video-based methods. Finally, we illustrate our approach, for one specific category, through experiments on several benchmark video data sets.

Acknowledgement

This work was supported by the projects “Law Enforcement agencies human factor methods and Toolkit for the Security and protection of CROWDs in mass gatherings” (LETSCROWD), EU Horizon 2020 programme, grant agreement No. 740466, and “IMaging MAnagement Guidelines and Informatics Network for law enforcement Agencies” (IMMAGINA), European Space Agency, ARTES Integrated Applications Promotion Programme, contract No. 4000133110/20/NL/AF.

Emanuele Ledda is affiliated with the Italian National PhD in Artificial Intelligence, Sapienza University of Rome. He also acknowledges the cooperation with and the support from the Pattern Recognition and Applications Lab of the University of Cagliari.

Author information

Corresponding author

Correspondence to Emanuele Ledda.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ledda, E., Putzu, L., Delussu, R., Fumera, G., Roli, F. (2022). On the Evaluation of Video-Based Crowd Counting Models. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13233. Springer, Cham. https://doi.org/10.1007/978-3-031-06433-3_26

  • DOI: https://doi.org/10.1007/978-3-031-06433-3_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06432-6

  • Online ISBN: 978-3-031-06433-3

  • eBook Packages: Computer Science, Computer Science (R0)
