Abstract
Video-based crowd counting is an essential surveillance tool that helps mitigate crowd catastrophes by supporting the development and implementation of efficient crowd management methods. Deep learning approaches based on density map regression consider the local crowd distribution but are sensitive to errors in the point-level annotation of human heads. Weakly supervised approaches overcome this issue by mapping global crowd attributes onto ground-truth counts. In addition, video-based density map regression approaches do not handle human shape variation and background effects. Hence, this research proposes a cascade of two deep structures: a local density map regressor and a global crowd count regressor trained with weakly supervised learning. The former deals with human shape variation, minimises background effects, considers the local crowd distribution, and produces crowd density maps. The latter adopts a weakly supervised learning mechanism and provides scene-level crowd counts by considering global attributes of the density maps. Experiments were conducted on three datasets, namely Venice, Mall, and UCSD, yielding promising and improved results. The code is available at https://github.com/santosh1448/LDR_GCCR_Weakly_Supervised.
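The two-stage idea can be illustrated with a toy sketch. It is not the authors' architecture: the function names, the box-shaped kernel, and the linear count regressor are all assumptions made for illustration. The first function stands in for the output of a local density map regressor by spreading one unit of mass per annotated head, so the map sums to the head count. The second fits a scene-level count from a single global attribute of the map (its sum) using only ground-truth counts, which is the weakly supervised part.

```python
from typing import List, Sequence, Tuple

DensityMap = List[List[float]]


def density_map_from_points(points: Sequence[Tuple[int, int]],
                            height: int, width: int,
                            radius: int = 1) -> DensityMap:
    """Spread one unit of mass per head over a small box window (hypothetical kernel)."""
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in points:
        cells = [(i, j)
                 for i in range(max(0, r - radius), min(height, r + radius + 1))
                 for j in range(max(0, c - radius), min(width, c + radius + 1))]
        if not cells:  # point lies outside the frame; ignore it
            continue
        for i, j in cells:
            dmap[i][j] += 1.0 / len(cells)  # window is normalised to sum to 1
    return dmap


def global_sum(dmap: DensityMap) -> float:
    """One global attribute of a density map: its total mass."""
    return sum(sum(row) for row in dmap)


def fit_count_regressor(features: Sequence[float],
                        counts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of count = a * feature + b, supervised by counts only."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(counts) / n
    var_x = sum((x - mean_x) ** 2 for x in features)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, counts))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


if __name__ == "__main__":
    true_counts = [2, 5, 9]
    # Pretend the first stage systematically under-estimates density by 20%.
    features = []
    for n in true_counts:
        heads = [(i % 8, (3 * i) % 8) for i in range(n)]
        features.append(0.8 * global_sum(density_map_from_points(heads, 8, 8)))
    a, b = fit_count_regressor(features, true_counts)
    print([round(a * f + b, 3) for f in features])
```

In this sketch the second stage corrects a systematic bias of the first stage without ever seeing head positions; a deep count regressor would replace the linear fit and could use richer global attributes.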
Acknowledgements
The support and the resources provided by the 'PARAM Shivay Facility' under the National Supercomputing Mission, Government of India at the Indian Institute of Technology, Varanasi, are gratefully acknowledged.
Funding
There was no funding obtained for this study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest that could have influenced the work reported in this paper.
About this article
Cite this article
Tripathy, S.K., Srivastava, S., Bajaj, D. et al. A novel cascaded deep architecture with weak-supervision for video crowd counting and density estimation. Soft Comput 28, 8319–8335 (2024). https://doi.org/10.1007/s00500-024-09681-4
DOI: https://doi.org/10.1007/s00500-024-09681-4