A Novel cascaded deep architecture with weak-supervision for video crowd counting and density estimation

Tripathy, Santosh Kumar; Srivastava, Subodh; Bajaj, Divij; Srivastava, Rajeev

doi:10.1007/s00500-024-09681-4

Application of soft computing
Published: 04 July 2024

Soft Computing, volume 28, pages 8319–8335 (2024)

Santosh Kumar Tripathy ORCID: orcid.org/0000-0002-1995-2525¹,
Subodh Srivastava²,
Divij Bajaj³ &
…
Rajeev Srivastava¹

Abstract

Video-based crowd counting is an essential surveillance tool that helps mitigate crowd catastrophes by supporting the development and implementation of efficient crowd management methods. Deep learning approaches based on density map regression consider local crowd distribution but are prone to errors arising from point-level annotation of human heads. Weakly supervised approaches overcome this issue by mapping global crowd attributes onto ground-truth counts. In addition, video-based density map regression approaches do not handle human shape variation and background effects. Hence, this research proposes a cascade of two deep structures: a local density map regressor and a global crowd count regressor with weakly supervised learning. The former deals with human shape variation, minimises background effects, considers local crowd distribution, and provides crowd density maps. The latter adopts a weakly supervised learning mechanism and provides scene-level crowd counts by considering global attributes of the density maps. Experiments were conducted on three datasets, namely Venice, Mall, and UCSD, yielding promising and improved outcomes. The code is available at https://github.com/santosh1448/LDR_GCCR_Weakly_Supervised.
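The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: both stages are replaced by simple stand-ins, and every name below (density_map, toy_local_regressor, fit_global_counter, map_sum) is hypothetical. Stage 1 is simulated as a biased density map (scaled, plus a uniform background), and stage 2 is assumed to be a one-variable least-squares fit that maps a single global attribute, the sum of the density map, onto the scene-level count. Only frame counts are used to fit stage 2, which is the sense in which it is weakly supervised.

```python
import math

def density_map(heads, height, width, sigma=1.5):
    """Place one unit-mass Gaussian per annotated head position (row, col)."""
    grid = [[0.0] * width for _ in range(height)]
    for hy, hx in heads:
        kernel = [[math.exp(-((y - hy) ** 2 + (x - hx) ** 2) / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        mass = sum(map(sum, kernel))  # normalise so each head integrates to 1
        for y in range(height):
            for x in range(width):
                grid[y][x] += kernel[y][x] / mass
    return grid

def toy_local_regressor(heads, height, width):
    """Stand-in for stage 1 (assumption): a density map with a scale bias
    and a uniform background response, mimicking an imperfect regressor."""
    clean = density_map(heads, height, width)
    background = 0.5 / (height * width)
    return [[0.8 * v + background for v in row] for row in clean]

def map_sum(grid):
    """Global attribute used here: the integral of the density map."""
    return sum(map(sum, grid))

def fit_global_counter(density_sums, counts):
    """Stand-in for stage 2 (assumption): least-squares fit of
    count ~ a * sum(density) + b, supervised by scene-level counts only."""
    n = len(counts)
    mean_s = sum(density_sums) / n
    mean_c = sum(counts) / n
    var_s = sum((s - mean_s) ** 2 for s in density_sums)
    cov = sum((s - mean_s) * (c - mean_c) for s, c in zip(density_sums, counts))
    a = cov / var_s
    b = mean_c - a * mean_s
    return a, b

H, W = 16, 16
train_frames = [
    [(3, 3)],
    [(2, 5), (10, 10)],
    [(1, 1), (8, 4), (12, 13), (6, 9)],
]
sums = [map_sum(toy_local_regressor(f, H, W)) for f in train_frames]
counts = [len(f) for f in train_frames]  # the only labels stage 2 sees
a, b = fit_global_counter(sums, counts)

test_frame = [(4, 4), (9, 2), (13, 7)]
raw = map_sum(toy_local_regressor(test_frame, H, W))  # biased stage-1 estimate
corrected = a * raw + b                               # stage-2 scene-level count
print(f"raw sum = {raw:.3f}, corrected count = {corrected:.3f}")
```

In this toy setting the raw density sum for the three-person test frame is biased, and the count-supervised second stage removes that bias. A real system would replace both stand-ins with trained deep networks and richer global attributes.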

Data availability

The data that support the findings of this study can be obtained from the benchmark datasets Venice (Liu et al. 2019), Mall (Chen et al. 2012), and UCSD (Chan et al. 2008).

References

Audoux Y, Montemurro M, Pailhès J (2018) A surrogate model based on non-uniform rational B-splines hypersurfaces. Procedia CIRP 70:463–468. https://doi.org/10.1016/j.procir.2018.03.234
Audoux Y, Montemurro M, Pailhès J (2020a) A metamodel based on non-uniform rational basis spline hyper-surfaces for optimisation of composite structures. Compos Struct 247:112439. https://doi.org/10.1016/j.compstruct.2020.112439
Audoux Y, Montemurro M, Pailhès J (2020b) Non-uniform rational basis spline hyper-surfaces for metamodelling. Comput Methods Appl Mech Eng 364:112918. https://doi.org/10.1016/j.cma.2020.112918
Avvenuti M, Bongiovanni M, Falchi F, Gennaro C, Messina N (2023) A spatio-temporal attentive network for video-based crowd counting. [Online]. Available: https://tinyurl.com/yb42ce38
Chan AB, Liang ZSJ, Vasconcelos N (2008) Privacy preserving crowd monitoring: counting people without people models or tracking. In: 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR. https://doi.org/10.1109/CVPR.2008.4587569
Chan AB, Vasconcelos N (2012) Counting people with low-level features and bayesian regression. IEEE Trans Image Process 21(4):2160–2177. https://doi.org/10.1109/TIP.2011.2172800
Chen K, Loy CC, Gong S, Xiang T (2012) Feature mining for localised crowd counting. BMVC 1(2):1–11
Chen K, Gong S, Xiang T, Loy CC (2013) Cumulative attribute space for age and crowd density estimation. Proc IEEE Comput Soc Conf Comput Vis Pattern Recogn 2:2467–2474. https://doi.org/10.1109/CVPR.2013.319
Deb D, Ventura J (2018) An aggregated multicolumn dilated convolution network for perspective-free counting. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2018-June, pp. 308–317. https://doi.org/10.1109/CVPRW.2018.00057
Han K, Wan W, Yao H, Hou L (2024) Image crowd counting using convolutional neural network and markov random field, pp. 1–6
Kang D, Chan A (2019) Crowd counting by adaptively fusing predictions from an image pyramid. In: British Machine Vision Conference 2018, BMVC 2018, pp. 1–12
Khan MA, Menouar H, Hamila R (2023) LCDnet: a lightweight crowd density estimation model for real-time video surveillance. J Real Time Image Process. https://doi.org/10.1007/s11554-023-01286-8
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization, pp. 1–15
Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pp. 1–15
Lempitsky V, Zisserman A (2010) Learning to count objects in images. Adv Neural Inform Process Syst 3(3):1–5
Li Y, Zhang X, Chen D (2018) CSRNet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1091–1100. https://doi.org/10.1109/CVPR.2018.00120
Liu W, Venkatesh S, An S (2007) Face recognition using kernel ridge regression. In: CVPR’07. IEEE Conference on, IEEE, pp. 1–7
Liu W, Salzmann M, Fua P (2019) Context-aware crowd counting. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2019, pp. 5094–5103. https://doi.org/10.1109/CVPR.2019.00524
Miao Y, Han J, Gao Y, Zhang B (2019) ST-CNN: Spatial-Temporal Convolutional Neural Network for crowd counting in videos. Pattern Recogn Lett 125:113–118. https://doi.org/10.1016/j.patrec.2019.04.012
Oñoro-Rubio D, López-Sastre RJ (2016) Towards perspective-free object counting with deep learning. In: European Conference on Computer Vision. Springer, Cham, pp. 615–629. https://doi.org/10.1007/978-3-319-46478-7_38
Pham VQ, Kozakaya T, Yamaguchi O, Okada R (2015) COUNT forest: Co-voting uncertain number of targets using random forest for crowd density estimation. In: Proceedings of the IEEE International Conference on Computer Vision, vol. 2015 Inter, pp. 3253–3261. https://doi.org/10.1109/ICCV.2015.372
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature. https://doi.org/10.1038/323533a0
Sam DB, Surya S, Babu RV (2017) Switching convolutional neural network for crowd counting. In: Proceedings—30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-Janua, pp. 4031–4039. https://doi.org/10.1109/CVPR.2017.429
Saqib M, Khan SD, Sharma N, Blumenstein M (2019) Crowd counting in low-resolution crowded scenes using region-based deep convolutional neural networks. IEEE Access 7:35317–35329. https://doi.org/10.1109/ACCESS.2019.2904712
Savner SS, Kanhangad V (2023) CrowdFormer: weakly-supervised crowd counting with improved generalizability. J vis Commun Image Represent 94:103853. https://doi.org/10.1016/j.jvcir.2023.103853
Sindagi VA, Patel VM (2020) HA-CCN: hierarchical attention-based crowd counting network. IEEE Trans Image Process 29(8):323–335. https://doi.org/10.1109/TIP.2019.2928634
Tripathy SK, Srivastava R (2021a) AMS-CNN: attentive multi-stream CNN for video-based crowd counting. Int J Multimed Inf Retr. https://doi.org/10.1007/s13735-021-00220-7
Tripathy SK, Srivastava R (2021b) A transfer learning-based multi-cues multi-scale spatial–temporal modeling for effective video-based crowd counting and density estimation using a single-column 2D-atrous net, pp. 179–194. https://doi.org/10.1007/978-981-16-5078-9_16
Wang L, Yin B, Tang X, Li Y (2019) Removing background interference for crowd counting via de-background detail convolutional network. Neurocomputing 332:360–371. https://doi.org/10.1016/j.neucom.2018.12.047
Wang Y, Zhang W, Liu Y, Zhu J (2020) Multi-density map fusion network for crowd counting. Neurocomputing. https://doi.org/10.1016/j.neucom.2020.02.010
Wei X, Du J, Liang M, Ye L (2019) Boosting deep attribute learning via support vector regression for fast moving crowd counting. Pattern Recogn Lett 119:12–23. https://doi.org/10.1016/j.patrec.2017.12.002
Xiong F, Shi X, Yeung DY (2017) Spatiotemporal modeling for crowd counting in videos. In: Proceedings of the IEEE International Conference on Computer Vision, vol. 2017-Octob, pp. 5161–5169. https://doi.org/10.1109/ICCV.2017.551
Xu M et al (2019) Depth information guided crowd counting for complex crowd scenes. Pattern Recogn Lett 125:563–569. https://doi.org/10.1016/j.patrec.2019.02.026
Yingying Zhang YM, Zhou D, Chen S, Gao S (2016) Single-image crowd counting via multi-column convolutional neural network. CVPR 2(35):11431–11437. https://doi.org/10.1002/slct.201701956
Zhang S, Wu G (2017) FCN-rLSTM: deep spatio-temporal neural networks for. Iccv, pp. 3687–3696
Zhang C, Li H, Wang X, Yang X (2015) Cross-scene crowd counting via deep convolutional neural networks. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 07–12-June, pp. 833–841. https://doi.org/10.1109/CVPR.2015.7298684
Zhang Y, Zhou D, Chen S, Gao S, Ma Y (2016a) Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 589–597. https://doi.org/10.1002/slct.201701956
Zhang L, Lin L, Liang X, He K (2016b) Is faster R-CNN doing well for pedestrian detection? In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9906 LNCS, pp. 443–457. https://doi.org/10.1007/978-3-319-46475-6_28
Zhang L, Shi M, Chen Q (2018) Crowd counting via scale-adaptive convolutional neural network. In: Proceedings—2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, vol. 2018-Janua, no. 1, pp. 1113–1121. https://doi.org/10.1109/WACV.2018.00127
Zhou Y, Yang J, Li H, Cao T, Kung S-Y (2020) Adversarial learning for multiscale crowd counting under complex scenes. IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2019.2956091

Acknowledgements

The support and resources provided by the 'PARAM Shivay Facility' under the National Supercomputing Mission, Government of India, at the Indian Institute of Technology, Varanasi, are gratefully acknowledged.

Funding

No funding was obtained for this study.

Author information

Authors and Affiliations

Computing and Vision Laboratory, Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, 221005, UP, India
Santosh Kumar Tripathy & Rajeev Srivastava
Department of Electronics and Communication Engineering, National Institute of Technology, Patna, 800005, Bihar, India
Subodh Srivastava
Department of Electronics and Communication Engineering, Indian Institute of Technology (BHU), Varanasi, 221005, UP, India
Divij Bajaj

Corresponding author

Correspondence to Santosh Kumar Tripathy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest that could have influenced the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tripathy, S.K., Srivastava, S., Bajaj, D. et al. A Novel cascaded deep architecture with weak-supervision for video crowd counting and density estimation. Soft Comput 28, 8319–8335 (2024). https://doi.org/10.1007/s00500-024-09681-4

Accepted: 15 January 2024
Published: 04 July 2024
Issue Date: July 2024
DOI: https://doi.org/10.1007/s00500-024-09681-4
