CrowdFPN: crowd counting via scale-enhanced and location-aware feature pyramid network

Applied Intelligence

Abstract

Crowd counting, the task of estimating the number of pedestrians in images or videos, has become a prominent research direction in computer vision. However, existing methods tend to overlook crowd location information and model efficiency, and their accuracy suffers under challenges such as multi-scale variation and complex background interference. To address these issues, we propose a scale-enhanced and location-aware feature pyramid network for crowd counting (CrowdFPN). First, the Scale Enhancement Module fine-tunes each feature layer so that it focuses on crowd objects within a specific scale. Then, a lightweight Adaptive Bi-directional Feature Pyramid Network effectively fuses feature information from different layers. Because crowd location information is important for accurate counting, we introduce the Location Awareness Module, which embeds crowd location information into the channel attention mechanism while mitigating the effects of complex background interference. Finally, extensive experiments on four popular crowd counting datasets demonstrate the effectiveness of the proposed model. The code is available at https://github.com/zf990312/CrowdFPN.
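The abstract does not spell out the internals of the Adaptive Bi-directional Feature Pyramid Network or the Location Awareness Module. As a rough illustration only, the NumPy sketch below shows two published building blocks of the kind the abstract alludes to: the fast normalized weighted fusion used in BiFPN (EfficientDet, Tan et al., 2020), and a coordinate-attention-style gate (Hou et al., 2021) that pools along height and width separately so that channel attention keeps positional information. This is not the CrowdFPN implementation (the linked repository is the authoritative source). The function names, the omission of batch normalization and feature resizing, the Swish activation, and the random weights are all assumptions made for illustration.

```python
import numpy as np


def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fusion: sum_i relu(w_i) / (eps + sum_j relu(w_j)) * F_i.
    Assumption: all inputs already share one shape (resizing is omitted)."""
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # clip weights to be non-negative
    w = w / (eps + w.sum())                                      # normalize so weights sum to about 1
    return sum(wi * f for wi, f in zip(w, features))


def swish(x):
    # Swish / SiLU activation, assumed here in place of the original nonlinearity.
    return x / (1.0 + np.exp(-x))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Coordinate-attention-style gating without batch norm (illustrative sketch).
    x: (C, H, W); w_reduce: (C_mid, C); w_h, w_w: (C, C_mid).
    A 1x1 convolution is written as a matrix product over the channel axis."""
    C, H, W = x.shape
    z_h = x.mean(axis=2)                    # (C, H): pool over width, keep row positions
    z_w = x.mean(axis=1)                    # (C, W): pool over height, keep column positions
    z = np.concatenate([z_h, z_w], axis=1)  # (C, H + W)
    y = swish(w_reduce @ z)                 # shared reduction -> (C_mid, H + W)
    y_h, y_w = y[:, :H], y[:, H:]
    g_h = sigmoid(w_h @ y_h)                # (C, H): gate per channel and row
    g_w = sigmoid(w_w @ y_w)                # (C, W): gate per channel and column
    # Broadcast both gates over the feature map; each gate lies in (0, 1).
    return x * g_h[:, :, None] * g_w[:, None, :]


# Usage with hypothetical shapes and random weights.
rng = np.random.default_rng(0)
C, H, W, C_mid = 8, 16, 12, 4
p_a = rng.standard_normal((C, H, W))
p_b = rng.standard_normal((C, H, W))
fused = fast_normalized_fusion([p_a, p_b], weights=[1.0, 0.5])
out = coordinate_attention(
    fused,
    rng.standard_normal((C_mid, C)) * 0.1,
    rng.standard_normal((C, C_mid)) * 0.1,
    rng.standard_normal((C, C_mid)) * 0.1,
)
print(out.shape)  # (8, 16, 12)
```

The fusion weights are clipped with a ReLU rather than passed through a softmax, which is the cheaper choice EfficientDet reports; the row and column gates are what let the attention respond to where crowds appear, not only to which channels are active.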

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Data Availability

The datasets generated and/or analyzed during the current study will be made available on reasonable request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 62163016 and 62462033), the Natural Science Foundation of Jiangxi Province (Nos. 20242BAB25092 and 20212ACB202001), the foreign expert project of the Ministry of Science and Technology (No. G2023022005L), the open project of the State Key Laboratory of Performance Monitoring and Protecting of Rail Transit Infrastructure (Grant No. HJGZ2023203), and the Science and Technology Research Project of the Jiangxi Provincial Department of Education (No. GJJ2200644).

Author information

Corresponding author

Correspondence to Ying Yu.

Ethics declarations

Ethical and informed consent for data used

The relevant datasets are publicly available, and the authors are aware that the data used in this article do not raise ethical issues.

Declaration of conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, Y., Zhu, F., Qian, J. et al. CrowdFPN: crowd counting via scale-enhanced and location-aware feature pyramid network. Appl Intell 55, 359 (2025). https://doi.org/10.1007/s10489-025-06263-1

  • DOI: https://doi.org/10.1007/s10489-025-06263-1
