Abstract
Crowd counting, the task of estimating the number of pedestrians in images or videos, has become a prevalent research direction in computer vision. However, existing methods tend to overlook crowd location information and model efficiency, and their accuracy suffers from challenges such as multi-scale variation and complex background interference. To address these issues, we propose CrowdFPN, a scale-enhanced and location-aware feature pyramid network for crowd counting. First, the Scale Enhancement Module fine-tunes each feature layer to focus on crowd objects within a specific scale range. Then, the lightweight Adaptive Bi-directional Feature Pyramid Network fuses feature information from the different layers. Because crowd location information is important for accurate counting, we introduce the Location Awareness Module, which embeds crowd location information into the channel attention mechanism while mitigating the effects of complex background interference. Finally, extensive experiments on four popular crowd counting datasets demonstrate the effectiveness of the proposed model. The code is available at https://github.com/zf990312/CrowdFPN.
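To make the idea of fusing feature layers in a bi-directional pyramid more concrete, the following is a minimal sketch of weighted feature fusion in the style of the "fast normalized fusion" used by EfficientDet's BiFPN. It is an illustration only: the abstract does not specify how the Adaptive Bi-directional Feature Pyramid Network combines its inputs, so the fusion rule, the function names `fast_normalized_fusion` and `upsample_nearest_2x`, the `eps` value, and the toy single-channel feature maps are all assumptions, not the authors' implementation.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one single-channel feature map, rows x cols


def fast_normalized_fusion(features: Sequence[Grid],
                           raw_weights: Sequence[float],
                           eps: float = 1e-4) -> Grid:
    """Fuse same-sized feature maps with non-negative, normalized weights.

    Assumed rule (BiFPN-style, not confirmed for CrowdFPN):
        out = sum_i(w_i * f_i) / (eps + sum_j w_j), with w_i = max(0, raw_i)
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per feature map")
    # Clipping at zero keeps each (learnable) weight non-negative.
    weights = [max(0.0, w) for w in raw_weights]
    # eps avoids division by zero when every weight is clipped to 0.
    total = sum(weights) + eps
    rows, cols = len(features[0]), len(features[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for fmap, w in zip(features, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += (w / total) * fmap[r][c]
    return fused


def upsample_nearest_2x(fmap: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling so a coarse map matches a finer one."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Toy top-down step: fuse a fine level with the upsampled coarse level.
fine = [[1.0, 2.0], [3.0, 4.0]]
coarse = [[10.0]]
fused = fast_normalized_fusion([fine, upsample_nearest_2x(coarse)], [1.0, 1.0])
```

With equal weights, each fused value is close to the average of the two inputs at that position. In a trained network the raw weights would be learned, letting each pyramid level decide how much to trust its neighbours.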
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-025-06263-1/MediaObjects/10489_2025_6263_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-025-06263-1/MediaObjects/10489_2025_6263_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-025-06263-1/MediaObjects/10489_2025_6263_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-025-06263-1/MediaObjects/10489_2025_6263_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-025-06263-1/MediaObjects/10489_2025_6263_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-025-06263-1/MediaObjects/10489_2025_6263_Fig6_HTML.png)
Data Availability
The datasets generated and/or analyzed during the current study will be made available on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 62163016, No. 62462033), the Natural Science Foundation of Jiangxi Province (No. 20242BAB25092, No. 20212ACB202001), the Foreign Expert Project of the Ministry of Science and Technology (No. G2023022005L), the Open Project of the State Key Laboratory of Performance Monitoring and Protecting of Rail Transit Infrastructure (Grant No. HJGZ2023203), and the Science and Technology Research Project of the Jiangxi Provincial Department of Education (No. GJJ2200644).
Ethics declarations
Ethical and informed consent for data used
The relevant datasets are publicly available, and the authors confirm that the data used in this article do not involve ethical issues.
Declaration of conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Yu, Y., Zhu, F., Qian, J. et al. CrowdFPN: crowd counting via scale-enhanced and location-aware feature pyramid network. Appl Intell 55, 359 (2025). https://doi.org/10.1007/s10489-025-06263-1
DOI: https://doi.org/10.1007/s10489-025-06263-1