CrowdFPN: crowd counting via scale-enhanced and location-aware feature pyramid network

Yu, Ying; Zhu, Feng; Qian, Jin; Fujita, Hamido; Yu, Jiamao; Zeng, Kangli; Chen, Enhong

doi:10.1007/s10489-025-06263-1

Published: 21 January 2025

Applied Intelligence, Volume 55, article number 359 (2025)

Ying Yu^1,2 (ORCID: 0000-0002-3480-4571),
Feng Zhu^2,
Jin Qian^2,
Hamido Fujita^3,4,
Jiamao Yu^2,
Kangli Zeng^2 &
Enhong Chen^5

Abstract

Crowd counting, which estimates the number of pedestrians in images or videos, has become a prominent research direction in computer vision. However, existing methods tend to overlook crowd location information and model efficiency, and their accuracy suffers from challenges such as multi-scale variation and complex background interference. To address these issues, we propose CrowdFPN, a scale-enhanced and location-aware feature pyramid network for crowd counting. First, the Scale Enhancement Module fine-tunes each feature layer to focus on crowd objects within a specific scale range. Then, the lightweight Adaptive Bi-directional Feature Pyramid Network fuses feature information from different layers. Because crowd location information is important for accurate counting, we introduce the Location Awareness Module, which embeds crowd location information into the channel attention mechanism while mitigating the effects of complex background interference. Finally, extensive experiments on four popular crowd counting datasets demonstrate the effectiveness of the proposed model. The code is available at https://github.com/zf990312/CrowdFPN.
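
The abstract does not specify how the Adaptive Bi-directional Feature Pyramid Network combines its inputs. As a rough illustration of weighted cross-scale fusion in general, the sketch below uses the "fast normalized fusion" rule from EfficientDet's BiFPN: non-negative learnable weights, normalized by their sum. This is an assumption for illustration, not the authors' confirmed method. The names `upsample_nearest` and `fast_normalized_fusion` are hypothetical, and plain nested lists stand in for single-channel feature maps.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one single-channel feature map, rows of floats


def upsample_nearest(x: Grid, factor: int = 2) -> Grid:
    """Nearest-neighbour upsampling of a 2D map by an integer factor."""
    return [
        [v for v in row for _ in range(factor)]  # repeat each value along the width
        for row in x
        for _ in range(factor)  # repeat each row along the height
    ]


def fast_normalized_fusion(
    maps: Sequence[Grid], weights: Sequence[float], eps: float = 1e-4
) -> Grid:
    """Fuse same-sized maps as sum(w_i * map_i) / (eps + sum(w_j)).

    Weights are clipped at zero (ReLU) so the result is a bounded,
    non-negative combination of the inputs. In a trained network the
    weights would be learned; here they are passed in by hand.
    """
    w = [max(0.0, wi) for wi in weights]
    norm = eps + sum(w)
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum(wi * m[r][c] for wi, m in zip(w, maps)) / norm for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    fine = [[1.0, 2.0], [3.0, 4.0]]   # higher-resolution pyramid level
    coarse = [[10.0]]                 # lower-resolution pyramid level
    top_down = upsample_nearest(coarse, factor=2)
    print(fast_normalized_fusion([fine, top_down], weights=[1.0, 1.0]))
```

The appeal of this rule is that it avoids a softmax over the weights, which keeps the fusion step cheap. Whether CrowdFPN uses this exact normalization cannot be determined from the abstract alone.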

Data Availability

The datasets generated and/or analyzed during the current study will be made available on reasonable request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62163016, No. 62462033), the Natural Science Foundation of Jiangxi Province (No. 20242BAB25092, No. 20212ACB202001), the Foreign Expert Project of the Ministry of Science and Technology (No. G2023022005L), the Open Project of the State Key Laboratory of Performance Monitoring and Protecting of Rail Transit Infrastructure (Grant No. HJGZ2023203), and the Science and Technology Research Project of the Jiangxi Provincial Department of Education (No. GJJ2200644).

Author information

Authors and Affiliations

State Key Laboratory of Performance Monitoring and Protecting of Rail Transit Infrastructure, East China Jiaotong University, Nanchang, 330013, China
Ying Yu
School of Information and Software Engineering, East China Jiaotong University, Nanchang, 330013, China
Ying Yu, Feng Zhu, Jin Qian, Jiamao Yu & Kangli Zeng
Malaysia-Japan International Institute of Technology, Universiti Teknologi Malaysia, Kuala Lumpur, 81310, Malaysia
Hamido Fujita
Andalusian Research Institute in Data Science and Computational Intelligence, University of Granada, Granada, 81310, Spain
Hamido Fujita
School of Data Science, University of Science and Technology of China, Hefei, 230026, China
Enhong Chen

Corresponding author

Correspondence to Ying Yu.

Ethics declarations

Ethical and informed consent for data used

The relevant datasets are publicly available, and the authors of the manuscript are aware that the data used in this article does not involve ethical issues.

Declaration of conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, Y., Zhu, F., Qian, J. et al. CrowdFPN: crowd counting via scale-enhanced and location-aware feature pyramid network. Appl Intell 55, 359 (2025). https://doi.org/10.1007/s10489-025-06263-1

Accepted: 04 January 2025
Published: 21 January 2025
DOI: https://doi.org/10.1007/s10489-025-06263-1
