Abstract
Crowd counting has attracted considerable attention because of its practical applications. However, mainstream methods count by estimating density maps, which forgoes precise individual localization and leaves them susceptible to annotation noise. They also struggle with high-density images. To address these issues, we propose an end-to-end model, the Fine-Grained Extraction Network (FGENet). Unlike density-map methods, FGENet directly learns the original coordinate points that mark the precise location of each individual. We design a fusion module, the Fine-Grained Feature Pyramid (FGFP), to fuse the feature maps extracted by the FGENet backbone. The fused features are passed to a regression head, which predicts point coordinates for a given image, and a classification head, which estimates the confidence that each predicted point corresponds to an individual. Finally, FGENet establishes correspondences between predicted points and ground-truth points using the Hungarian algorithm. To train FGENet, we design a robust loss function, the Three-Task Combination (TTC) loss, to mitigate the impact of annotation noise. Extensive experiments on four widely used crowd counting datasets demonstrate the effectiveness of FGENet. Notably, our method reduces Mean Absolute Error (MAE) by 3.14 on the ShanghaiTech Part A dataset compared with existing state-of-the-art methods, and it surpasses previous results on the UCF_CC_50 dataset with a reduction of 30.16 in MAE.
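To make the matching step concrete, the sketch below illustrates Hungarian matching between predicted and ground-truth points using SciPy's `linear_sum_assignment`. This is a minimal illustration only: the cost matrix here (a weighted combination of pairwise distance and prediction confidence, with a hypothetical `dist_weight` parameter) is an assumption and does not reproduce FGENet's actual matching cost or its TTC loss.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm (Kuhn, 1955)

def match_points(pred_points, pred_scores, gt_points, dist_weight=0.05):
    """Match predicted head points to ground-truth points.

    pred_points: (N, 2) predicted (x, y) coordinates
    pred_scores: (N,) confidence that each prediction is a person
    gt_points:   (M, 2) annotated (x, y) coordinates
    Returns index arrays (pred_idx, gt_idx) of matched pairs.
    """
    # Pairwise Euclidean distances between predictions and ground truth, shape (N, M).
    diff = pred_points[:, None, :] - gt_points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Illustrative cost only: small distances and high confidences are preferred.
    cost = dist_weight * dist - pred_scores[:, None]

    # Hungarian algorithm finds the minimum-cost one-to-one assignment.
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return pred_idx, gt_idx

# Toy usage: 3 predicted points, 2 annotated people.
preds = np.array([[10.0, 12.0], [55.0, 40.0], [200.0, 180.0]])
scores = np.array([0.9, 0.8, 0.1])
gts = np.array([[11.0, 13.0], [54.0, 41.0]])
print(match_points(preds, scores, gts))  # matches the two nearby, confident predictions
```

Unmatched predictions would be treated as negatives by the classification head during training; how exactly FGENet weights the matched and unmatched terms is defined by its TTC loss, which the sketch does not attempt to model.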
References
Chen, J., et al.: Run, Don’t Walk: chasing higher FLOPS for faster neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12021–12031 (2023)
Cheng, Z.Q., Dai, Q., Li, H., Song, J., Wu, X., Hauptmann, A.G.: Rethinking spatial invariance of convolutional networks for object counting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19638–19648 (2022)
Cheng, Z.Q., Li, J.X., Dai, Q., Wu, X., He, J.Y., Hauptmann, A.: Improving the learning of multi-column convolutional neural network for crowd counting (2019)
Dai, M., Huang, Z., Gao, J., Shan, H., Zhang, J.: Cross-head supervision for crowd counting with noisy annotations. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE (2023)
Guo, D., Li, K., Zha, Z., Wang, M.: DADNet: dilated-attention-deformable ConvNet for crowd counting. In: Proceedings of the 27th ACM International Conference on Multimedia (ACM MM) (2019)
Idrees, H., Saleemi, I., Seibert, C., Shah, M.: Multi-source multi-scale counting in extremely dense crowd images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2547–2554 (2013)
Idrees, H., et al.: Composition loss for counting, density map estimation and localization in dense crowds. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 544–559. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_33
Jiang, X., et al.: Attention scaling for crowd counting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4706–4715 (2020)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logistics Q. 2(1–2), 83–97 (1955)
Lempitsky, V.S., Zisserman, A.: Learning to count objects in images. In: Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NeurIPS) (2010)
Li, Y., Zhang, X., Chen, D.: CSRNet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1091–1100 (2018)
Liang, D., Xu, W., Bai, X.: An end-to-end transformer model for crowd localization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. ECCV 2022. LNCS, vol. 13661, pp. 38–54. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19769-7_3
Lin, H., et al.: Direct measure matching for crowd counting. In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI), pp. 837–844 (2021)
Lin, H., Ma, Z., Ji, R., Wang, Y., Hong, X.: Boosting crowd counting via multifaceted attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19628–19637 (2022)
Liu, W., Salzmann, M., Fua, P.: Context-aware crowd counting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5099–5108 (2019)
Ma, Y., Sanchez, V., Guha, T.: FusionCount: efficient crowd counting via multiscale feature fusion. In: Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 3256–3260 (2022)
Meng, Y., et al.: Spatial uncertainty-aware semi-supervised crowd counting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15549–15559 (2021)
Miao, Y., Lin, Z., Ding, G., Han, J.: Shallow feature based dense attention network for crowd counting. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 11765–11772 (2020)
Quan, Y., Zhang, D., Zhang, L., Tang, J.: Centralized feature pyramid for object detection. CoRR abs/2210.02093 (2022)
Shu, W., Wan, J., Tan, K.C., Kwong, S., Chan, A.B.: Crowd counting in the frequency domain. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19618–19627 (2022)
Sindagi, V.A., Patel, V.M.: CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In: Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6. IEEE (2017)
Song, Q., et al.: Rethinking counting and localization in crowds: a purely point-based framework. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3365–3374 (2021)
Wan, J., Liu, Z., Chan, A.B.: A generalized loss function for crowd counting and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1974–1983 (2021)
Wang, Y., Ma, X., Chen, Z., Luo, Y., Yi, J., Bailey, J.: Symmetric cross entropy for robust learning with noisy labels. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 322–330 (2019)
Wen, L., et al.: Detection, tracking, and counting meets drones in crowds: a benchmark. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7812–7821 (2021)
Yan, L., Zhang, L., Zheng, X., Li, F.: Deeper multi-column dilated convolutional network for congested crowd understanding. Neural Comput. Appl. 34(2), 1407–1422 (2022)
Zhang, L., Yan, L., Zhang, M., Lu, J.: T²CNN: a novel method for crowd counting via two-task convolutional neural network. Visual Comput. 39(1), 73–85 (2023)
Zhang, S., Yang, L., Mi, M.B., Zheng, X., Yao, A.: Improving deep regression with ordinal entropy (2023). https://doi.org/10.48550/arXiv.2301.08915
Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 589–597 (2016)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, HY., Zhang, L., Wei, XY. (2024). FGENet: Fine-Grained Extraction Network for Congested Crowd Counting. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14556. Springer, Cham. https://doi.org/10.1007/978-3-031-53311-2_4
DOI: https://doi.org/10.1007/978-3-031-53311-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53310-5
Online ISBN: 978-3-031-53311-2