
FGENet: Fine-Grained Extraction Network for Congested Crowd Counting

  • Conference paper
MultiMedia Modeling (MMM 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14556)


Abstract

Crowd counting has gained significant popularity due to its practical applications. However, mainstream counting methods ignore precise individual localization and suffer from annotation noise, because they count by estimating density maps. They also struggle with high-density images. To address these issues, we propose an end-to-end model called Fine-Grained Extraction Network (FGENet). Unlike density-map methods, FGENet directly learns the original coordinate points that represent the precise locations of individuals. This study designs a fusion module, named Fine-Grained Feature Pyramid (FGFP), that fuses the feature maps extracted by the backbone of FGENet. The fused features are then passed to both a regression head and a classification head: the former predicts point coordinates for a given image, and the latter determines the confidence that each predicted point is an individual. Finally, FGENet establishes correspondences between predicted points and ground-truth points using the Hungarian algorithm. For training FGENet, we design a robust loss function, named Three-Task Combination (TTC), to mitigate the impact of annotation noise. Extensive experiments are conducted on four widely used crowd counting datasets, and the results demonstrate the effectiveness of FGENet. Notably, our method improves Mean Absolute Error (MAE) by 3.14 points on the ShanghaiTech Part A dataset over existing state-of-the-art methods, and surpasses previous benchmarks on the UCF_CC_50 dataset by 30.16 points in MAE.
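The matching step the abstract describes (pairing each predicted point with a ground-truth point via the Hungarian algorithm) can be illustrated with a minimal sketch. This is not the paper's code: it brute-forces all assignments over a tiny example for clarity, minimizing total Euclidean distance, whereas practical implementations would use an O(n³) solver such as `scipy.optimize.linear_sum_assignment`. The point coordinates below are made up for illustration.

```python
from itertools import permutations
import math

def match_points(preds, gts):
    """Return (assignment, cost): for each predicted point, the index of the
    ground-truth point it is paired with, under the pairing that minimizes
    the total Euclidean distance (brute force; assumes len(preds) == len(gts))."""
    best_cost, best_assign = float("inf"), None
    for perm in permutations(range(len(gts))):
        cost = sum(math.dist(preds[i], gts[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assign = cost, list(perm)
    return best_assign, best_cost

# Hypothetical predicted and ground-truth head locations (x, y) in pixels.
preds = [(10.0, 10.0), (50.0, 50.0)]
gts = [(52.0, 48.0), (11.0, 9.0)]
assign, cost = match_points(preds, gts)
# assign pairs prediction 0 with gt 1 and prediction 1 with gt 0,
# the nearest-neighbour matching one would expect here.
```

In the full model, the cost would also incorporate the classification confidences, and unmatched predictions in the (usual) case of unequal point counts would be treated as negatives; this sketch shows only the geometric core of the matching.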



Author information


Correspondence to Li Zhang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ma, HY., Zhang, L., Wei, XY. (2024). FGENet: Fine-Grained Extraction Network for Congested Crowd Counting. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14556. Springer, Cham. https://doi.org/10.1007/978-3-031-53311-2_4

  • DOI: https://doi.org/10.1007/978-3-031-53311-2_4
  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53310-5

  • Online ISBN: 978-3-031-53311-2

  • eBook Packages: Computer Science, Computer Science (R0)
