
Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting

  • Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1965)


Abstract

Most existing crowd counting methods rely on convolutional neural networks (CNNs) to address variations in crowd scale and background noise. These methods extract local features effectively, but their limited convolution kernel sizes make it hard to capture global information, which is also crucial for scale awareness and noise discrimination. In this paper, we propose a Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting (MELANet), a CNN-based model that extracts both global and local information. MELANet consists of three parts: a feature extraction module (FEM) that extracts the initial features, a multiscale equivalent attention module (MEAM) that combines global and local information, and a fusion module (FM) that fuses the multiscale features. In MEAM, large convolution kernels are decomposed into equivalent combinations of small kernels, so the model obtains receptive fields equivalent to those of the large kernels with lower complexity and fewer parameters. This enables the CNN-based attention mechanism to capture both local and global correlations, which makes the model focus on crowd head regions and resist background noise. In addition, we use a multiscale structure with different convolution kernel sizes to encode contextual information at different scales into the feature maps, which helps the model handle variations in head scale. Furthermore, we add gated channel attention units to MEAM to enhance the channel adaptivity of the model. Extensive experiments demonstrate that MELANet achieves excellent counting performance on three popular crowd counting datasets.
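The abstract does not state MEAM's kernel sizes, so the following sketch uses a hypothetical configuration borrowed from the large kernel attention of the Visual Attention Network (Guo et al., 2022): a 5×5 depthwise convolution, a 7×7 depthwise convolution with dilation 3, and a 1×1 convolution. It only does the receptive-field and parameter bookkeeping behind the claim that such a decomposition reaches a large receptive field with far fewer parameters. It is not MELANet's implementation; the names `Conv2d`, `receptive_field`, and `total_params`, the stride-1 assumption, and the channel count of 64 are illustrative assumptions.

```python
"""Receptive-field and parameter arithmetic for a large-kernel decomposition.

Hypothetical configuration modelled on VAN-style large kernel attention;
MELANet's actual kernel sizes are not given in the abstract.
"""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Conv2d:
    kernel: int               # square kernel size k
    dilation: int = 1         # dilation rate d
    depthwise: bool = False   # True: one k x k filter per channel

    def effective_kernel(self) -> int:
        # A dilated k x k kernel spans d*(k-1)+1 pixels along each axis.
        return self.dilation * (self.kernel - 1) + 1

    def params(self, channels: int) -> int:
        # Weight count for a channels -> channels convolution (biases ignored).
        per_filter = self.kernel * self.kernel
        if self.depthwise:
            return per_filter * channels
        return per_filter * channels * channels


def receptive_field(stack: List[Conv2d]) -> int:
    # Assumes stride 1: each layer widens the field by (effective_kernel - 1).
    return 1 + sum(layer.effective_kernel() - 1 for layer in stack)


def total_params(stack: List[Conv2d], channels: int) -> int:
    return sum(layer.params(channels) for layer in stack)


if __name__ == "__main__":
    C = 64  # hypothetical channel count
    decomposed = [
        Conv2d(kernel=5, depthwise=True),              # local context
        Conv2d(kernel=7, dilation=3, depthwise=True),  # long-range context
        Conv2d(kernel=1),                              # channel mixing
    ]
    rf = receptive_field(decomposed)
    dense = [Conv2d(kernel=rf)]  # single dense conv covering the same window
    print("decomposed RF:", rf)
    print("decomposed params:", total_params(decomposed, C))
    print("dense params:", total_params(dense, C))
```

Running the script reports a 23×23 receptive field for the decomposed stack, 8,832 weights for it at 64 channels, and 2,166,784 weights for a single dense 23×23 convolution covering the same window. In an LKA-style design, the output of such a stack is then multiplied element-wise with the input features to reweight them; whether MEAM follows exactly this scheme is not stated in the abstract.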


Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61971073).

Author information

Corresponding author: Jun Sang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wu, Z., Gong, W., Chen, Y., Xia, X., Sang, J. (2024). Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1965. Springer, Singapore. https://doi.org/10.1007/978-981-99-8145-8_41

  • DOI: https://doi.org/10.1007/978-981-99-8145-8_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8144-1

  • Online ISBN: 978-981-99-8145-8

  • eBook Packages: Computer Science, Computer Science (R0)
