Abstract
Most existing crowd counting methods rely on convolutional neural networks (CNNs) to address crowd scale variation and background noise. These methods extract local features effectively, but their limited convolution kernel sizes make it hard to capture global information, which is also crucial for scale awareness and noise discrimination. In this paper, we propose a Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting (MELANet), which extracts both global and local information using a CNN alone. MELANet consists of three parts: a feature extraction module (FEM) for original feature extraction, a multiscale equivalent attention module (MEAM) for combining global and local information, and a fusion module (FM) for multiscale feature fusion. In MEAM, large convolution kernels are decomposed into equivalent combinations of small convolution kernels, so the model obtains receptive fields equivalent to those of the large kernels with lower complexity and fewer parameters. This enables both local and global correlation in a CNN-based attention mechanism, which makes the model focus more on crowd head regions and resist background noise. In addition, we use a multiscale structure with different convolution kernel sizes to encode contextual information at different scales into the feature maps, which handles variations in head scale. Furthermore, we add gate channel attention units to MEAM to enhance the channel adaptivity of the model. Extensive experiments demonstrate that MELANet achieves excellent counting performance on three popular crowd counting datasets.
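The abstract does not spell out how the large kernels are decomposed, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a decomposition in the style of the Visual Attention Network (Guo et al., 2022): a K x K convolution is approximated by a (2d-1) x (2d-1) depthwise convolution, a ceil(K/d) x ceil(K/d) depthwise convolution with dilation d, and a 1 x 1 pointwise convolution. The function names, the choice K = 21, d = 3, and the channel count C = 64 are all hypothetical, and bias terms are ignored.

```python
import math


def decompose(K, d):
    """Kernel sizes of the two depthwise stages for a K x K kernel with
    dilation d (assumed VAN-style scheme, not confirmed by the paper)."""
    dw = 2 * d - 1            # local depthwise convolution
    dwd = math.ceil(K / d)    # long-range depthwise dilated convolution
    return dw, dwd


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.
    `layers` is a list of (kernel_size, dilation) pairs."""
    rf = 1
    for k, dil in layers:
        rf += (k - 1) * dil
    return rf


def params_dense(K, C):
    """Weights of a standard K x K convolution, C channels in and out."""
    return K * K * C * C


def params_decomposed(K, d, C):
    """Weights of depthwise + depthwise-dilated + 1 x 1 pointwise stages."""
    dw, dwd = decompose(K, d)
    return dw * dw * C + dwd * dwd * C + C * C


if __name__ == "__main__":
    K, d, C = 21, 3, 64  # hypothetical values chosen for illustration
    dw, dwd = decompose(K, d)
    print("depthwise kernel:", dw, "dilated depthwise kernel:", dwd)
    print("receptive field:", receptive_field([(dw, 1), (dwd, d)]))
    print("dense weights:", params_dense(K, C))
    print("decomposed weights:", params_decomposed(K, d, C))
```

Under these assumptions, the stacked small kernels cover a receptive field at least as large as the 21 x 21 target while using a small fraction of the weights of a dense 21 x 21 convolution, which is the trade-off the abstract describes.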
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61971073).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, Z., Gong, W., Chen, Y., Xia, X., Sang, J. (2024). Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1965. Springer, Singapore. https://doi.org/10.1007/978-981-99-8145-8_41
DOI: https://doi.org/10.1007/978-981-99-8145-8_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8144-1
Online ISBN: 978-981-99-8145-8
eBook Packages: Computer Science, Computer Science (R0)