Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting

Wu, Zhiwei; Gong, Wenhui; Chen, Yan; Xia, Xiaofeng; Sang, Jun

doi:10.1007/978-981-99-8145-8_41

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1965)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Most existing crowd counting methods rely on convolutional neural networks (CNNs) to handle crowd scale variation and background noise. These methods extract local features effectively, but their limited convolution kernel sizes make it hard to capture global information, which is also crucial for scale awareness and noise discrimination. In this paper, we propose a Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting (MELANet), a CNN-based model that extracts both global and local information. MELANet consists of three parts: a feature extraction module (FEM) for original feature extraction, a multiscale equivalent attention module (MEAM) for combining global and local information, and a fusion module (FM) for multiscale feature fusion. In MEAM, large convolution kernels are decomposed into equivalent combinations of small convolution kernels, so the model obtains receptive fields equivalent to those of the large kernels with lower complexity and fewer parameters. This enables both local and global correlation in a CNN-based attention mechanism, which makes the model focus more on crowd head regions and resist background noise. In addition, we use a multiscale structure with different convolution kernel sizes to encode contextual information at different scales into the feature maps, which handles variations in head scale. Furthermore, we add gate channel attention units to MEAM to enhance the channel adaptivity of the model. Extensive experiments demonstrate that MELANet achieves excellent counting performance on three popular crowd counting datasets.
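
The claim that a combination of small kernels can match the receptive field of a large kernel with far fewer parameters can be checked with simple arithmetic. The sketch below is not the MEAM implementation, whose exact decomposition is not given in the abstract. It assumes a decomposition in the style of large kernel attention: a 5×5 depthwise convolution, a 7×7 depthwise convolution with dilation 3, and a 1×1 pointwise convolution, compared against a dense 21×21 convolution. The channel count of 64, the kernel sizes, and all function names are illustrative assumptions, and biases are ignored.

```python
from typing import List, Tuple

# Each depthwise layer is (kernel_size, dilation); stride 1 is assumed throughout.
Layer = Tuple[int, int]


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field of stacked stride-1 convolutions: 1 + sum((k - 1) * d)."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf


def dense_conv_params(channels: int, kernel: int) -> int:
    """Weights of a standard conv with `channels` inputs and outputs (no bias)."""
    return channels * channels * kernel * kernel


def decomposed_params(channels: int, depthwise: List[Layer]) -> int:
    """Weights of the depthwise layers plus one 1x1 pointwise conv (no bias)."""
    depthwise_weights = sum(channels * k * k for k, _ in depthwise)
    pointwise_weights = channels * channels
    return depthwise_weights + pointwise_weights


# Assumed example values, not taken from the paper.
channels = 64
depthwise_stack = [(5, 1), (7, 3)]  # 5x5 depthwise, then 7x7 depthwise with dilation 3

print("receptive field of the stack:", receptive_field(depthwise_stack))
print("dense 21x21 weights:", dense_conv_params(channels, 21))
print("decomposed weights:", decomposed_params(channels, depthwise_stack))
```

Under these assumptions the stacked small kernels span a 23×23 window, which covers the 21×21 target, while using 8,832 weights instead of 1,806,336. The 1×1 convolution adds no spatial extent; it only mixes channels, which depthwise layers cannot do on their own.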

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61971073).

Author information

Authors and Affiliations

School of Big Data and Software Engineering, Chongqing University, Chongqing, 401331, China
Zhiwei Wu, Wenhui Gong, Yan Chen, Xiaofeng Xia & Jun Sang

Corresponding author

Correspondence to Jun Sang.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Wu, Z., Gong, W., Chen, Y., Xia, X., Sang, J. (2024). Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1965. Springer, Singapore. https://doi.org/10.1007/978-981-99-8145-8_41

DOI: https://doi.org/10.1007/978-981-99-8145-8_41
Published: 27 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8144-1
Online ISBN: 978-981-99-8145-8
eBook Packages: Computer Science, Computer Science (R0)
