Cascade-guided multi-scale attention network for crowd counting

Li, Shufang; Hu, Zhengping; Zhao, Mengyao; Sun, Zhe

doi:10.1007/s11760-021-01903-8

Original Paper
Published: 15 April 2021

Volume 15, pages 1663–1670 (2021)
Signal, Image and Video Processing

Shufang Li¹,², Zhengping Hu¹ (ORCID: orcid.org/0000-0003-0300-6144), Mengyao Zhao¹ & Zhe Sun¹

363 Accesses
3 Citations

Abstract

The performance of density-estimation-based crowd counting has improved greatly with the development of deep learning. However, obtaining a high-quality density map remains difficult because of background clutter and the interference of perspective changes within and between scenes. In this paper, we propose a cascade-guided crowd counting network built mainly around a scale-aware model (SAM) and an attention-aware model (AAM). First, SAM uses a share-net design and multi-directional perspective transform in convolution to handle multi-scale variation and smooth transitions while reducing background noise in shallow features. Second, AAM further encodes semantic interdependencies along the two dimensions of location and channel so that the network learns to attend to the key information. Finally, the global and local features are concatenated and fed into the decoder to generate the estimated density map for crowd counting. Comprehensive experiments on three established datasets show that the proposed method achieves higher accuracy and is more robust to scale variation and background noise.
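As background for the density-estimation formulation the abstract refers to, the following is a minimal sketch of how a density map relates to a crowd count. It is not the authors' network: it only illustrates the common convention of placing one normalized Gaussian per annotated head so that the map sums to the number of people. The function names, the fixed `sigma`, and the toy head coordinates are assumptions made for illustration.

```python
import math


def gaussian_density_map(points, height, width, sigma=2.0):
    """Place one normalized Gaussian per annotated head (row, col).

    Hypothetical helper for illustration; assumes every point lies
    inside the image bounds.
    """
    density = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)  # truncate the kernel at three standard deviations
    for row, col in points:
        # Collect the kernel weights that fall inside the image.
        weights = {}
        for r in range(max(0, row - radius), min(height, row + radius + 1)):
            for c in range(max(0, col - radius), min(width, col + radius + 1)):
                d2 = (r - row) ** 2 + (c - col) ** 2
                weights[(r, c)] = math.exp(-d2 / (2.0 * sigma ** 2))
        if not weights:
            continue  # point outside the image: nothing to add
        total = sum(weights.values())
        # Normalize so each head contributes exactly 1, even near borders.
        for (r, c), w in weights.items():
            density[r][c] += w / total
    return density


def count_from_density(density):
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


# Toy example: three annotated heads in a 32x32 image.
heads = [(5, 5), (10, 20), (0, 0)]
density = gaussian_density_map(heads, height=32, width=32)
estimated_count = count_from_density(density)
print(round(estimated_count, 6))
```

A counting network of the kind described above is trained to regress such a map from the image, and the predicted count is read off by summing the predicted map in the same way.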

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61771420 and 62001413, the National Natural Science Foundation of Hebei Province under Grant No. F2020203064, the China Postdoctoral Science Foundation under Grant No. 2018M641674, and the Doctoral Foundation of Yanshan University under Grant No. BL18033.

Author information

Authors and Affiliations

School of Information Science and Engineering, Yanshan University, West of Hebei Street No. 438, Qinhuangdao, 066004, China
Shufang Li, Zhengping Hu, Mengyao Zhao & Zhe Sun
Department of Information Engineering, Hebei University of Environmental Engineering, Jingang Road No. 8, Qinhuangdao, 066102, China
Shufang Li

Corresponding author

Correspondence to Zhengping Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, S., Hu, Z., Zhao, M. et al. Cascade-guided multi-scale attention network for crowd counting. SIViP 15, 1663–1670 (2021). https://doi.org/10.1007/s11760-021-01903-8

Received: 23 October 2020
Revised: 23 February 2021
Accepted: 28 March 2021
Published: 15 April 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s11760-021-01903-8
