Cascade-guided multi-scale attention network for crowd counting

  • Original Paper
Signal, Image and Video Processing

Abstract

Deep learning has greatly improved crowd counting based on density estimation. However, obtaining high-quality density maps remains difficult because of background clutter and perspective changes within and between scenes. In this paper, we propose a cascade-guided crowd counting network built around two components: a scale aware model (SAM) and an attention aware model (AAM). First, SAM uses a share-net design and multi-directional perspective transforms in convolution to handle multi-scale variation and smooth transitions, while reducing background noise in shallow features. Second, AAM encodes semantic interdependencies along two dimensions of the features, location and channel, so that the network learns to attend to key information. Finally, the global and local features are concatenated and fed into a decoder to generate the estimated density map for crowd counting. Comprehensive experiments on three established datasets show that the proposed method achieves higher accuracy and is more robust to scale variation and background noise.
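To make the attention and counting steps concrete, the following is a minimal NumPy sketch of location-wise and channel-wise self-attention over a feature map, followed by reading a count off a density map. It is an illustration under stated assumptions, not the authors' AAM: the function names (`position_attention`, `channel_attention`, `count_from_density`), the plain dot-product affinities, and the residual connections are hypothetical choices, and the learned projections a real network would use are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat):
    # feat: (C, H, W). Every spatial location attends to all locations.
    # Assumption: raw dot-product affinity, no learned query/key/value maps.
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)                # (C, N), N = H * W
    affinity = softmax(flat.T @ flat, axis=-1)   # (N, N), rows sum to 1
    out = flat @ affinity.T                      # weighted sum over locations
    return out.reshape(c, h, w) + feat           # residual connection

def channel_attention(feat):
    # feat: (C, H, W). Every channel attends to all channels.
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)                # (C, N)
    affinity = softmax(flat @ flat.T, axis=-1)   # (C, C), rows sum to 1
    out = affinity @ flat                        # weighted sum over channels
    return out.reshape(c, h, w) + feat           # residual connection

def count_from_density(density):
    # In density-map counting, the estimated count is the integral
    # (here: the sum) of the predicted density map.
    return float(density.sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 3, 5))        # toy feature map
    fused = np.concatenate(
        [position_attention(feat), channel_attention(feat)], axis=0
    )
    print(fused.shape)                           # channels doubled by concat
```

Concatenating the two branches along the channel axis mirrors, in a simplified way, the abstract's description of combining features before decoding; a real decoder would then map the fused features to a single-channel density map whose sum gives the count.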

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61771420 and 62001413, the National Natural Science Foundation of Hebei Province under Grant No. F2020203064, the China Postdoctoral Science Foundation under Grant No. 2018M641674, and the Doctoral Foundation of Yanshan University under Grant No. BL18033.

Author information

Corresponding author

Correspondence to Zhengping Hu.

About this article

Cite this article

Li, S., Hu, Z., Zhao, M. et al. Cascade-guided multi-scale attention network for crowd counting. SIViP 15, 1663–1670 (2021). https://doi.org/10.1007/s11760-021-01903-8

  • DOI: https://doi.org/10.1007/s11760-021-01903-8
