Abstract
Deep learning has greatly improved the performance of crowd counting based on density estimation. However, obtaining a high-quality density map remains difficult because of background clutter and the interference of perspective changes within and between scenes. In this paper, we propose a cascade-guided crowd counting network built mainly around a scale-aware model (SAM) and an attention-aware model (AAM). First, SAM uses a share-net design and a multi-directional perspective transform in convolution to handle multi-scale variation and smooth transitions, while reducing background noise in shallow features. Second, AAM further encodes semantic interdependencies along the two dimensions of location and channel, so that the network learns to attend to key information. Finally, the global and local features are concatenated and fed into a decoder to generate the estimated density map for crowd counting. Comprehensive experiments on three established datasets show that the proposed method achieves higher accuracy and is more robust to scale variation and background noise.
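To make the notion of a density map concrete, the sketch below shows the common convention in density-estimation counting: each annotated head is spread into a small Gaussian blob of unit mass, so the sum of the map equals the crowd count. It is a minimal illustration under stated assumptions, not the network or training pipeline proposed in the paper. The function names (`gaussian_kernel`, `density_map`, `count_from_density`), the fixed kernel size and sigma, and the border renormalisation are all hypothetical choices made for this example.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalised to sum to 1."""
    half = size // 2
    kernel = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def density_map(height, width, heads, size=7, sigma=1.5):
    """Build a density map from (row, col) head annotations.

    Assumption for this sketch: every head lies inside the image, and the
    part of a kernel that falls outside the border is discarded, with the
    remaining weights rescaled so each head still contributes exactly 1.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for row, col in heads:
        # Collect the kernel cells that land inside the image.
        cells = []
        for dy in range(size):
            for dx in range(size):
                y, x = row + dy - half, col + dx - half
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy][dx]))
        mass = sum(weight for _, _, weight in cells)
        for y, x, weight in cells:
            dmap[y][x] += weight / mass
    return dmap


def count_from_density(dmap):
    """The crowd count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in dmap)


# Three hypothetical head positions, one of them in the image corner.
heads = [(2, 3), (10, 10), (0, 0)]
estimated = count_from_density(density_map(16, 16, heads))
print(round(estimated, 6))
```

A network that regresses such a map is then scored by comparing the sum of its predicted map with the annotated count. In practice the kernel width is often adapted to local crowd density rather than fixed as it is here.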
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61771420 and 62001413, the National Natural Science Foundation of Hebei Province under Grant No. F2020203064, the China Postdoctoral Science Foundation under Grant No. 2018M641674, and the Doctoral Foundation in Yanshan University under Grant No. BL18033.
Cite this article
Li, S., Hu, Z., Zhao, M. et al. Cascade-guided multi-scale attention network for crowd counting. SIViP 15, 1663–1670 (2021). https://doi.org/10.1007/s11760-021-01903-8
DOI: https://doi.org/10.1007/s11760-021-01903-8