Context-aware pyramid attention network for crowd counting

Applied Intelligence

Abstract

Accurate crowd counting remains challenging because of continuous scale variations. To address this, we present an innovative Context-Aware Pyramid Attention Network (CAPAN) for crowd counting, which extracts rich contextual features and models dependencies in the spatial and channel dimensions. To extract rich contextual features, we propose a context-aware pyramid feature extraction module, which divides the input features into four blocks with different scales. In addition, we design an attention module consisting of a space attention block and a channel attention block, which handle the interdependence of feature information in the spatial dimension and the channel dimension, respectively. We first perform an ablation study on ShanghaiTech Part_A to evaluate the effectiveness of each component. We then compare CAPAN with several state-of-the-art methods on five challenging crowd counting datasets: ShanghaiTech, UCF_CC_50, UCF-QNRF, WorldExpo’10, and NWPU-Crowd. Experimental results demonstrate that CAPAN significantly reduces the mean absolute error and mean square error compared with the state-of-the-art methods, while also improving the quality of the density maps and the false recognition rate. These results validate the effectiveness of CAPAN in crowd counting.
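The two error measures named in the abstract can be illustrated with a minimal sketch. It assumes the usual density-map setting, where the predicted count for an image is the sum of its density map, and it assumes the convention common in crowd-counting work of reporting the square root of the mean squared count error under the name "mean square error"; the abstract does not state which definition the paper uses. The function names and all numbers below are hypothetical and are not results from the paper.

```python
import math


def count_from_density_map(density_map):
    """Estimated count for one image: the sum of all density values.

    `density_map` is a hypothetical 2-D list of non-negative floats.
    """
    return sum(sum(row) for row in density_map)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def root_mean_squared_error(predicted, actual):
    """Square root of the average squared count difference.

    Assumption: this is the quantity often labelled "MSE" in crowd counting.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Made-up per-image counts, for illustration only.
predicted_counts = [102.0, 48.0, 310.0]
actual_counts = [100.0, 50.0, 300.0]

mae = mean_absolute_error(predicted_counts, actual_counts)
rmse = root_mean_squared_error(predicted_counts, actual_counts)

# A tiny made-up density map whose values sum to the estimated count.
toy_count = count_from_density_map([[0.25, 0.25], [0.5, 1.0]])
```

Because the squared term weights large per-image errors more heavily, the second measure is usually read as an indicator of robustness, while the first reflects average accuracy.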

Author information

Corresponding author

Correspondence to Lei Lyu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gu, L., Pang, C., Zheng, Y. et al. Context-aware pyramid attention network for crowd counting. Appl Intell 52, 6164–6180 (2022). https://doi.org/10.1007/s10489-021-02639-1

  • DOI: https://doi.org/10.1007/s10489-021-02639-1
