Single-column CNN for crowd counting with pixel-wise attention mechanism

  • Original Article
  • Neural Computing and Applications

Abstract

This paper presents a novel method for accurate people counting in highly dense crowd images. The proposed method consists of three modules: foreground region extraction (EF), a pixel-wise attention mechanism (PAM) and a single-column density map estimator (S-DME). EF uses a fully convolutional network to efficiently suppress interference from complex backgrounds; PAM performs pixel-wise classification of crowd images to generate high-quality local crowd density maps; and S-DME is a carefully designed single-column network that learns more representative features with far fewer parameters. In addition, two new evaluation metrics are introduced to provide a more comprehensive understanding of how the different modules in our algorithm perform. Experiments show that our approach achieves state-of-the-art results on several challenging datasets, including our own dataset with highly cluttered environments and diverse camera perspectives.
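The abstract names the three modules but does not give the rule for combining their outputs. The NumPy code below is a minimal sketch under stated assumptions. It shows the density-map convention that is common in crowd counting, where the estimated count is the sum over the density map. It also shows one hypothetical fusion. A foreground probability map stands in for EF's output and gates an attention-weighted mixture of K candidate density maps. Per-pixel softmax weights stand in for PAM's pixel-wise classification. The function names `gaussian_density_map` and `fuse_and_count`, the fixed kernel width `sigma` and the number of density levels K are all illustrative assumptions, not the paper's method.

```python
import numpy as np


def gaussian_density_map(shape, heads, sigma=4.0):
    # Ground-truth density map: one fixed-sigma Gaussian per annotated head.
    # Fixed sigma is an assumption; other kernel choices are also used in the literature.
    # Head coordinates (x, y) are assumed to lie inside the image.
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros((h, w))
    for x, y in heads:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()  # each head integrates to exactly 1, even near the border
    return density


def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_and_count(fg_prob, level_logits, level_density):
    # HYPOTHETICAL fusion, not taken from the paper:
    # fg_prob:       (H, W) foreground probability, the role the abstract gives EF
    # level_logits:  (K, H, W) per-pixel class scores, standing in for PAM's classification
    # level_density: (K, H, W) candidate density maps, one per assumed density level
    attention = softmax(level_logits, axis=0)           # per-pixel weights summing to 1 over K
    attended = (attention * level_density).sum(axis=0)  # attention-weighted density, shape (H, W)
    density = fg_prob * attended                        # suppress background responses
    return density, float(density.sum())                # count = sum over the density map


if __name__ == "__main__":
    gt = gaussian_density_map((64, 64), [(10, 12), (30, 40), (50, 20)])
    print(round(float(gt.sum()), 6))  # 3.0
```

Each Gaussian kernel is renormalized to sum to 1. As a result, the ground-truth map for three annotated heads sums to 3 up to floating-point error. A count read from a predicted map can therefore be compared directly with the annotated head count.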

Figs. 1–7 (images not reproduced here)

Author information

Correspondence to Bisheng Wang or Guo Cao.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

About this article

Cite this article

Wang, B., Cao, G., Shang, Y. et al. Single-column CNN for crowd counting with pixel-wise attention mechanism. Neural Comput & Applic 32, 2897–2908 (2020). https://doi.org/10.1007/s00521-018-3810-9

  • DOI: https://doi.org/10.1007/s00521-018-3810-9
