Single-column CNN for crowd counting with pixel-wise attention mechanism

Wang, Bisheng; Cao, Guo; Shang, Yanfeng; Zhou, Licun; Zhang, Youqiang; Li, Xuesong

doi:10.1007/s00521-018-3810-9

Single-column CNN for crowd counting with pixel-wise attention mechanism

Original Article
Published: 13 October 2018

Volume 32, pages 2897–2908, (2020)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

Bisheng Wang¹,
Guo Cao¹,
Yanfeng Shang²,
Licun Zhou²,
Youqiang Zhang¹ &
…
Xuesong Li¹

698 Accesses
Explore all metrics

Abstract

This paper presents a novel method for accurate people counting in highly dense crowd images. The proposed method consists of three modules: extracting foreground regions (EF), pixel-wise attention mechanism (PAM) and single-column density map estimator (S-DME). EF can suppress the disturbance of complex background efficiently with a fully convolutional network, PAM performs pixel-wise classification of crowd images to generate high-quality local crowd density maps, and S-DME is a carefully designed single-column network that can learn more representative features with much fewer parameters. In addition, two new evaluation metrics are introduced to get a comprehensive understanding of the performance of different modules in our algorithm. Experiments demonstrate that our approach can get the state-of-the-art results on several challenging datasets including our dataset with highly cluttered environments and various camera perspectives.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Two stages double attention convolutional neural network for crowd counting

Article 08 August 2020

MSFFA: a multi-scale feature fusion and attention mechanism network for crowd counting

Article 12 January 2022

Lightweight multi-scale network with attention for accurate and efficient crowd counting

Article 25 September 2023

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

References

Zhou B, Tang X, Wang X (2015) Learning collective crowd behaviors with dynamic pedestrian-agents. Int J Comput Vis 111(1):50–68
Article Google Scholar
Huang L, Chen T, Wang Y, Yuan H (2015) Congestion detection of pedestrians using the velocity entropy: a case study of love parade 2010 disaster. Phys A Stat Mech Appl 440:200–209
Article Google Scholar
Li W, Mahadevan V, Vasconcelos N (2014) Anomaly detection and localization in crowded scenes. IEEE Trans Pattern Anal Mach Intell 36(1):18–32
Article Google Scholar
Chaker R, Al Aghbari Z, Junejo IN (2017) Social network model for crowd anomaly detection and localization. Pattern Recognit 61:266–281
Article Google Scholar
Benabbas Y, Ihaddadene N, Djeraba C (2011) Motion pattern extraction and event detection for automatic visual surveillance. EURASIP J Image Video Process 2011(1):1–15
Article Google Scholar
Onoro-Rubio D, L’opez-Sastre RJ (2016) Towards perspective-free object counting with deep learning. In: ECCV
French G, Fisher M, Mackiewicz M, Needle C (2015) Convolutional neural networks for counting fish in fisheries surveillance video. In: British machine vision conference workshop, BMVA Press
Chen K, Loy CC, Gong S, Xiang T (2012) Feature mining for localized crowd counting. In: European conference on computer vision
Zhang C, Li H, Wang X, Yang X (2015) Cross-scene crowd counting via deep convolutional neural networks. In: Computer vision and pattern recognition (CVPR)
Zhang Y, Zhou D, Chen S, Gao S, Ma Y (2016) Single-image crowd counting via multi-column convolutional neural network. In: Computer vision and pattern recognition (CVPR)
Felzenszwalb P, Girshick R, McAllester D, Ramanan D (2010) Object detection with discriminatively trained partbased models. In: PAMI
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: NIPS
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Computer vision and pattern recognition (CVPR)
Sindagi VA, Patel VM (2017) Generating high-quality crowd density maps using contextual pyramid CNNs. In: ICCV
Onoro-Rubio D, Lopez-Sastre RJ (2016) Towards perspective-free object counting with deep learning. In: ECCV
Sam DB, Surya S, Babu RV (2017) Switching convolutional neural network for crowd counting. In: CVPR
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2014) Going deeper with convolutions. arXiv:1409.4842v1
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR
Girshick R (2015) Fast R-CNN. In: ICCV
Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. In: PAMI
Zhang H, Ji Y, Huang W, Liu L (2018) Sitcom-star-based clothing retrieval for video advertising: a deep learning framework. Neural Comput Appl 2018:1–20
Google Scholar
Zhang H, Cao X, Ho JKL, Chow TWS (2018) Object-level video advertising: an optimization framework. IEEE Trans Ind Inform 13(2):520–531
Article Google Scholar
Mostajabi M, Yadollahpour P, Shakhnarovich G (2014) Feedforward semantic segmentation with zoom-out features. Arxiv preprint arxiv:1412.0774
Dai J, He K, Sun J (2015) Convolutional feature masking for joint object and stuff segmentation. In: CVPR
Hariharan B, Arbelaez P, Girshick R, Malik J (2014) Simultaneous detection and segmentation. In: ECCV
Hariharan B, Arbelaez P, Girshick R, Malik J (2015) Hyper-columns for object segmentation and fine-grained localization. In: CVPR
Jiang F, Grigorev A, Rho S, Tian Z, Fu Y, Jifara W, Adil K, Liu S (2017) Medical image semantic segmentation based on deep learning. Neural Comput Appl 2017(8):1–7
Google Scholar
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Computer vision and pattern recognition (CVPR)
Chen L-C, Papandreou G, Murphy K, Yuille AL (2015) Semantic image segmentation with deep convolutional nets and fully connected CRFs. In: ICLR
Dalal N, Triggs B (2015) Histograms of oriented gradients for human detection. In: CVPR
Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154
Article Google Scholar
Wu B, Nevatia R (2005) Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors. In: ICCV
Li M, Zhang Z, Huang K, Tan T (2008) Estimating the number of people in crowd scenes by mid based foreground segmentation and head-shoulder detection. In: Pattern recognition
Huang S, Xi Li, Zhang Z, Wu F, Gao S, Ji R, Han J (2017) Body structure aware deep crowd counting. IEEE Trans Image Process 27(3):1049–1059
Article MathSciNet Google Scholar
Ryan D, Denman S, Fookes C, Sridharan S (2009) Crowd counting using multiple local features. Digit Image Comput Tech Appl 63(6):81–88
Google Scholar
Wang C, Zhang H, Yang L, Liu S, Cao (2015) Deep people counting in extremely dense crowds. In: Proceedings of the 23rd ACM international conference on Multimedia. ACM New York, pp 1299–1302
Li Y, Zhang X, Chen D (2018) CSRNet: dilated convolutional neural networks for understanding the highly congested scenes. In: CVPR
Kong T, Yao A, Chen Y, Sun F (2016) HyperNet: towards accurate region proposal generation and joint object detection. In: CVPR
Leng J, Liu Y (2018) An enhanced SSD with feature fusion and visual reasoning for object detection. Neural Comput Appl 2018(2):1–10
Google Scholar
Wang C, Zhang H, Yang L, Liu S, Cao X (2015) Deep people counting in extremely dense crowds. In: ACM International Conference on Multimedia
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093
Sindagi VA, Patel VM (2017) CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In: AVSS
Liu J, Gao C, Meng D, Hauptmann AG (2018) DecideNet: counting varying density crowds through attention guided detection and density estimation. In: CVPR

Download references

Author information

Authors and Affiliations

Nanjing University of Science and Technology, No. 200 Xiao Lingwei Street, Xuan Wu District, Nan Jing City, Jiang Su Province, China
Bisheng Wang, Guo Cao, Youqiang Zhang & Xuesong Li
The Third Research Institute of the Ministry of Public Security, Shanghai, China
Yanfeng Shang & Licun Zhou

Authors

Bisheng Wang
View author publications
You can also search for this author in PubMed Google Scholar
Guo Cao
View author publications
You can also search for this author in PubMed Google Scholar
Yanfeng Shang
View author publications
You can also search for this author in PubMed Google Scholar
Licun Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Youqiang Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Xuesong Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Bisheng Wang or Guo Cao.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wang, B., Cao, G., Shang, Y. et al. Single-column CNN for crowd counting with pixel-wise attention mechanism. Neural Comput & Applic 32, 2897–2908 (2020). https://doi.org/10.1007/s00521-018-3810-9

Download citation

Received: 08 June 2018
Accepted: 05 October 2018
Published: 13 October 2018
Issue Date: April 2020
DOI: https://doi.org/10.1007/s00521-018-3810-9

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Single-column CNN for crowd counting with pixel-wise attention mechanism

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Two stages double attention convolutional neural network for crowd counting

MSFFA: a multi-scale feature fusion and attention mechanism network for crowd counting

Lightweight multi-scale network with attention for accurate and efficient crowd counting

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding authors

Ethics declarations

Conflict of interest

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now