Skip-connection convolutional neural network for still image crowd counting

Applied Intelligence

Abstract

In recent years, crowd counting in still images has attracted considerable research interest because of its applications in public safety. However, it remains a challenging task owing to perspective distortion and scale variation. In this paper, we propose an effective Skip-connection Convolutional Neural Network (SCNN) for crowd counting that addresses the problem of scale variation. The SCNN architecture consists of several multi-scale units that extract multi-scale features. Each multi-scale unit contains three convolutional layers and builds a connection from the unit's input to each of these layers. In addition, we propose a scale-related training method to improve the accuracy and robustness of crowd counting. We evaluate our method on three crowd counting benchmarks. Experimental results verify the effectiveness of the proposed method, which achieves superior performance compared with other methods.
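The abstract does not give kernel sizes, channel widths, or the exact way the input is merged into each layer, so the NumPy code here is only a sketch of one plausible reading of a multi-scale unit. It assumes (hypothetically) three 3×3, stride-1, "same"-padded convolutions applied in sequence, where each layer receives the unit input concatenated with the previous layer's output, and the unit output is the concatenation of all three layer outputs. Because the stacked 3×3 layers have receptive fields of 3, 5, and 7 pixels, the concatenated output mixes features at three scales. The names `conv2d_same`, `init_unit`, and `multi_scale_unit` are illustrative and do not come from the paper.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv2d_same(x, w, b):
    """'Same'-padded, stride-1 2D convolution (cross-correlation, as in most CNN libraries).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,).
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, wd = x.shape
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window centred on (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


def init_unit(c_in, c_mid, kernel_sizes=(3, 3, 3), seed=0):
    """He-initialised weights for a hypothetical three-layer multi-scale unit."""
    rng = np.random.default_rng(seed)
    params, prev = [], 0
    for k in kernel_sizes:
        fan_in_ch = c_in + prev  # layer input = unit input (+ previous layer output)
        std = np.sqrt(2.0 / (fan_in_ch * k * k))
        w = rng.normal(0.0, std, size=(c_mid, fan_in_ch, k, k))
        params.append((w, np.zeros(c_mid)))
        prev = c_mid
    return params


def multi_scale_unit(x, params):
    """Each conv layer gets a skip connection from the unit input x; outputs are concatenated."""
    feats, h = [], None
    for w, b in params:
        inp = x if h is None else np.concatenate([x, h], axis=0)
        h = relu(conv2d_same(inp, w, b))
        feats.append(h)
    return np.concatenate(feats, axis=0)  # (3 * c_mid, H, W)


x = np.random.default_rng(1).normal(size=(4, 16, 16))
y = multi_scale_unit(x, init_unit(c_in=4, c_mid=8))
print(y.shape)  # (24, 16, 16)
```

Concatenation is only one way to realise these connections; element-wise addition, as in residual networks, would keep the channel count fixed but requires the input and layer outputs to have matching channel widths.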

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under grant No. 61233003 and in part by the Equipment Pre-research Fund under grant No. 61403120201.

Author information

Correspondence to Luyang Wang.

Cite this article

Wang, L., Yin, B., Guo, A. et al. Skip-connection convolutional neural network for still image crowd counting. Appl Intell 48, 3360–3371 (2018). https://doi.org/10.1007/s10489-018-1150-1
