Crowd counting via learning perspective for multi-scale multi-view Web images

  • Research Article
  • Published in: Frontiers of Computer Science

Abstract

Estimating the number of people in Web images remains a challenging problem owing to perspective variation, differing views, and diverse backgrounds. Existing deep learning models still have difficulty with scenes in which people appear either extremely large or extremely small. In this paper, we propose a novel perspective-aware architecture to estimate the number of people in a crowd in Web images. Specifically, we use a two-stage framework: we first learn a policy network that infers the perspective of the target scene and outputs a scale label for the subsequent perspective normalization. Next, given the aligned inputs, we adjust the scale-specific counting network to regress the final count. Experiments on challenging datasets demonstrate that our approach can handle large perspective variation and achieves state-of-the-art results.
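
The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the networks or the form of the normalization, so the policy and counting networks are passed in as plain callables, the normalization is approximated by a nearest-neighbour resize, and the names `SCALE_FACTORS`, `resize_nearest`, and `estimate_count` as well as the scale factors themselves are hypothetical.

```python
from typing import Callable, Dict, List

Image = List[List[float]]  # grayscale image stored as rows of pixel values

# Hypothetical mapping from scale label to resize factor (an assumption,
# not taken from the paper): small people are enlarged, large ones shrunk.
SCALE_FACTORS: Dict[int, float] = {0: 2.0, 1: 1.0, 2: 0.5}


def resize_nearest(image: Image, factor: float) -> Image:
    """Nearest-neighbour resize, a stand-in for perspective normalization."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        [
            image[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


def estimate_count(
    image: Image,
    policy: Callable[[Image], int],
    counters: Dict[int, Callable[[Image], float]],
) -> float:
    """Run the two stages: infer a scale label, normalize, then regress a count."""
    label = policy(image)  # stage 1: policy network predicts a scale label
    normalized = resize_nearest(image, SCALE_FACTORS[label])  # align the input
    return counters[label](normalized)  # stage 2: scale-specific counter
```

In practice `policy` and each entry of `counters` would be trained networks; here any function with the same signature can be substituted to exercise the control flow.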


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61521002).

Author information

Corresponding author

Correspondence to Haizhou Ai.

Additional information

Chong Shang received the BS degree in computer science and technology with the honor of the Outstanding Graduating Student from Northwestern Polytechnical University, China in 2013. He is currently pursuing his PhD degree at Tsinghua University, China. His research interests are computer vision and deep learning, with a current specific focus on object detection and crowd analysis.

Haizhou Ai received the BS, MS, and PhD degrees from Tsinghua University, China in 1985, 1988, and 1991, respectively. From 1994 to 1996, he was with the Flexible Production System Laboratory, University of Brussels, Belgium, as a Postdoctoral Researcher. He is currently a Professor with the Computer Science and Technology Department, Tsinghua University. His research focuses on computer vision and pattern recognition, particularly object detection, tracking, and recognition. He has published more than 80 papers in refereed journals and conference proceedings. He supervised the Best PhD Dissertation of the Beijing Municipal City in computer science and technology in 2008 and the Best Student Paper of IEEE CVPR 2007.

Yi Yang received the BS degree in network engineering and PhD degree in pattern recognition from Sichuan University, China and the Institute of Automation, Chinese Academy of Sciences, China in 2010 and 2016, respectively. Since 2016, she has been with 2012 labs, Huawei Technologies Co., Ltd., China, where she is currently an algorithm engineer. Her research interests include computer vision, pattern recognition, deep learning, and object detection.

Cite this article

Shang, C., Ai, H. & Yang, Y. Crowd counting via learning perspective for multi-scale multi-view Web images. Front. Comput. Sci. 13, 579–587 (2019). https://doi.org/10.1007/s11704-017-6598-3

  • DOI: https://doi.org/10.1007/s11704-017-6598-3
