Abstract
People counting is an important task in video surveillance. Despite significant improvements, the task still faces many challenges, such as heavy occlusion in crowded environments, viewpoint variation, and varying illumination. The people counting process consists of a people detection stage and a people tracking stage. This paper focuses on improving people counting results through better people detection. Our proposed method combines Deformable Part Models (DPM) and a Deep Convolutional Neural Network (DCNN) to exploit the advantages of each and to overcome their respective shortcomings in people detection. First, to be robust to viewpoint and occlusion, we fuse the people detection results from parts detected by DPM, such as the head, head-shoulders, upper body, and full body. Second, to overcome the inefficiency of DPM in zoomed-in views, we use the DCNN to detect the head region, because the body is often occluded, leaving only the head fully visible for counting. Finally, we apply late fusion to the detection results of the two models. We run experiments on the PETS 2012 and TUD datasets and evaluate performance by mean absolute error (MAE) and mean relative error (MRE). The experimental results show that our method achieves higher performance than the methods of Albiol [1], Conte [5], and Subburaman [22] on the PETS dataset, and in particular that it outperforms the state-of-the-art YOLO9000 [10] with parameters fine-tuned on the HollywoodHeads dataset. Moreover, it achieves high performance in sparse and medium-density crowd environments and is robust to scale, viewpoint, illumination, occlusion, and deformation.
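The abstract names MAE and MRE as the evaluation metrics but does not define them. The sketch below uses the definitions common in the people counting literature: the per-frame absolute difference between estimated and ground-truth counts, averaged over frames, and the same difference divided by the ground-truth count. The function names, the handling of frames with zero ground truth, and the sample counts are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(estimated: Sequence[float], truth: Sequence[float]) -> float:
    """Average of |estimated - truth| over all frames."""
    if len(estimated) != len(truth) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - t) for e, t in zip(estimated, truth)) / len(truth)


def mean_relative_error(estimated: Sequence[float], truth: Sequence[float]) -> float:
    """Average of |estimated - truth| / truth over frames with truth > 0.

    Assumption: frames whose ground-truth count is zero are skipped,
    because the relative error is undefined for them.
    """
    if len(estimated) != len(truth) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(e - t) / t for e, t in zip(estimated, truth) if t > 0]
    if not ratios:
        raise ValueError("no frame has a positive ground-truth count")
    return sum(ratios) / len(ratios)


# Hypothetical per-frame counts, for illustration only.
estimated_counts = [10, 12, 8, 15]
true_counts = [10, 10, 10, 12]
mae = mean_absolute_error(estimated_counts, true_counts)
mre = mean_relative_error(estimated_counts, true_counts)
print(f"MAE = {mae:.4f}, MRE = {mre:.4f}")
```

MAE is expressed in people per frame, so it grows with crowd density; MRE normalises by the true count, which makes results on sparse and dense sequences easier to compare.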
References
Albiol, A., et al.: Video analysis using corner motion statistics. In: Performance Evaluation of Tracking and Surveillance Workshop at CVPR, pp. 31–37 (2009)
Zeng, C., et al.: Robust head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting. In: 20th ICPR, pp. 2069–2072 (2010). https://doi.org/10.1109/icpr.2010.509
Tome, D., et al.: Deep convolutional neural networks for pedestrian detection. arXiv:1510.03608 (2016). https://doi.org/10.1016/j.image.2016.05.007
Kang, D., et al.: Beyond counting: comparisons of density maps for crowd analysis tasks - counting, detection, and tracking. arXiv:1705.10118 (2017)
Conte, D., et al.: A method for counting people in crowded scenes. In: 7th IEEE International Conference on AVSS, pp. 225–232 (2010). https://doi.org/10.1109/avss.2010.78
Ling, D., et al.: An automatic people counting method of hotel dining with occlusion. J. Artif. Intell. Pract. 1(1), 1–7 (2016)
Schroff, F., et al.: FaceNet: a unified embedding for face recognition and clustering. arXiv:1503.03832 (2015). https://doi.org/10.1109/cvpr.2015.7298682
Idrees, H.: Visual analysis of extremely dense crowded scenes. Ph.D. dissertation, University of Central Florida, USA (2014)
Barandiaran, J., et al.: Real-time people counting using multiple lines. In: WIAMIS, pp. 159–162 (2008). https://doi.org/10.1109/wiamis.2008.27
Redmon, J., et al.: YOLO9000: better, faster, stronger. arXiv:1612.08242 (2017)
García, J., et al.: Directional people counter based on head tracking. IEEE TIE 60(9), 3991–4000 (2013). https://doi.org/10.1109/TIE.2012.2206330
van de Sande, K.E.A., et al.: Segmentation as selective search for object recognition. In: ICCV (2011). https://doi.org/10.1109/iccv.2011.6126456
Boominathan, L., et al.: CrowdNet: a deep convolutional network for dense crowd counting. arXiv:1608.06197, pp. 640–644 (2016)
Bourdev, L., Maji, S., Brox, T., Malik, J.: Detecting people using mutually consistent poselet activations. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6316, pp. 168–181. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15567-3_13
Pizzo, L.D., et al.: Counting people by RGB or depth overhead cameras. Pattern Recogn. Lett. 81, 41–50 (2016). https://doi.org/10.1016/j.patrec.2016.05.033
Ngoc, L.Q., et al.: Event retrieval in soccer video from coarse to fine based on multi-modal approach. In: IEEE RIVF, pp. 308–313 (2010). https://doi.org/10.1109/rivf.2010.5632694
Oquab, M., et al.: Learning and transferring mid-level image representations using convolutional neural networks. In: IEEE Conference on CVPR, pp. 1717–1724 (2014). https://doi.org/10.1109/cvpr.2014.222
Felzenszwalb, P.F., et al.: Object detection with discriminatively trained part based models. IEEE TPAMI 32, 1627–1645 (2010). https://doi.org/10.1109/TPAMI.2009.167
Felzenszwalb, P.F., et al.: Discriminatively trained deformable part models (2010). Release 4 http://people.cs.uchicago.edu/~pff/latent-release4
Girshick, R., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv:1311.2524 (2014). https://doi.org/10.1109/cvpr.2014.81
Vu, T.-H., et al.: Context-aware CNNs for person head detection. In: IEEE on ICCV, pp. 2893–2901 (2015). https://doi.org/10.1109/ICCV.2015.331
Subburaman, V.B., et al.: Counting people in the crowd using a generic head detector. In: 9th IEEE on AVSS, pp. 470–475 (2012). https://doi.org/10.1109/avss.2012.87
Liu, W., et al.: SSD: single shot multibox detector. arXiv:1512.02325 (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Taigman, Y., et al.: DeepFace: closing the gap to human-level performance in face verification. In: IEEE on CVPR, pp. 1701–1708 (2014). https://doi.org/10.1109/cvpr.2014.220
Zhao, Z., Li, H., Zhao, R., Wang, X.: Crossing-line crowd counting with two-phase deep neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 712–726. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_43
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Nguyen, A.H., Ly, N.Q. (2018). A New Framework for People Counting from Coarse to Fine Could be Robust to Viewpoint and Illumination. In: Nguyen, N., Hoang, D., Hong, TP., Pham, H., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2018. Lecture Notes in Computer Science, vol 10752. Springer, Cham. https://doi.org/10.1007/978-3-319-75420-8_59
DOI: https://doi.org/10.1007/978-3-319-75420-8_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75419-2
Online ISBN: 978-3-319-75420-8
eBook Packages: Computer Science, Computer Science (R0)