Abstract
Crowd density estimation is challenging because head sizes vary widely within a crowd. Existing methods commonly use a multi-column convolutional neural network (MCNN) to adapt to this variation, which averages the response across areas of different density and introduces considerable noise into the density map. To address this problem, we propose a new method called the segmentation-aware prior network (SAPNet), which generates a high-quality density map without noise based on a coarse head-segmentation map. SAPNet is composed of two networks: a foreground-segmentation convolutional neural network (FS-CNN) as the front end and a crowd-regression convolutional neural network (CR-CNN) as the back end. Using only the single dot annotation per head, we generate ground-truth segmentation masks for the heads. Trained on this ground truth, FS-CNN outputs a coarse head-segmentation map, which helps eliminate noise in regions of the density map that contain no people. Taking the head-segmentation map generated by the front end as input, CR-CNN performs accurate crowd counting and generates a high-quality density map. We evaluate SAPNet on four datasets (ShanghaiTech, UCF-CC-50, WorldExpo’10, and UCSD) and achieve state-of-the-art performance on ShanghaiTech Part B and UCF-CC-50.
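The abstract does not say how the segmentation ground truth is derived from dot annotations or how CR-CNN consumes the segmentation map, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a fixed-width Gaussian per head for the density map, a fixed-radius disc per head for the coarse mask, and a simple element-wise gating step as a stand-in for the effect of the segmentation prior. All function names and parameters (`sigma`, `radius`, `head_radius`) are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized (2*radius+1)^2 Gaussian kernel as nested lists."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def density_from_dots(dots, height, width, sigma=2.0, radius=6):
    """Place one unit-mass Gaussian per annotated head at (row, col).

    Assumption: fixed sigma for every head; dots lie inside the image.
    """
    kernel = gaussian_kernel(sigma, radius)
    density = [[0.0] * width for _ in range(height)]
    for r, c in dots:
        # Collect the kernel cells that fall inside the image.
        cells = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = r + dy, c + dx
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy + radius][dx + radius]))
        # Renormalize so a head clipped by the border still contributes 1.
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            density[y][x] += w / mass
    return density


def mask_from_dots(dots, height, width, head_radius=4):
    """Coarse head mask: 1 inside a disc around each dot, 0 elsewhere.

    Assumption: a fixed disc radius stands in for the unknown head size.
    """
    mask = [[0] * width for _ in range(height)]
    for r, c in dots:
        for y in range(max(0, r - head_radius), min(height, r + head_radius + 1)):
            for x in range(max(0, c - head_radius), min(width, c + head_radius + 1)):
                if (y - r) ** 2 + (x - c) ** 2 <= head_radius ** 2:
                    mask[y][x] = 1
    return mask


def apply_mask(density, mask):
    """Zero the density wherever the mask marks background (illustrative gating)."""
    return [[d * m for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(density, mask)]


def count(density):
    """Estimated head count is the integral (sum) of the density map."""
    return sum(map(sum, density))


# Usage: three annotated heads in a 32 x 40 image.
dots = [(10, 10), (0, 0), (20, 30)]
density = density_from_dots(dots, 32, 40)
mask = mask_from_dots(dots, 32, 40)

# Simulate a noisy prediction with a small spurious response everywhere.
noisy = [[d + 0.01 for d in row] for row in density]
gated = apply_mask(noisy, mask)
print(count(density), count(noisy), count(gated))
```

In this toy setting the spurious background response inflates the summed count, and gating by the coarse mask removes the contribution from pixels far from any head. In SAPNet the segmentation map is an input to CR-CNN rather than a post-hoc multiplier, so the gating step here should be read as an intuition aid only.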
Additional information
Project supported by the National Natural Science Foundation of China (No. 61775048) and the Fundamental Research Funds for the Central Universities, China (No. ZDXMPY20180103).
Contributors
Jie-hao HUANG and Xiao-guang DI designed the research. Jie-hao HUANG drafted the manuscript. Jie-hao HUANG, Jun-de WU, and Ai-yue CHEN processed the data. Xiao-guang DI helped organize the manuscript. Jie-hao HUANG and Xiao-guang DI revised and finalized the paper.
Compliance with ethics guidelines
Jie-hao HUANG, Xiao-guang DI, Jun-de WU, and Ai-yue CHEN declare that they have no conflict of interest.
About this article
Cite this article
Huang, Jh., Di, Xg., Wu, Jd. et al. A novel convolutional neural network method for crowd counting. Front Inform Technol Electron Eng 21, 1150–1160 (2020). https://doi.org/10.1631/FITEE.1900282
DOI: https://doi.org/10.1631/FITEE.1900282