Abstract
Crowd counting has drawn increasing attention in recent years because of its wide range of applications in computer vision. Most existing methods train networks on datasets with few labeled images and are therefore prone to over-fitting. Moreover, these datasets usually provide only manually labeled head-center positions, an annotation that carries limited information. In this paper, we propose to exploit virtual synthetic crowd scenes to improve the performance of a counting network in the real world. Because people masks are easy to obtain in a synthetic dataset, we first train a segmentation network on synthetic data to distinguish people from the background. We then transfer the learned segmentation priors from synthetic data to real-world data. Finally, we train a density estimation network on real-world data using the obtained people masks. Experiments on two crowd counting datasets demonstrate the effectiveness of the proposed method.
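The abstract does not spell out how the people masks enter the density estimation step, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows the common practice of turning head-center annotations into a density map (one unit-mass Gaussian per head) and then, as an illustrative assumption, using a binary people mask to zero out density that falls on the background. All function names and parameters (`density_from_points`, `apply_mask`, `radius`, `sigma`) are hypothetical.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a (2*radius+1) square Gaussian kernel as nested lists."""
    size = 2 * radius + 1
    return [
        [
            math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]


def density_from_points(points, height, width, radius=2, sigma=1.0):
    """Build a density map with one unit-mass Gaussian per head center.

    points holds (row, col) head-center annotations, assumed to lie
    inside the image. Each kernel is renormalized over its in-bounds
    cells so every head contributes exactly 1 to the total count.
    """
    kernel = gaussian_kernel(radius, sigma)
    density = [[0.0] * width for _ in range(height)]
    for r, c in points:
        cells = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = r + dy, c + dx
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy + radius][dx + radius]))
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            density[y][x] += w / mass
    return density


def apply_mask(density, mask):
    """Suppress density outside the people mask (assumed usage, 1 = person)."""
    return [
        [d * m for d, m in zip(d_row, m_row)]
        for d_row, m_row in zip(density, mask)
    ]


def count(density):
    """The estimated crowd count is the sum over the density map."""
    return sum(map(sum, density))


if __name__ == "__main__":
    heads = [(2, 2), (0, 7)]
    density = density_from_points(heads, height=8, width=8)
    # Hypothetical mask: only columns 0..4 are labeled as people.
    mask = [[1 if x <= 4 else 0 for x in range(8)] for _ in range(8)]
    print(round(count(density), 6))
    print(round(count(apply_mask(density, mask)), 6))
```

In this toy example the mask covers the whole kernel of the first head and none of the second, so masking removes exactly one person from the count. A learned mask would instead be predicted by the segmentation network described above.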
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61802351, 61822701, 61872324, 61772474, 62036010), in part by the China Postdoctoral Science Foundation (2018M632802), and in part by the Key R&D and Promotion Projects in Henan Province (192102310258).
Author information
Xiaoheng Jiang received the BS, MS, and PhD degrees in electronic information engineering from Tianjin University, China, in 2010, 2013, and 2017, respectively. Currently, he is an associate professor with the School of Information Engineering, Zhengzhou University, China. His research interests include computer vision and deep learning.
Hao Liu received the BS degree from Zhengzhou University, China in 2018. Currently, he is a master's student with the School of Information Engineering, Zhengzhou University, China. His research interests include computer vision and deep learning.
Li Zhang received the BS degree from University of Electronic Science and Technology of China, China in 2017, and the MS degree from Zhengzhou University, China in 2020. Currently, he is a PhD candidate at Beihang University, China. His research interests include computer vision and deep learning.
Geyang Li is currently a senior student at Henan University of Science and Technology, China. His research interests include computer vision and deep learning.
Mingliang Xu received the PhD degree in computer science and technology from the State Key Laboratory of CAD&CG, Zhejiang University, China. He is currently a Professor with the School of Information Engineering, Zhengzhou University, China, and also the Director of the Center for Interdisciplinary Information Science Research, and the General Secretary of the ACM SIGAI China. His research interests include virtual reality and artificial intelligence.
Pei Lv is an associate professor in the School of Information Engineering, Zhengzhou University, China. His research interests include video analysis and crowd simulation. He received his PhD in 2013 from the State Key Lab of CAD&CG, Zhejiang University, China. He has authored more than 20 journal and conference papers in these areas, in venues including IEEE TIP, IEEE TCSVT, and ACM MM.
Bing Zhou received the BS and MS degrees from Xi’an Jiao Tong University, China in 1986 and 1989, respectively, and the PhD degree from Beihang University, China in 2003, all in computer science. He is currently a Professor with the School of Information Engineering, Zhengzhou University, China. His research interests cover video processing and understanding, surveillance, computer vision, and multimedia applications.
Cite this article
Jiang, X., Liu, H., Zhang, L. et al. Transferring priors from virtual data for crowd counting in real world. Front. Comput. Sci. 16, 163314 (2022). https://doi.org/10.1007/s11704-021-0387-8
DOI: https://doi.org/10.1007/s11704-021-0387-8