ABSTRACT
Previous work on crowd counting preprocesses the given point labels and converts them into a density map or a count map used for learning. However, we show that density maps tend to contain severe errors caused by occlusions, head size variation, and head shape variation, so directly learning the density map often results in serious over-fitting. Count maps, on the other hand, do not fully exploit the detailed information in the image. These unsatisfactory preprocessing choices create a performance bottleneck despite recent advances in network architecture. To address these problems, we show that errors are not uniformly distributed across the density map; rather, they are correlated with the distance to the nearest annotation point. Inspired by this finding, we introduce Fine-Grained Adaptive Losses, which learn the density map differently in different regions of the map. Although simple, our method suggests that more supervision can and should be extracted from the density map. This departs from the conventional use of density maps and opens a new direction for future counting research. Extensive experiments demonstrate that our approach significantly outperforms standard methods on crowd-counting datasets.
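To make the idea of region-dependent supervision concrete, the following is a minimal sketch in Python with NumPy. It is not the paper's Fine-Grained Adaptive Losses, whose exact form the abstract does not give. It only illustrates the general pattern of (1) computing each pixel's distance to the nearest annotation point and (2) weighting a per-pixel squared error by distance band. The function names, the band edges, and the weights are all hypothetical placeholders, and the abstract does not say which regions should be trusted more.

```python
import numpy as np


def nearest_annotation_distance(shape, points):
    """Per-pixel Euclidean distance to the nearest annotated point.

    `points` is a non-empty sequence of (row, col) head annotations.
    Brute force over all points; adequate for a small illustration.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.asarray(points, dtype=float)  # shape (N, 2)
    d = np.sqrt((ys[..., None] - pts[:, 0]) ** 2
                + (xs[..., None] - pts[:, 1]) ** 2)  # shape (H, W, N)
    return d.min(axis=-1)


def region_weighted_mse(pred, target, dist,
                        edges=(2.0, 6.0), weights=(1.0, 0.5, 0.1)):
    """Weighted MSE in which each pixel's weight depends on its distance band.

    `edges` and `weights` are hypothetical values chosen for illustration,
    not values taken from the paper.
    """
    band = np.digitize(dist, edges)   # 0: near, 1: middle, 2: far
    w = np.asarray(weights)[band]     # per-pixel weight map
    return float(np.sum(w * (pred - target) ** 2) / np.sum(w))


# Usage: one annotation at the top-left corner of an 8x8 map.
dist = nearest_annotation_distance((8, 8), [(0, 0)])
target = np.zeros((8, 8))
pred = target + 1.0
loss = region_weighted_mse(pred, target, dist)
```

Because the loss is normalized by the sum of the weights, a prediction that is off by a constant everywhere yields the same value regardless of the weights chosen; the weights only matter when the error varies across regions.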
Index Terms
- Overturning the Counting Cornerstone: Exploring Fine-Grained Adaptive Losses to Subvert the Conventional Density Estimation