ABSTRACT
Managing and controlling extremely crowded scenes has become a hot topic in recent years. Counting people from a density map generated from object location annotations is an effective way to analyze crowd information and control crowds in severely congested scenes. In this paper, we propose a novel end-to-end crowd counting method called MSANet. MSANet consists of a VGG16 backbone as the front-end part and two branches as the back-end part: an attention map extractor that predicts crowd states (that is, whether people are present or not) and a density map branch that regresses the density map. Moreover, to obtain a high-resolution density map, we combine maps at different scales from the front-end part into the back-end part. For the loss function, we propose a new loss for crowd counting that enhances the resolution of the predicted map and its structural similarity to the ground truth. Test results on the public ShanghaiTech dataset and the Subway Crowd Counting Dataset, supported by the Nanjing Metro, demonstrate the effectiveness of our method.
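To illustrate the idea of counting from a density map built from location annotations, the following is a minimal sketch, not the authors' pipeline. It assumes a fixed-width Gaussian kernel (the abstract does not state which kernel is used), and the names `gaussian_kernel` and `density_map` are hypothetical. Each annotated head contributes a kernel that sums to one, so the sum of the map approximates the number of people.

```python
import numpy as np


def gaussian_kernel(size: int = 15, sigma: float = 4.0) -> np.ndarray:
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(points, height: int, width: int,
                size: int = 15, sigma: float = 4.0) -> np.ndarray:
    """Build a density map from (x, y) head annotations.

    Assumption: a fixed kernel is used for every annotation. Kernel mass
    that falls outside the image is cropped, so heads near the border
    contribute slightly less than 1 to the total.
    """
    kernel = gaussian_kernel(size, sigma)
    r = size // 2
    # Work on a padded canvas so kernels near the border fit, then crop.
    padded = np.zeros((height + 2 * r, width + 2 * r), dtype=np.float64)
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if not (0 <= row < height and 0 <= col < width):
            continue  # skip annotations outside the image
        padded[row:row + size, col:col + size] += kernel
    return padded[r:r + height, r:r + width]


# Three annotated heads, all far enough from the image border.
heads = [(20, 20), (40, 30), (41, 31)]
dm = density_map(heads, height=64, width=64)
estimated_count = dm.sum()  # the count is the integral of the density map
```

A regression branch such as the one described in the abstract would be trained to predict a map like `dm`, and the predicted count would be read off by summing the output.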
Index Terms
- Multi Scale Attention Network for Crowd Counting