research-article

Multi Scale Attention Network for Crowd Counting

Authors:
Xiangpeng Yang
School of Automation, Southeast University, China and Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, China

Xiaobo Lu
School of Automation, Southeast University, China and Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, China

CSAE '21: Proceedings of the 5th International Conference on Computer Science and Application Engineering, October 2021, Article No.: 22, Pages 1–8
https://doi.org/10.1145/3487075.3487097

Published: 07 December 2021


ABSTRACT

Managing and controlling extremely crowded scenes has become a hot topic in recent years. Counting people from a density map generated from object location annotations is an effective way to analyze crowd information and control crowds in severely congested scenes. In this paper, we propose a novel end-to-end crowd counting method called MSANet. MSANet consists of a VGG16 backbone as the front-end part and two branches as the back-end part: an attention map extractor that predicts the crowd state (whether or not people are present) and a density map branch that regresses the density map. Moreover, to obtain a high-resolution density map, we pass feature maps of different scales from the front-end part to the back-end part and combine them there. To enhance the resolution of the predicted map and its structural similarity to the ground truth, we propose a new loss function for crowd counting. Test results on the public ShanghaiTech dataset and on the Subway Crowd Counting Dataset, supported by the Nanjing Metro, demonstrate the effectiveness of our method.
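
To make the idea of counting from a density map concrete, the following is a minimal sketch of how a ground-truth density map is commonly built from point annotations: one Gaussian of unit mass is placed at each annotated head position, so the sum over the map equals the number of people. The abstract does not specify the kernel used for MSANet, so the fixed `sigma`, the `radius`, the border renormalisation, and the function names `gaussian_kernel`, `density_map`, and `count_from_density` are all assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(points, height, width, sigma=2.0, radius=6):
    """Build a density map from (row, col) head annotations.

    Assumption: a fixed-sigma kernel; the paper may use a different scheme.
    """
    kernel = gaussian_kernel(sigma, radius)
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in points:
        # Gather the kernel cells that fall inside the image.
        cells = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = r + dy, c + dx
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy + radius][dx + radius]))
        if not cells:
            continue  # annotation lies entirely outside the image
        # Renormalise so a person near the border still contributes exactly 1.
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            dmap[y][x] += w / mass
    return dmap


def count_from_density(dmap):
    """The estimated count is the integral (sum) of the density map."""
    return sum(sum(row) for row in dmap)


if __name__ == "__main__":
    heads = [(5, 5), (10, 20), (0, 0)]  # hypothetical annotations
    dmap = density_map(heads, height=32, width=32)
    print(round(count_from_density(dmap), 6))  # prints 3.0
```

A network such as the one described would regress a map of this kind, and the predicted count would be read off by summing the predicted map in the same way.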

Index Terms

Multi Scale Attention Network for Crowd Counting
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Machine learning
2. Information systems
  1. Information systems applications


Published in
CSAE '21: Proceedings of the 5th International Conference on Computer Science and Application Engineering
October 2021
660 pages
ISBN:9781450389853
DOI:10.1145/3487075
Editors:
Ali Emrouznejad,
Jui-Sheng (Rayson) Chou
Copyright © 2021 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Attention map
Crowd counting
Density map
Multi scale attention network
Structural similarity
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 368 of 770 submissions, 48%