A Novel Spatiotemporal Attention Convolutional Neural Network for Video Crowd Counting

Authors: Shangjie Zhang, Yuelei Xiao

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

Pages 607 - 614

https://doi.org/10.1145/3573942.3574069

Published: 16 May 2023

Abstract

Most existing crowd counting methods remain image-based even when video datasets are available, and therefore ignore valuable temporal information. To address this, a novel spatiotemporal attention convolutional neural network is proposed for video-based crowd counting. First, the first ten layers of VGG-16 serve as the backbone network to extract features, and a single ConvLSTM layer captures the temporal correlation between adjacent frames. Then, stacked dilated convolutional layers enlarge the receptive field without increasing the computational load. Finally, a convolutional block attention module is introduced to adaptively refine the feature maps; its ability to emphasize or suppress information along the channel and spatial dimensions aids information flow. Experimental results on two benchmark datasets (Mall and WorldExpo'10) show that the proposed method further improves crowd counting accuracy and outperforms other existing crowd counting methods.
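
The claim that stacked dilated convolutions enlarge the receptive field at no extra cost can be checked with simple arithmetic. The sketch below is illustrative only: the abstract does not state the kernel size, dilation rates, or channel widths, so the 3x3 kernels, the rate of 2, and the 512 channels are assumptions, and the helper names are hypothetical. It uses the standard formula for stride-1 layers, in which each layer adds (k - 1) * d pixels to the receptive field.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Each layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def weights_per_layer(kernel_size, in_channels, out_channels):
    """Weight count of one 2D convolution (bias ignored).

    Dilation does not appear here: it spaces the kernel taps apart
    without adding any new taps.
    """
    return kernel_size * kernel_size * in_channels * out_channels


# Assumed configuration: three 3x3 layers, plain versus dilation rate 2.
plain_rf = receptive_field(3, [1, 1, 1])
dilated_rf = receptive_field(3, [2, 2, 2])

# Assumed channel width of 512; the count is the same for either dilation rate.
weights = weights_per_layer(3, 512, 512)

print(plain_rf, dilated_rf, weights)
```

Under these assumptions the dilated stack covers 13 pixels per axis instead of 7, while each layer keeps the same number of weights and multiply-accumulates per output position.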

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object identification

Published In

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

September 2022

1221 pages

ISBN:9781450396899

DOI:10.1145/3573942

Copyright © 2022 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Shaanxi Science and Technology Co-ordination and Innovation Project of China
New Star Team Project of Xi'an University of Posts and Telecommunications
National Key Research and Development Program
National Natural Science Foundation of China

Conference

AIPR 2022

AIPR 2022: 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

September 23 - 25, 2022

Xiamen, China
