Weakly supervised crowd counting based on Swin Transformer

Authors: Feng Min, Linlin Hao, Yonggang Kuang

ICRSA '23: Proceedings of the 2023 6th International Conference on Robot Systems and Applications

Pages 229 - 236

https://doi.org/10.1145/3655532.3655572

Published: 28 June 2024

Abstract

Most existing crowd counting methods are based on convolutional neural networks (CNNs), which extract local features well but are limited by the size of their receptive field, making it difficult to model global context. In addition, crowd images have complex backgrounds and densely distributed targets, and are easily affected by external conditions such as occlusion and lighting, so labeling the head of every pedestrian in an image is extremely cumbersome and error-prone. To address these problems, this paper proposes a weakly supervised crowd counting method based on Swin Transformer. First, Swin Transformer is used as the backbone network for feature extraction, capturing global context information and modeling feature interactions between targets. Second, an attention-based multi-scale feature fusion module is designed to aggregate global spatial position features at multiple scales and improve the detection of small objects. Finally, global average pooling is used to reduce feature dimensionality, and a regression layer is designed to predict the number of people. Experiments were carried out on three crowd datasets: ShanghaiTech, UCF_CC_50 and UCF_QNRF. The results show that the overall performance of the proposed method is better than that of other common crowd counting methods.
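The final stage described in the abstract (global average pooling followed by a regression layer) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy feature map, and the weights and bias are all hypothetical, and the regression layer is assumed here to be a single linear layer, which the abstract does not specify.

```python
from typing import List

# A feature map is indexed as [channel][row][column].
FeatureMap = List[List[List[float]]]


def global_average_pool(features: FeatureMap) -> List[float]:
    """Collapse each channel's H x W grid to its mean, giving one value per channel."""
    pooled = []
    for channel in features:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def regress_count(pooled: List[float], weights: List[float], bias: float) -> float:
    """Assumed linear regression head: weighted sum of pooled features plus a bias."""
    if len(pooled) != len(weights):
        raise ValueError("weights must have one entry per channel")
    return sum(p * w for p, w in zip(pooled, weights)) + bias


# Toy 2-channel, 2x2 feature map standing in for the fused backbone features.
features = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0
    [[2.0, 2.0], [2.0, 2.0]],  # channel 1
]
pooled = global_average_pool(features)

# Hypothetical weights and bias; in a trained model these would be learned.
predicted = regress_count(pooled, weights=[10.0, 5.0], bias=1.0)
print(pooled, predicted)
```

Because the head outputs a single scalar per image, it can be trained against image-level count labels alone, which is what makes a count-level (weakly supervised) setup possible without per-head annotations.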

Index Terms

Weakly supervised crowd counting based on Swin Transformer
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection

Published In

ICRSA '23: Proceedings of the 2023 6th International Conference on Robot Systems and Applications

September 2023

335 pages

ISBN:9798400708039

DOI:10.1145/3655532

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Natural Science Foundation of China

Conference

ICRSA 2023: 2023 6th International Conference on Robot Systems and Applications

September 22 - 24, 2023

Wuhan, China
