DOI: 10.1145/3655532.3655572
Research article

Weakly supervised crowd counting based on Swin Transformer

Published: 28 June 2024

Abstract

Most existing crowd counting methods are based on convolutional neural networks (CNNs), which extract local features well but are limited by the size of their receptive field, making it difficult to model global context. Moreover, crowd images have complex backgrounds and densely distributed targets, and they are easily disturbed by external conditions such as occlusion and lighting, so labeling the head of every pedestrian in an image is extremely cumbersome and error-prone. To address these problems, this paper proposes a weakly supervised crowd counting method based on Swin Transformer. First, Swin Transformer is used as the backbone network for feature extraction, capturing global context information and modeling feature interactions between targets. Second, an attention-based multi-scale feature fusion module is designed to aggregate global spatial position features at multiple scales and improve the detection of small objects. Finally, global average pooling is used to reduce feature dimensionality, and a regression layer is designed to predict the number of people. Experiments were carried out on three crowd datasets: ShanghaiTech, UCF_CC_50, and UCF_QNRF. The experimental results show that the overall performance of the proposed method is better than that of other common crowd counting methods.
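The final stage described in the abstract (global average pooling followed by a regression layer) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the toy feature map, the weights and bias, and the choice of an L1 loss on the image-level count are all assumptions made for illustration, since the abstract does not specify the layer sizes or the training loss.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(features: FeatureMap) -> List[float]:
    """Collapse each H x W channel to its spatial mean, giving one value per channel."""
    pooled = []
    for channel in features:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def regress_count(pooled: List[float], weights: List[float], bias: float) -> float:
    """A single linear layer mapping the pooled vector to a scalar count (assumed form)."""
    return sum(w * x for w, x in zip(weights, pooled)) + bias


def count_l1_loss(predicted: float, true_count: float) -> float:
    """Count-level supervision: only the total count per image is used, no head locations.
    The L1 form is an assumption; the abstract does not name the loss."""
    return abs(predicted - true_count)


# Hypothetical 2-channel, 2x2 feature map standing in for the fused backbone output.
features = [
    [[1.0, 3.0], [5.0, 7.0]],  # spatial mean 4.0
    [[2.0, 2.0], [2.0, 2.0]],  # spatial mean 2.0
]
pooled = global_average_pool(features)
predicted = regress_count(pooled, weights=[10.0, 5.0], bias=1.0)  # hypothetical weights
loss = count_l1_loss(predicted, true_count=50.0)
```

Because the loss compares only scalar counts, training under this setup would need just the number of people per image as the label, which is what makes the supervision "weak" compared with point-level head annotations.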

    Published In

    ICRSA '23: Proceedings of the 2023 6th International Conference on Robot Systems and Applications
    September 2023
    335 pages
    ISBN: 9798400708039
    DOI: 10.1145/3655532

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Attention
    2. Convolutional neural network (CNN)
    3. Crowd counting
    4. Global average pooling
    5. Swin transformer

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICRSA 2023
