DOI: 10.1145/3508546.3508548

Multi-level Convolutional Transformer with Adaptive Ranking for Semi-supervised Crowd Counting

Published: 25 February 2022

ABSTRACT

Most existing crowd counting methods are supervised algorithms based on pure convolutional neural networks. Although these methods have attained good results on some datasets, they still face several common problems. The cost of labeling annotations for supervised methods is huge, and the shortage of labeled datasets limits the further development of supervised crowd counting algorithms. Meanwhile, pure CNN-based algorithms have certain limitations in modeling the relationships among extracted features. To overcome these problems, we propose a semi-supervised crowd counting algorithm that combines a CNN and a transformer. Specifically, our method consists of two parts: a Multi-Level Convolutional Transformer (MLCT) and an Adaptive Scale Module (ASM). MLCT is the counting branch, with its front end and back end being the CNN and the transformer, respectively. ASM outputs an adaptive scale factor for unlabeled crowd images. We generate a ranking list based on this factor, which is fed into the MLCT, and the loss is computed from the order of the list. Unlike most crowd counting methods, we use a region-level regression target for labeled images, which is a weaker form of supervision than location-level regression. Furthermore, we train the entire model using a novel loss function that combines L1 loss and ranking loss. Experimental results on three challenging datasets, ShanghaiTech Part A, ShanghaiTech Part B, and UCF-QNRF, all demonstrate the effectiveness of the proposed approach.
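The paper's exact loss formulation is not given in the abstract, but the general idea, an L1 count loss on labeled images plus a ranking loss that penalizes order violations in a ranked list of predictions from unlabeled images, can be sketched as follows. This is a minimal, stdlib-only illustration under the assumption that the ranking list is ordered so that each entry should predict a count no smaller than the next; the names `margin` and `lam` are illustrative, not from the paper.

```python
def ranking_loss(ranked_preds, margin=0.0):
    """Hinge-style ranking loss over a list of predicted counts.

    ranked_preds is assumed ordered so that ranked_preds[i] comes from a
    region expected to contain at least as many people as the region of
    ranked_preds[i + 1]; each violation of that order is penalized.
    """
    pairs = list(zip(ranked_preds, ranked_preds[1:]))
    if not pairs:
        return 0.0
    # max(0, smaller - larger + margin): zero when the order is respected.
    total = sum(max(0.0, smaller - larger + margin) for larger, smaller in pairs)
    return total / len(pairs)


def combined_loss(pred_counts, gt_counts, ranked_preds, lam=1.0):
    """L1 count loss on labeled samples plus weighted ranking loss."""
    l1 = sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)
    return l1 + lam * ranking_loss(ranked_preds)
```

For example, a correctly ordered list such as `[5.0, 3.0, 1.0]` incurs zero ranking loss, while an inverted pair such as `[1.0, 3.0]` is penalized by the size of the violation.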


  • Published in

    ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
    December 2021
    699 pages
    ISBN: 9781450385053
    DOI: 10.1145/3508546

    Copyright © 2021 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 173 of 395 submissions, 44%
