ABSTRACT
Most existing crowd counting methods are supervised algorithms based purely on convolutional neural networks (CNNs). Although these methods have attained good results on some datasets, they share several common problems. Labeling annotations for supervised methods is very costly, and the shortage of labeled datasets limits the further development of supervised crowd counting algorithms. Meanwhile, pure CNN-based algorithms are limited in their ability to build connections among features. To overcome these problems, we propose a semi-supervised crowd counting algorithm that combines a CNN with a transformer. Specifically, our method consists of two parts: the Multi-Level Convolutional Transformer (MLCT) and the Adaptive Scale Module (ASM). MLCT is the counting branch, with a CNN as its front end and a transformer as its back end. ASM outputs an adaptive scale factor for the unlabeled crowd images. We generate a ranking list based on this factor and feed it into the MLCT, where the loss is computed according to the order of the list. Unlike most crowd counting methods, we use a region-level regression target for labeled images, which is a weaker form of regression than location-level regression. Furthermore, we train the entire model using a novel loss function that combines L1 loss and ranking loss. Experimental results on three challenging datasets, ShanghaiTech Part A, ShanghaiTech Part B, and UCF-QNRF, demonstrate the effectiveness of the proposed approach.
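The combination of L1 loss and ranking loss described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names, the pairwise hinge form of the ranking term, the convention that the ranking list is ordered from larger to smaller expected count, the `margin` parameter, and the weighting factor `lam`. It is not the authors' implementation.

```python
from typing import Sequence


def l1_loss(pred_counts: Sequence[float], true_counts: Sequence[float]) -> float:
    """Mean absolute error between predicted and annotated region-level counts."""
    if len(pred_counts) != len(true_counts) or len(pred_counts) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(pred_counts)


def ranking_loss(ranked_preds: Sequence[float], margin: float = 0.0) -> float:
    """Hinge penalty over adjacent pairs of a ranking list of unlabeled images.

    Assumption: the list is ordered so that ranked_preds[i] should be
    greater than or equal to ranked_preds[i + 1]. A pair contributes a
    penalty only when the predicted counts violate that order.
    """
    pairs = list(zip(ranked_preds, ranked_preds[1:]))
    if not pairs:
        return 0.0
    penalties = [max(0.0, lower - higher + margin) for higher, lower in pairs]
    return sum(penalties) / len(pairs)


def combined_loss(
    pred_counts: Sequence[float],
    true_counts: Sequence[float],
    ranked_preds: Sequence[float],
    lam: float = 1.0,      # hypothetical weight balancing the two terms
    margin: float = 0.0,   # hypothetical ranking margin
) -> float:
    """L1 loss on labeled data plus a weighted ranking loss on unlabeled data."""
    return l1_loss(pred_counts, true_counts) + lam * ranking_loss(ranked_preds, margin)


if __name__ == "__main__":
    # Labeled regions: predicted vs. annotated counts.
    labeled_pred, labeled_true = [10.0, 20.0], [12.0, 17.0]
    # Unlabeled images, already sorted by the (assumed) scale-factor ranking.
    unlabeled_pred = [30.0, 35.0, 20.0]
    print(combined_loss(labeled_pred, labeled_true, unlabeled_pred, lam=0.5))
```

In this sketch the labeled term needs only per-region counts, not head locations, and the unlabeled term needs only the relative order of images, which is why no annotation is required for them.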