DOI: 10.1145/3503161.3547863
research-article

CrossNet: Boosting Crowd Counting with Localization

Published: 10 October 2022

Abstract

Generating high-quality density maps is a crucial step in crowd counting. Exploiting people's head locations can naturally highlight crowded areas and eliminate interference from background noise. However, existing crowd counting methods still struggle to make reasonable use of location information when generating density maps. This paper proposes CrossNet, a novel location-guided framework for crowd counting that integrates location supervision into density maps through dual-branch joint training. First, a new branch network localizes the potential positions of pedestrians. Using supervision induced from this localization branch, a Location Enhancement (LE) module obtains high-quality density maps by positioning foreground regions. Second, an Adaptive Density Awareness Attention (ADAA) module enhances localization accuracy by using the density from the counting branch to adaptively capture the error-prone dense areas of the location maps. Finally, a Density Awareness Localization (DAL) loss allocates attention according to crowd density level, focusing more on high-density regions and less on low-density ones. Extensive experiments on four benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches in both crowd counting and crowd localization.
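The abstract does not state the DAL loss formula, so the following is only an illustrative sketch of the general idea of a density-weighted localization loss. It assumes a per-pixel binary cross-entropy on the location map, with a weight that grows linearly with the density estimated by the counting branch. The function name `density_aware_loc_loss`, the parameter `alpha`, and the linear weighting are all hypothetical and are not taken from the paper.

```python
import math


def density_aware_loc_loss(pred_loc, gt_loc, density, alpha=1.0, eps=1e-7):
    """Density-weighted binary cross-entropy over a flattened location map.

    pred_loc: predicted head probabilities in (0, 1), one per pixel
    gt_loc:   ground-truth head labels (0 or 1), one per pixel
    density:  non-negative density estimates from the counting branch
    alpha:    hypothetical knob; 0 disables the density weighting
    """
    # Normalize by the peak density; fall back to 1.0 when the peak is 0.
    max_d = max(density) or 1.0
    total, weight_sum = 0.0, 0.0
    for p, y, d in zip(pred_loc, gt_loc, density):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        w = 1.0 + alpha * d / max_d      # assumed: denser pixels weigh more
        total += w * bce
        weight_sum += w
    return total / weight_sum


# The same localization error costs more when it falls in a dense region.
error_in_dense = density_aware_loc_loss([0.1, 0.9], [1, 1], [1.0, 0.0])
error_in_sparse = density_aware_loc_loss([0.1, 0.9], [1, 1], [0.0, 1.0])
print(error_in_dense > error_in_sparse)
```

With `alpha=0` the function reduces to a plain mean binary cross-entropy, which makes the effect of the assumed density weighting easy to isolate.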

Supplementary Material

MP4 File (MM22-fp0477.mp4)
In this video, we explore using location information to boost crowd counting performance. Existing crowd counting methods still struggle to make reasonable use of location information when generating density maps. We propose CrossNet, a deep model with a parallel dual-branch structure that performs crowd counting and crowd localization simultaneously, so that the two tasks reinforce each other by combining density maps with location information. The proposed method uses the head locations from the localization branch to guide the density map to focus on crowd regions. At the same time, the density from the counting branch helps the localization loss adaptively capture error-prone areas. In this way, location information and density information are exchanged between the two branches, forming a crossing structure. Extensive experiments on four benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches in both crowd counting and crowd localization.
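The crossing exchange described above can be pictured with a toy example. This is a minimal sketch under stated assumptions, not the paper's LE or ADAA modules, which are learned network components: here the location map simply gates the density map by element-wise multiplication, and the density map is normalized into an attention map for the localization branch. The helper names `location_enhance`, `density_attention`, and `count` are hypothetical.

```python
def location_enhance(density_map, location_map):
    """Localization -> counting: keep density only where heads are likely.

    Assumed soft gating: each density value is scaled by the predicted
    head probability at the same pixel.
    """
    return [
        [d * l for d, l in zip(d_row, l_row)]
        for d_row, l_row in zip(density_map, location_map)
    ]


def density_attention(density_map):
    """Counting -> localization: scale density to [0, 1] as attention."""
    peak = max(max(row) for row in density_map)
    if peak <= 0:
        return [[0.0 for _ in row] for row in density_map]
    return [[d / peak for d in row] for row in density_map]


def count(density_map):
    """The crowd count is the sum of the density map."""
    return sum(sum(row) for row in density_map)


density = [[0.5, 0.2], [0.0, 0.3]]    # toy counting-branch output
location = [[1.0, 0.0], [0.0, 1.0]]   # toy localization-branch output

enhanced = location_enhance(density, location)
attention = density_attention(density)
print(enhanced)         # background response at (0, 1) is suppressed
print(count(enhanced))
print(attention)
```

In this toy case the spurious density of 0.2 at a pixel the localization branch marks as background is removed, while the attention map gives the densest pixel the largest weight.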

    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN: 9781450392037
    DOI: 10.1145/3503161
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adaptive density awareness attention
    2. crowd counting
    3. crowd localization
    4. deep learning
    5. density awareness localization loss
    6. location enhancement

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (Last 12 months): 58
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 16 Feb 2025

    Cited By

    • (2025) Global vision, local focus: the semantic enhancement transformer network for crowd counting. Soft Computing 29:2 (1035-1052). DOI: 10.1007/s00500-025-10506-1. Online publication date: 7-Feb-2025
    • (2024) Training-free Object Counting with Prompts. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (322-330). DOI: 10.1109/WACV57701.2024.00039. Online publication date: 3-Jan-2024
    • (2024) Learning Discriminative Features for Crowd Counting. IEEE Transactions on Image Processing 33 (3749-3764). DOI: 10.1109/TIP.2024.3408609. Online publication date: 2024
    • (2024) Analysis of Fine-Grained Counting Methods for Masked Face Counting: A Comparative Study. IEEE Access 12 (27426-27443). DOI: 10.1109/ACCESS.2024.3367593. Online publication date: 2024
    • (2024) CrowdUNet: Segmentation assisted U-shaped crowd counting network. Neurocomputing 601 (128215). DOI: 10.1016/j.neucom.2024.128215. Online publication date: Oct-2024
    • (2024) Focus for Free in Density-Based Counting. International Journal of Computer Vision 132:7 (2600-2617). DOI: 10.1007/s11263-024-01990-3. Online publication date: 9-Feb-2024
    • (2024) Double multi-scale feature fusion network for crowd counting. Multimedia Tools and Applications 83:34 (81831-81855). DOI: 10.1007/s11042-024-18769-w. Online publication date: 7-Mar-2024
    • (2024) Robust Zero-Shot Crowd Counting and Localization With Adaptive Resolution SAM. Computer Vision – ECCV 2024 (478-495). DOI: 10.1007/978-3-031-72998-0_27. Online publication date: 30-Sep-2024
    • (2023) Adaptive Teaching for Cross-Domain Crowd Counting. IEEE Transactions on Multimedia 26 (2943-2952). DOI: 10.1109/TMM.2023.3305815. Online publication date: 16-Aug-2023
    • (2023) Tolerating Annotation Displacement in Dense Object Counting via Point Annotation Probability Map. IEEE Transactions on Image Processing 32 (6359-6372). DOI: 10.1109/TIP.2023.3331908. Online publication date: 2023
