DOI: 10.1145/3503161.3548298
research-article

Fine-Grained Fragment Diffusion for Cross Domain Crowd Counting

Published: 10 October 2022

Abstract

Deep learning has improved crowd-counting performance, but transferring a trained model to new scenes remains challenging. Because models depend heavily on their training data and domain shift is inherent, applying them to unseen scenarios is difficult. To address this problem, this paper proposes a cross-domain Fine-Grained Fragment Diffusion model (FGFD) that exploits feature-level, fine-grained similarities of crowd distributions between different fragments to bridge the cross-domain gap (content-level, coarse-grained dissimilarities). Specifically, we extract features of fragments in both the source and target domains and then align the crowd distributions across the domains. Diffusion of the crowd distribution makes it possible to label fragments from the unseen domain and to bring the source domain closer to the target domain; this result is fed back to the model to reduce the domain discrepancy. By monitoring the distribution alignment, the distribution perception model is updated, which in turn improves the alignment. During model inference, the gap between the domains is gradually reduced. Multiple sets of transfer experiments show that the proposed method achieves results competitive with other state-of-the-art domain-transfer methods.
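
The abstract does not spell out the diffusion step, so the following is only a minimal sketch of one generic way labels can be diffused from source-domain fragments to target-domain fragments: classic graph label propagation, iterating F <- alpha * S * F + (1 - alpha) * Y over a feature-affinity graph. It is not the authors' FGFD implementation. The toy 2-D fragment features, the two density classes (sparse, dense), the Gaussian kernel width `sigma`, and the parameter `alpha` are all assumptions made for illustration.

```python
import math

def affinity(features, sigma=1.0):
    """Gaussian affinity between every pair of fragment features (zero diagonal)."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w

def normalize(w):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2}."""
    n = len(w)
    deg = [sum(row) for row in w]
    return [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def diffuse(s, y, alpha=0.8, iters=50):
    """Iterate F <- alpha * S F + (1 - alpha) * Y, starting from F = Y."""
    n, c = len(y), len(y[0])
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [[alpha * sum(s[i][k] * f[k][j] for k in range(n)) + (1 - alpha) * y[i][j]
              for j in range(c)] for i in range(n)]
    return f

# Hypothetical fragment features: rows 0-3 are labeled source fragments,
# rows 4-5 are unlabeled target fragments.
features = [
    [0.10, 0.20], [0.20, 0.10],   # source, sparse crowd
    [2.00, 2.10], [2.10, 2.00],   # source, dense crowd
    [0.15, 0.15], [2.05, 2.05],   # target, unlabeled
]
# One-hot rows for source fragments (class 0 = sparse, class 1 = dense),
# all-zero rows for the unlabeled target fragments.
labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]

s = normalize(affinity(features))
scores = diffuse(s, labels)
# Each target fragment takes the class with the highest diffused score.
pseudo_labels = [max(range(2), key=lambda c: row[c]) for row in scores[4:]]
print(pseudo_labels)
```

In this sketch, each target fragment inherits the label of the source fragments it lies closest to in feature space. In a feedback loop like the one the abstract describes, such pseudo-labels would then presumably be used to update the model.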





Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cross-domain
  2. crowd counting
  3. distribution alignment
  4. fine-grained similarity

Qualifiers

  • Research-article


Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Article Metrics

  • Downloads (Last 12 months): 108
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 16 Feb 2025


Cited By

  • (2025) A comprehensive survey of crowd density estimation and counting. IET Image Processing 19:1. DOI: 10.1049/ipr2.13328. Online publication date: 27-Jan-2025
  • (2024) Domain-Agnostic Crowd Counting via Uncertainty-Guided Style Diversity Augmentation. Proceedings of the 32nd ACM International Conference on Multimedia, 1642-1651. DOI: 10.1145/3664647.3681310. Online publication date: 28-Oct-2024
  • (2024) ReCorD: Reasoning and Correcting Diffusion for HOI Generation. Proceedings of the 32nd ACM International Conference on Multimedia, 9465-9474. DOI: 10.1145/3664647.3680936. Online publication date: 28-Oct-2024
  • (2024) Crowd Counting Using Meta-Test-Time Adaptation. International Journal of Neural Systems 34:11. DOI: 10.1142/S0129065724500618. Online publication date: 9-Sep-2024
  • (2024) Density-Based Flow Mask Integration via Deformable Convolution for Video People Flux Estimation. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 6559-6568. DOI: 10.1109/WACV57701.2024.00644. Online publication date: 3-Jan-2024
  • (2024) Find Gold in Sand: Fine-Grained Similarity Mining for Domain-Adaptive Crowd Counting. IEEE Transactions on Multimedia 26, 3842-3855. DOI: 10.1109/TMM.2023.3316437. Online publication date: 2024
  • (2024) One-Shot Any-Scene Crowd Counting With Local-to-Global Guidance. IEEE Transactions on Image Processing 33, 6622-6632. DOI: 10.1109/TIP.2024.3420713. Online publication date: 1-Jan-2024
  • (2024) Spatial Diffusion for Cell Layout Generation. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, 481-491. DOI: 10.1007/978-3-031-72083-3_45. Online publication date: 14-Oct-2024
  • (2023) Striking a Balance: Unsupervised Cross-Domain Crowd Counting via Knowledge Diffusion. Proceedings of the 31st ACM International Conference on Multimedia, 6520-6529. DOI: 10.1145/3581783.3611797. Online publication date: 26-Oct-2023
  • (2023) DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting. Proceedings of the 31st ACM International Conference on Multimedia, 4319-4329. DOI: 10.1145/3581783.3611793. Online publication date: 26-Oct-2023
