
Dilated Convolution-based Feature Refinement Network for Crowd Localization

Published: 12 July 2023

Abstract

As an emerging computer vision task, crowd localization has received increasing attention due to its ability to produce spatially precise predictions. However, continuous scale variation in complex crowd scenes yields tiny individuals at image edges, which existing methods fail to localize accurately. To alleviate this problem, we propose a novel Dilated Convolution-based Feature Refinement Network (DFRNet) to enhance representation learning. Specifically, DFRNet is built with three branches that capture the information of each individual in the crowd scene more precisely. We introduce a Feature Perception Module that models long-range contextual information at different scales through multiple dilated convolutions, providing sufficient feature information to perceive tiny individuals at image edges. A Feature Refinement Module is then deployed at multiple stages of the three branches so that feature information at different scales can refine each other, further improving the expressiveness of multi-scale contextual information. By incorporating these modules, DFRNet locates individuals in complex scenes more precisely. Extensive experiments on multiple datasets demonstrate that the proposed method outperforms existing methods and adapts more accurately to complex crowd scenes.
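The article page itself contains no code; as a rough illustration of the idea behind the Feature Perception Module described above, the following PyTorch sketch passes the same feature map through parallel 3x3 convolutions with increasing dilation rates and fuses the responses with a 1x1 convolution. The class name, dilation rates, channel widths, and fusion scheme are assumptions made for illustration only, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a multi-dilation block in the spirit of
# the Feature Perception Module, gathering long-range context at several scales
# with parallel dilated convolutions and fusing the results.
# Dilation rates, channel counts, and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn


class FeaturePerceptionSketch(nn.Module):
    def __init__(self, in_channels, out_channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One 3x3 dilated convolution per rate; padding = dilation keeps the spatial size.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 convolution to fuse the concatenated multi-scale responses.
        self.fuse = nn.Conv2d(out_channels * len(dilations), out_channels, kernel_size=1)

    def forward(self, x):
        # Each branch sees the same input but with a different receptive field.
        multi_scale = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(multi_scale, dim=1))


if __name__ == "__main__":
    # Toy usage: a backbone feature map of shape (N, C, H, W).
    feats = torch.randn(1, 256, 64, 64)
    module = FeaturePerceptionSketch(in_channels=256, out_channels=128)
    print(module(feats).shape)  # torch.Size([1, 128, 64, 64])
```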


Published in ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 6 (November 2023), 858 pages. ISSN: 1551-6857; EISSN: 1551-6865. DOI: 10.1145/3599695. Editor: Abdulmotaleb El Saddik.


Publisher: Association for Computing Machinery, New York, NY, United States.

Publication History

• Received: 13 April 2022
• Revised: 26 July 2022
• Accepted: 1 September 2022
• Online AM: 20 December 2022
• Published: 12 July 2023
