Multi-scale dilated convolution of feature Fusion Network for Crowd counting

Multimedia Tools and Applications

Abstract

Crowd counting has long been a challenging task because of perspective distortion and variability in head size. Previous methods either ignore the multi-scale information in images or simply use convolutions with different kernel sizes to extract multi-scale features, so the multi-scale features they extract are incomplete. In this paper, we propose a crowd counting model based on a convolutional neural network (CNN), called the Multi-scale Dilated Convolution of Feature Fusion Network (MsDFNet). MsDFNet follows the density map regression approach: the density map is predicted with parameters learned by the CNN. The proposed network has three main components: a CNN that extracts low-level features; a multi-scale dilated convolution module with multi-column feature fusion blocks; and a density map regression module. Multi-scale dilated convolutions extract multi-scale high-level features, and the features extracted from different columns are fused. Combining the multi-scale dilated convolution module with the multi-column feature fusion block extracts more complete multi-scale features and improves counting performance on small-sized targets. Experiments show that fusing multi-scale context feature information effectively addresses the problem of varying head sizes in images. We demonstrate the effectiveness of our method on two public datasets, ShanghaiTech and UCF_CC_50. Compared with previous state-of-the-art crowd counting algorithms in terms of MAE (Mean Absolute Error) and MSE (Mean Square Error), our method significantly improves performance, especially when head sizes vary. On the UCF_CC_50 dataset, our method reduces the MAE by 28.6 compared with the previous state-of-the-art method (lower MAE indicates better performance).
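The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It shows three things: how dilation enlarges the extent covered by a fixed-size kernel, how a count is read from a density map by summing its values, and how MAE and MSE are computed over a test set. The 3x3 kernel, the dilation rates 1, 2 and 3, the toy density maps and all function names are assumptions made for illustration. The sketch also assumes that MSE is reported as the square root of the mean squared count error, a convention common in crowd counting papers that the abstract does not confirm.

```python
import math


def effective_kernel_size(kernel_size, dilation):
    """Extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def count_from_density_map(density_map):
    """The predicted count is the sum of all density-map values."""
    return sum(sum(row) for row in density_map)


def mae_and_mse(predicted_counts, true_counts):
    """Return (MAE, MSE) over a set of images.

    Assumption: MSE is the square root of the mean squared count error.
    """
    n = len(true_counts)
    errors = [p - t for p, t in zip(predicted_counts, true_counts)]
    mae = sum(abs(e) for e in errors) / n
    mse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, mse


# Hypothetical dilation rates for three columns that share a 3x3 kernel.
rates = [1, 2, 3]
extents = [effective_kernel_size(3, d) for d in rates]

# Toy 2x2 density maps standing in for network outputs (made-up values).
maps = [
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.25, 0.25], [0.25, 0.25]],
]
predicted = [count_from_density_map(m) for m in maps]
truth = [3, 1]  # made-up ground-truth counts
mae, mse = mae_and_mse(predicted, truth)

print("extents:", extents)
print("predicted counts:", predicted)
print("MAE:", mae, "MSE:", mse)
```

With the same 3x3 kernel, larger dilation rates cover a wider area at no extra parameter cost, which is why columns with different rates can respond to heads of different sizes.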

Acknowledgements

This work was supported by the Natural Science Foundation of Shandong Province (No. ZR2019MF050) and the Shandong Province colleges and universities youth innovation technology plan innovation team project under Grant (No. 2020KJN011).

Author information

Corresponding author

Correspondence to Guodong Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, D., Wang, G. & Zhai, G. Multi-scale dilated convolution of feature Fusion Network for Crowd counting. Multimed Tools Appl 81, 37939–37952 (2022). https://doi.org/10.1007/s11042-022-13130-5

  • DOI: https://doi.org/10.1007/s11042-022-13130-5
