Multi-scale dilated convolution of feature Fusion Network for Crowd counting

Liu, Donghua; Wang, Guodong; Zhai, Guangtao

doi:10.1007/s11042-022-13130-5

Published: 22 April 2022

Volume 81, pages 37939–37952, (2022)

Multimedia Tools and Applications

Donghua Liu¹,
Guodong Wang¹ &
Guangtao Zhai²

Abstract

Crowd counting has long been a challenging task because of perspective distortion and variability in head size. Previous methods either ignore the multi-scale information in images or simply use convolutions with different kernel sizes to extract multi-scale features, so the extracted multi-scale features are incomplete. In this paper, we propose a crowd counting model based on a convolutional neural network (CNN), called the Multi-scale Dilated Convolution of Feature Fusion Network (MsDFNet). MsDFNet follows the density map regression approach: the CNN learns parameters that predict a density map, from which the count is obtained. The proposed network has three main components: a CNN that extracts low-level features, a multi-scale dilated convolution module with multi-column feature fusion blocks, and a density map regression module. Multi-scale dilated convolutions extract multi-scale high-level features, and the features extracted by the different columns are then fused. Combining the multi-scale dilated convolution module with the multi-column feature fusion block extracts more complete multi-scale features and improves counting of small targets. Experiments show that fusing multi-scale context feature information effectively addresses the problem of varying head sizes in images. We demonstrate the effectiveness of our method on two public datasets (the ShanghaiTech dataset and the UCF_CC_50 dataset). Compared with previous state-of-the-art crowd counting algorithms in terms of MAE (Mean Absolute Error) and MSE (Mean Square Error), our method significantly improves performance, especially when head sizes vary. On the UCF_CC_50 dataset, our method reduces MAE by 28.6 compared with the previous state-of-the-art method (lower MAE is better).
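To make the idea of multi-scale dilated convolution with column fusion concrete, here is a minimal one-dimensional sketch in plain Python. It is not the authors' MsDFNet: the function names, the dilation rates (1, 2, 3), the 3-tap kernel and the equal fusion weights are all assumptions chosen for illustration, and a real network would use learned 2D convolutions. The sketch only shows that the same three weights cover a wider span as the dilation rate grows, and that responses from several columns can be fused position by position.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded ('same' length) 1D dilated cross-correlation."""
    k = len(kernel)
    span = (k - 1) * dilation          # distance between first and last tap
    pad = span // 2
    padded = [0] * pad + list(signal) + [0] * (span - pad)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def fuse_columns(columns, weights):
    """Fuse column outputs with a per-position weighted sum
    (the 1D analogue of a 1x1 convolution over stacked channels)."""
    return [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(columns[0]))
    ]


# A single impulse makes each column's footprint easy to see.
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
kernel = [1, 1, 1]
rates = (1, 2, 3)  # hypothetical dilation rates, not taken from the paper

columns = [dilated_conv1d(impulse, kernel, d) for d in rates]
fused = fuse_columns(columns, weights=[1, 1, 1])  # hypothetical equal weights

for d, col in zip(rates, columns):
    print(f"dilation={d} span={effective_kernel_size(len(kernel), d)} -> {col}")
print("fused ->", fused)
```

With these inputs the three columns respond over spans of 3, 5 and 7 positions, and the fused response is `[0, 1, 1, 1, 3, 1, 1, 1, 0]`: every column contributes at the centre, while only the widest column reaches the outermost positions.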

Acknowledgements

This work was supported by the Natural Science Foundation of Shandong Province (No. ZR2019MF050) and by the Shandong Province Colleges and Universities Youth Innovation Technology Plan Innovation Team Project (Grant No. 2020KJN011).

Author information

Authors and Affiliations

College of Computer Science and Technology, Qingdao University, Shandong, Qingdao, 266071, China
Donghua Liu & Guodong Wang
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Guangtao Zhai

Corresponding author

Correspondence to Guodong Wang.

About this article

Cite this article

Liu, D., Wang, G. & Zhai, G. Multi-scale dilated convolution of feature Fusion Network for Crowd counting. Multimed Tools Appl 81, 37939–37952 (2022). https://doi.org/10.1007/s11042-022-13130-5

Received: 18 November 2020
Revised: 23 February 2021
Accepted: 10 April 2022
Published: 22 April 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11042-022-13130-5
