Abstract
Crowd counting is one of the most fundamental tasks in computer vision, and dictionary learning has been successfully applied to it. However, many traditional dictionary learning-based algorithms for crowd counting show large prediction biases on real dynamic monitoring scenes, where the feature distributions of frames with the same count diverge widely. These methods are also poor at revealing salient feature changes between two video frames with the same crowd count. To alleviate these issues, we treat crowd counting as a particular classification problem and propose a novel dictionary learning algorithm called salient double-reconstruction based discriminative projective dictionary learning (SDR-DPL). Specifically, SDR-DPL develops a novel reconstruction strategy that jointly reduces the feature distribution gap and incorporates salient feature mappings into the reconstruction term. This strategy helps the learned dictionaries adapt to large variations in monitored crowd scenes and enables more accurate prediction of the number of pedestrians in both indoor and outdoor scenes. Moreover, we adopt an efficient linear coding technique to represent the crowd features with respect to the learned synthesis dictionary, so that the optimization procedure avoids the computational bottleneck faced by traditional sparse coding-based methods. Extensive evaluation experiments on five benchmark datasets validate the strong performance of the proposed SDR-DPL compared with other state-of-the-art competitors. The source code is available at https://github.com/Evelhz/SDR-DPL.
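To illustrate why linear coding sidesteps the cost of iterative sparse solvers, the sketch below codes a feature vector against a fixed dictionary in closed form and assigns the count class whose dictionary gives the smallest reconstruction error. It is a minimal illustration only, not the SDR-DPL algorithm: the ridge penalty, the function names `linear_code` and `predict_count`, the parameter `lam`, and the random toy data are all assumptions introduced here.

```python
import numpy as np


def linear_code(D, X, lam=1e-2):
    """Closed-form ridge coding: argmin_A ||X - D A||_F^2 + lam * ||A||_F^2.

    D has shape (features, atoms); X has shape (features, samples).
    """
    k = D.shape[1]
    # Solve (D^T D + lam I) A = D^T X instead of forming an explicit inverse.
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)


def predict_count(class_dicts, x, lam=1e-2):
    """Return the count label whose dictionary reconstructs x with least error."""
    residuals = {}
    for count, D in class_dicts.items():
        a = linear_code(D, x[:, None], lam)
        residuals[count] = np.linalg.norm(x[:, None] - D @ a)
    return min(residuals, key=residuals.get)


# Hypothetical toy data: 3 count classes, 20-dim features, 5 atoms per class.
rng = np.random.default_rng(0)
class_dicts = {c: rng.standard_normal((20, 5)) for c in (0, 1, 2)}
x = class_dicts[1] @ rng.standard_normal(5)  # feature in the span of class 1
predicted = predict_count(class_dicts, x)
print(predicted)
```

Each coding step here is a single small linear solve, whereas an l1-regularized sparse code would normally require an iterative solver per sample; that difference is the bottleneck the abstract refers to.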
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61971339 and Grant 61471161, in part by Graduate Scientific Innovation Fund for Xi’an Polytechnic University under Grant chx2021011, in part by the Key Project of the Natural Science Foundation of Shaanxi Province under Grant 2022JM-348 and Grant 2018JZ6002, in part by the Science and Technology Planning Project of Chuzhou under Grant 2021zn007, in part by the Textile Intelligent Equipment Information and Control Innovation Team of Shaanxi Innovation Ability Support Program under Grant 2021TD-29, in part by the Textile Intelligent Equipment Information and Control Innovation Team of Shaanxi Innovation Team of Universities, in part by the Science and Technology Planning Project of Xi’an under Grant 2020KJRC0028, in part by the Technology Planning Project of Beilin, Xi’an under Grant GX2006, and in part by Natural Science Basic Research Program of Shaanxi under Grant 2021JM-452.
Cite this article
Wang, T., Luo, H., Zhang, K. et al. Salient double reconstruction-based discriminative projective dictionary pair learning for crowd counting. Appl Intell 53, 1981–1996 (2023). https://doi.org/10.1007/s10489-022-03607-z
DOI: https://doi.org/10.1007/s10489-022-03607-z