Salient double reconstruction-based discriminative projective dictionary pair learning for crowd counting

Applied Intelligence

Abstract

Crowd counting is one of the most fundamental tasks in computer vision, and dictionary learning has been successfully applied to it. However, many traditional dictionary learning-based algorithms for crowd counting show remarkably large prediction biases on real dynamic monitoring scenes, where the feature distributions of frames with the same count diverge widely. These methods are also poor at revealing salient feature changes between two video frames with the same crowd count. To alleviate these issues, in this paper we treat crowd counting as a particular classification problem and propose a novel dictionary learning algorithm called salient double reconstruction-based discriminative projective dictionary pair learning (SDR-DPL). Specifically, SDR-DPL develops a novel reconstruction strategy that jointly reduces the feature distribution gap and incorporates salient feature mappings into the reconstruction term. This strategy helps the learned dictionaries adapt better to large variations in monitored crowd scenes and yields more accurate predictions of the number of pedestrians in both indoor and outdoor scenes. Moreover, we adopt an efficient linear coding technique to represent the crowd features over the learned synthesis dictionary, so that the optimization procedure avoids the computational bottleneck that traditional sparse coding-based methods face. Extensive experiments on five benchmark datasets validate the impressive performance of the proposed SDR-DPL compared with other state-of-the-art competitors. The source code is available at https://github.com/Evelhz/SDR-DPL.
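To make the "counting as classification" and "linear coding" ideas concrete, the following is a minimal sketch, not the authors' SDR-DPL implementation. It assumes the generic dictionary pair setup, in which each count label k has a synthesis dictionary D_k and an analysis dictionary P_k, a frame feature x is coded by a single matrix product P_k x (no iterative sparse solver), and the predicted count is the label with the smallest residual ||x - D_k P_k x||. The training step is a deliberate stand-in (a truncated SVD per class with P_k = D_k^T); the salient double-reconstruction objective described in the abstract is not reproduced here. All function names and the toy data generator are hypothetical.

```python
import numpy as np


def fit_dictionary_pairs(X, y, n_atoms):
    """Fit one (D_k, P_k) pair per count label.

    Stand-in training only: D_k holds the top left singular vectors of the
    class samples and P_k = D_k^T. This is NOT the SDR-DPL objective.
    X has shape (n_features, n_samples); y holds integer count labels.
    """
    pairs = {}
    for label in np.unique(y):
        X_k = X[:, y == label]
        U, _, _ = np.linalg.svd(X_k, full_matrices=False)
        D = U[:, :n_atoms]  # synthesis dictionary with unit-norm atoms
        P = D.T             # analysis dictionary
        pairs[int(label)] = (D, P)
    return pairs


def predict_count(x, pairs):
    """Return the count label whose dictionary pair reconstructs x best."""
    residuals = {}
    for label, (D, P) in pairs.items():
        code = P @ x  # linear coding: one matrix-vector product
        residuals[label] = np.linalg.norm(x - D @ code)
    return min(residuals, key=residuals.get)


def make_toy_frames(rng, n_features=20, rank=3, n_per_class=40):
    """Synthetic features for two hypothetical crowd counts (3 and 7).

    Each count label gets its own random low-dimensional subspace plus
    a small amount of noise. Every fourth sample is held out for testing.
    """
    X_parts, y_parts = [], []
    for label in (3, 7):
        basis = rng.standard_normal((n_features, rank))
        coeffs = rng.standard_normal((rank, n_per_class))
        noise = 1e-3 * rng.standard_normal((n_features, n_per_class))
        X_parts.append(basis @ coeffs + noise)
        y_parts.append(np.full(n_per_class, label))
    X = np.hstack(X_parts)
    y = np.concatenate(y_parts)
    test_mask = np.arange(y.size) % 4 == 0
    return X[:, ~test_mask], y[~test_mask], X[:, test_mask], y[test_mask]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train, y_train, X_test, y_test = make_toy_frames(rng)
    pairs = fit_dictionary_pairs(X_train, y_train, n_atoms=3)
    preds = [predict_count(X_test[:, i], pairs) for i in range(X_test.shape[1])]
    print("accuracy:", np.mean(np.array(preds) == y_test))
```

The point of the sketch is the cost profile: coding a frame costs one matrix-vector product per count label, whereas a sparse coding-based classifier would solve an iterative optimization for every frame.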


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61971339 and Grant 61471161, in part by Graduate Scientific Innovation Fund for Xi’an Polytechnic University under Grant chx2021011, in part by the Key Project of the Natural Science Foundation of Shaanxi Province under Grant 2022JM-348 and Grant 2018JZ6002, in part by the Science and Technology Planning Project of Chuzhou under Grant 2021zn007, in part by the Textile Intelligent Equipment Information and Control Innovation Team of Shaanxi Innovation Ability Support Program under Grant 2021TD-29, in part by the Textile Intelligent Equipment Information and Control Innovation Team of Shaanxi Innovation Team of Universities, in part by the Science and Technology Planning Project of Xi’an under Grant 2020KJRC0028, in part by the Technology Planning Project of Beilin, Xi’an under Grant GX2006, and in part by Natural Science Basic Research Program of Shaanxi under Grant 2021JM-452.

Author information

Corresponding author

Correspondence to Kaibing Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, T., Luo, H., Zhang, K. et al. Salient double reconstruction-based discriminative projective dictionary pair learning for crowd counting. Appl Intell 53, 1981–1996 (2023). https://doi.org/10.1007/s10489-022-03607-z

  • DOI: https://doi.org/10.1007/s10489-022-03607-z
