Mixture of counting CNNs

  • Original Paper
Machine Vision and Applications

Abstract

This paper proposes a crowd counting method. Crowd counting is difficult because the appearance of a target changes significantly with density and scale. Conventional crowd counting methods commonly use a single predictor (e.g., a regressor or a multi-class classifier). However, a single predictor cannot count targets well when their appearance varies significantly. In this paper, we propose to predict the number of targets using multiple convolutional neural networks (CNNs), each specialized to a specific appearance, which are adaptively selected according to the appearance of a test image. By integrating the selected CNNs, the proposed method is robust to large appearance changes. In experiments, we confirm that the proposed method counts crowds with lower counting error than VGGNet, an integration of CNNs with fixed weights, and conventional counting methods. Moreover, we confirm that each CNN automatically specializes to a specific appearance of the crowd (e.g., dense regions or sparse regions) through training.
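
The idea of adaptively weighting several counting predictors can be illustrated with a minimal sketch. The abstract does not specify how the CNNs are integrated, so the softmax gating below is an assumption in the style of a mixture of experts, and the function names, expert outputs, and gating scores are all hypothetical stand-ins rather than the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw gating scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_count(expert_counts, gating_scores):
    """Combine per-expert count predictions with appearance-dependent weights.

    Assumption: integration is a softmax-weighted sum of expert outputs.
    """
    if len(expert_counts) != len(gating_scores):
        raise ValueError("need one gating score per expert")
    weights = softmax(gating_scores)
    return sum(w * c for w, c in zip(weights, expert_counts))


# Hypothetical outputs of three counting CNNs for one image patch.
expert_counts = [2.0, 10.0, 30.0]

# Equal gating scores reduce to a fixed-weight average of the experts.
uniform = mixture_count(expert_counts, [0.0, 0.0, 0.0])

# A gate that strongly prefers the third (say, dense-crowd) expert.
dense = mixture_count(expert_counts, [0.0, 0.0, 5.0])

print(uniform, dense)
```

With equal scores the result is a plain average of the three predictions, which corresponds to the fixed-weight baseline mentioned in the abstract. When the gate favors one expert, the combined count moves toward that expert's prediction.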

Author information

Corresponding author

Correspondence to Shohei Kumagai.

About this article

Cite this article

Kumagai, S., Hotta, K. & Kurita, T. Mixture of counting CNNs. Machine Vision and Applications 29, 1119–1126 (2018). https://doi.org/10.1007/s00138-018-0955-6

  • DOI: https://doi.org/10.1007/s00138-018-0955-6
