Abstract
This paper proposes a crowd counting method. Crowd counting is difficult because the appearance of a target changes significantly with density and scale. Conventional crowd counting methods commonly use a single predictor (e.g., a regressor or a multi-class classifier). However, a single predictor cannot count targets well when their appearance varies significantly. We therefore propose to predict the number of targets using multiple convolutional neural networks (CNNs), each specialized to a specific appearance, and to select those CNNs adaptively according to the appearance of a test image. By integrating the selected CNNs, the proposed method is robust to large appearance changes. In experiments, we confirm that the proposed method counts crowds with lower counting error than VGGNet, an integration of CNNs with fixed weights, and conventional counting methods. Moreover, we confirm that each CNN is automatically specialized to a specific appearance of the crowd (e.g., dense or sparse regions) through training.
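The idea of adaptively weighting several specialized predictors can be illustrated with a small sketch. The abstract does not specify how the CNN outputs are integrated, so the code below assumes a softmax-weighted sum in the style of a mixture of experts. The function names (`softmax`, `mixture_count`) and all numeric values are hypothetical and stand in for the outputs of trained networks.

```python
import math


def softmax(scores):
    """Turn raw gating scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_count(expert_counts, gating_scores):
    """Combine per-expert count predictions with input-dependent weights.

    expert_counts: one predicted count per expert (assumed CNN outputs).
    gating_scores: one raw score per expert (assumed gating-network outputs).
    """
    if len(expert_counts) != len(gating_scores):
        raise ValueError("need exactly one gating score per expert")
    weights = softmax(gating_scores)
    return sum(w * c for w, c in zip(weights, expert_counts))


# Hypothetical predictions for one image patch from two experts,
# one assumed to favor sparse regions and one dense regions.
sparse_expert, dense_expert = 4.0, 12.0

# Adaptive weighting: the gating scores favor the sparse expert here.
adaptive = mixture_count([sparse_expert, dense_expert], [2.0, 0.0])

# Fixed equal weights: equal scores reduce the mixture to a plain average.
fixed = mixture_count([sparse_expert, dense_expert], [0.0, 0.0])

print(f"adaptive={adaptive:.3f} fixed={fixed:.3f}")
```

With equal gating scores the mixture is a plain average of the experts, which corresponds to the fixed-weight baseline mentioned in the abstract. With input-dependent scores the estimate moves toward the expert judged more suitable for the patch.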
Cite this article
Kumagai, S., Hotta, K. & Kurita, T. Mixture of counting CNNs. Machine Vision and Applications 29, 1119–1126 (2018). https://doi.org/10.1007/s00138-018-0955-6