Mixture of counting CNNs

Kumagai, Shohei; Hotta, Kazuhiro; Kurita, Takio

doi:10.1007/s00138-018-0955-6

Mixture of counting CNNs

Original Paper
Published: 02 July 2018

Volume 29, pages 1119–1126, (2018)
Cite this article

Machine Vision and Applications Aims and scope Submit manuscript

583 Accesses
34 Citations
Explore all metrics

Abstract

This paper proposes a crowd counting method. Crowd counting is difficult because of significant appearance changes of a target which caused by density and scale changes. Conventional crowd counting methods commonly utilize one predictor (e.g., regression and multi-class classifier). However, such only one predictor can not count targets with significant appearance changes well. In this paper, we propose to predict the number of targets using multiple convolutional neural networks (CNNs) specialized to a specific appearance, and those CNNs are adaptively selected according to the appearance of a test image. By integrating the selected CNNs, the proposed method has the robustness to large appearance changes. In experiments, we confirm that the proposed method can count crowd with lower counting error than VGGNet, integration of CNNs with fixed weights and conventional counting methods. Moreover, we confirm that each CNN automatically specialized to a specific appearance (e.g., dense region and sparse region) of crowd through training of CNNs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

An, S., Liu, W., Venkatesh, S.: Face recognition using kernel ridge regression. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–7 (2007)
Arteta, C., Lempitsky, V., Noble, J.A., Zisserman, A.: Interactive object counting. In: European Conference on Computer Vision, pp. 504–518 (2014)
Google Scholar
Chan, A.B., Liang, Z.S.J., Vasconcelos, N.: Privacy preserving crowd monitoring: counting people without people models or tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–7 (2008)
Chen, K., Loy, C.C., Gong, S., Xiang, T.: Feature mining for localised crowd counting. In: British Machine Vision Conference, pp. 21.1—21.11 (2012)
Chen, K., Gong, S., Xiang, T., Loy, C.: Cumulative attribute space for age and crowd density estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2467–2474 (2013)
Clevert, D., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). CoRR (2015). arXiv preprint arxiv:1511.07289
Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors (2012). arXiv preprint arXiv:1207.0580
Idrees, H., Saleemi, I., Seibert, C., Shah, M.: Multi-source multi-scale counting in extremely dense crowd images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2547–2554 (2013)
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015). arXiv preprint arXiv:1502.03167
Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)
Article Google Scholar
Kembhavi, A., Harwood, D., Davis, L.S.: Vehicle detection using partial least squares. IEEE Trans. Pattern Anal. Mach. Intell. 33(6), 1250–1265 (2011)
Article Google Scholar
Kingma, D.P., Adam, J.B.: A method for stochastic optimization. In: International Conference on Learning Representation (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105 (2012)
Lempitsky, V., Zisserman, A.: Learning to count objects in images. In: Advances in Neural Information Processing Systems, pp. 1324–1332 (2010)
Liaw, A., Wiener, M.: Classification and regression by randomforest. R News 2(3), 18–22 (2002)
Google Scholar
Loy, C., Gong, S., Xiang, T.: From semi-supervised to transfer counting of crowds. In: IEEE International Conference on Computer Vision, pp. 2256–2263 (2013)
Onoro-Rubio, D., López-Sastre, R.J.: Towards perspective-free object counting with deep learning. In: European Conference on Computer Vision, Springer, pp. 615–629 (2016)
Pham, V.Q., Kozakaya, T., Yamaguchi, O., Okada, R.: Count forest: Co-voting uncertain number of targets using random forest for crowd density estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3253–3261 (2015)
Van Gestel, T., Suykens, J., De Moor, B., Vandewalle, J.: Automatic relevance determination for least squares support vector machine classifiers. In: European Symposium on Artificial Neural Networks, pp. 13–18 (2001)
Walach, E., Wolf, L.: Learning to count with cnn boosting. In: European Conference on Computer Vision, pp. 660–676. Springer (2016)
Xiao, T., Xu, Y., Yang, K., Zhang, J., Peng, Y., Zhang, Z.: The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 842–850 (2015)
Xu, Z., Yang, Y., Hauptmann, A.G.: A discriminative cnn video representation for event detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1798–1807 (2015)
Yan, C., Xie, H., Yang, D., Yin, J., Zhang, Y., Dai, Q.: Supervised hash coding with deep neural network for environment perception of intelligent vehicles. IEEE Trans. Intell. Transp. Syst. 19(1), 284–295 (2018)
Article Google Scholar
Zhang, C., Li, H., Wang, X., Yang, X.: Cross-scene crowd counting via deep convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 833–841 (2015)
Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 589–597 (2016)

Download references

Author information

Authors and Affiliations

Meijo University, 1-501, Shiogama-guchi, Tempaku-ku, Nagoya-shi, 468-8502, Japan
Shohei Kumagai & Kazuhiro Hotta
Hiroshima University, 1-7-1, Kagamiyama, Higashi-Hirosima-shi, Hiroshima-ken, 739-8521, Japan
Takio Kurita

Authors

Shohei Kumagai
View author publications
You can also search for this author in PubMed Google Scholar
Kazuhiro Hotta
View author publications
You can also search for this author in PubMed Google Scholar
Takio Kurita
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shohei Kumagai.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kumagai, S., Hotta, K. & Kurita, T. Mixture of counting CNNs. Machine Vision and Applications 29, 1119–1126 (2018). https://doi.org/10.1007/s00138-018-0955-6

Download citation

Received: 17 November 2017
Revised: 02 May 2018
Accepted: 11 June 2018
Published: 02 July 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s00138-018-0955-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Mixture of counting CNNs

Abstract

Access this article

Similar content being viewed by others

U-Net: Convolutional Networks for Biomedical Image Segmentation

SSD: Single Shot MultiBox Detector

Object detection using YOLO: challenges, architectural successors, datasets and applications

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Abstract

Access this article

Similar content being viewed by others

U-Net: Convolutional Networks for Biomedical Image Segmentation

SSD: Single Shot MultiBox Detector

Object detection using YOLO: challenges, architectural successors, datasets and applications

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation