Abstract
Crowd counting aims to estimate the number of objects and plays an important role in intelligent transportation, city management and security monitoring. The task is very challenging because of scale variations, illumination changes, occlusions and poor imaging conditions, especially at night and in haze. In this paper, we present a drone-based RGB-Thermal crowd counting dataset (DroneRGBT) that consists of 3600 pairs of images and covers different attributes, including height, illumination and density. To exploit the complementary information in the visible and thermal infrared modalities, we propose a multi-modal crowd counting network (MMCCN) with a multi-scale feature learning module, a modal alignment module and an adaptive fusion module. Experiments on DroneRGBT demonstrate the effectiveness of the proposed approach.
T. Peng and Q. Li contributed equally to this paper as co-first authors.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 61876127 and Grant 61732011, the Natural Science Foundation of Tianjin under Grant 17JCZDJC30800 and the Applied Basic Research Program of Qinghai under Grant 2019-ZJ-7017.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Peng, T., Li, Q., Zhu, P. (2021). RGB-T Crowd Counting from Drone: A Benchmark and MMCCN Network. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12627. Springer, Cham. https://doi.org/10.1007/978-3-030-69544-6_30
DOI: https://doi.org/10.1007/978-3-030-69544-6_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69543-9
Online ISBN: 978-3-030-69544-6
eBook Packages: Computer Science, Computer Science (R0)