RGB-T Crowd Counting from Drone: A Benchmark and MMCCN Network

  • Conference paper
Computer Vision – ACCV 2020 (ACCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12627)


Abstract

Crowd counting aims to estimate the number of objects in a scene and plays an important role in intelligent transportation, city management and security monitoring. The task is highly challenging because of scale variations, illumination changes, occlusions and poor imaging conditions, especially at nighttime and in haze. In this paper, we present a drone-based RGB-Thermal crowd counting dataset (DroneRGBT) that consists of 3600 pairs of images and covers different attributes, including height, illumination and density. To exploit the complementary information in the visible and thermal infrared modalities, we propose a multi-modal crowd counting network (MMCCN) with a multi-scale feature learning module, a modal alignment module and an adaptive fusion module. Experiments on DroneRGBT demonstrate the effectiveness of the proposed approach.
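As a rough illustration of how a two-stream RGB-T counting network with adaptive fusion can be organized, the Python sketch below builds separate convolutional streams for the visible and thermal inputs, fuses their features with learned per-pixel weights, and regresses a density map whose spatial sum gives the count. This is a minimal sketch under stated assumptions, not the authors' MMCCN: the backbone, channel sizes and gating-based fusion are illustrative, and the multi-scale feature learning and modal alignment modules described in the paper are omitted.

# Hypothetical sketch of a two-stream RGB-T counting network with adaptive fusion.
# Module names, channel sizes and the backbone are illustrative assumptions.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 conv + BN + ReLU layers followed by 2x downsampling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class TwoStreamCounter(nn.Module):
    def __init__(self):
        super().__init__()
        # Separate feature extractors for the RGB and thermal inputs.
        self.rgb_stream = nn.Sequential(conv_block(3, 64), conv_block(64, 128), conv_block(128, 256))
        self.t_stream = nn.Sequential(conv_block(1, 64), conv_block(64, 128), conv_block(128, 256))
        # Adaptive fusion: predict a per-pixel weight for each modality.
        self.fusion_gate = nn.Sequential(nn.Conv2d(512, 2, kernel_size=1), nn.Softmax(dim=1))
        # Density head: regress a one-channel density map; its spatial sum is the count.
        self.density_head = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),
        )

    def forward(self, rgb, thermal):
        f_rgb = self.rgb_stream(rgb)        # (B, 256, H/8, W/8)
        f_t = self.t_stream(thermal)        # (B, 256, H/8, W/8)
        weights = self.fusion_gate(torch.cat([f_rgb, f_t], dim=1))
        fused = weights[:, 0:1] * f_rgb + weights[:, 1:2] * f_t
        density = self.density_head(fused)
        return density, density.sum(dim=(1, 2, 3))

# Usage: a 512x640 RGB image paired with a single-channel thermal image.
model = TwoStreamCounter()
density_map, count = model(torch.randn(1, 3, 512, 640), torch.randn(1, 1, 512, 640))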

T. Peng and Q. Li contributed equally to this paper as co-first authors.



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61876127 and 61732011, the Natural Science Foundation of Tianjin under Grant 17JCZDJC30800, and the Applied Basic Research Program of Qinghai under Grant 2019-ZJ-7017.

Author information

Corresponding author

Correspondence to Pengfei Zhu.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Peng, T., Li, Q., Zhu, P. (2021). RGB-T Crowd Counting from Drone: A Benchmark and MMCCN Network. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds.) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol. 12627. Springer, Cham. https://doi.org/10.1007/978-3-030-69544-6_30

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-69544-6_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69543-9

  • Online ISBN: 978-3-030-69544-6

  • eBook Packages: Computer Science, Computer Science (R0)
