Towards Human Keypoint Detection in Infrared Images

  • Conference paper
Neural Information Processing (ICONIP 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1792)

Abstract

Conventional human keypoint detection, which relies on visible images, is not applicable in low-light and nighttime conditions. In this work, we use infrared images for multi-person keypoint detection, which makes computer vision tasks such as action recognition and behavior analysis applicable in complex illumination environments. Taking the physical characteristics of infrared imaging into account, we design a top-down solution that first uses a single-stage object detection network, YOLO, to predict the bounding box of each human body, and then feeds each detected body into a subsequent human keypoint detection network. We choose SimpleBaseline, a well-known network for human keypoint detection on visible images, as the base network. Because infrared images are blurry and low in resolution, we use targeted feature fusion, channel attention, and spatial attention to capture their features. In addition, we use depthwise separable convolution to reduce the number of parameters in the network. In the literature, there is no benchmark infrared image dataset for multi-person keypoint detection, so we construct an infrared image dataset containing 1500 annotated images carefully selected from several public infrared pedestrian datasets. Extensive experimental results show that, compared with SimpleBaseline, our method achieves nearly the same performance on the visible COCO dataset but about 8% higher AP on the self-built infrared dataset.
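To illustrate why a depthwise separable convolution reduces the parameter count, the following minimal sketch compares it with a standard convolution of the same shape. It is not taken from the paper: the function names, the 3x3 kernel, and the 256-channel layer are illustrative assumptions, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size, chosen only for illustration.
    c_in, c_out, k = 256, 256, 3
    print(standard_conv_params(c_in, c_out, k))        # 589824
    print(depthwise_separable_params(c_in, c_out, k))  # 67840
```

For this assumed layer the separable form needs roughly one ninth of the weights. The saving in the paper's network depends on which layers are replaced, which the abstract does not specify.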

This work was partially supported by Sichuan Science and Technology Program (No. 2022YFG0321, 2022NSFSC0916).

Author information

Corresponding author

Correspondence to Wanli Dong.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Zhu, Z., Dong, W., Gao, X., Peng, A., Luo, Y. (2023). Towards Human Keypoint Detection in Infrared Images. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1792. Springer, Singapore. https://doi.org/10.1007/978-981-99-1642-9_45

  • DOI: https://doi.org/10.1007/978-981-99-1642-9_45

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-1641-2

  • Online ISBN: 978-981-99-1642-9

  • eBook Packages: Computer Science, Computer Science (R0)
