Towards Human Keypoint Detection in Infrared Images

Zhu, Zhilei; Dong, Wanli; Gao, Xiaoming; Peng, Anjie; Luo, Yuqin

doi:10.1007/978-981-99-1642-9_45

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1792)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Human keypoint detection based on visible-light images is not applicable in low-light and nighttime conditions. In this work, we use infrared images for multi-person keypoint detection, which makes computer vision tasks such as action recognition and behavior analysis applicable in complex illumination environments. Taking the physical characteristics of infrared imaging into account, we design a top-down solution that first uses a single-stage object detection network, YOLO, to predict the bounding box of each human body, and then feeds the detected human body into a subsequent human keypoint detection network. We choose SimpleBaseline, a well-known network for human keypoint detection in visible images, as the base network. Since infrared images are blurry and low in resolution, we use targeted feature fusion, channel attention, and spatial attention to capture the features of the infrared image. In addition, we use depthwise separable convolution to reduce the number of parameters in the network. In the literature, there is no benchmark infrared image dataset for multi-person keypoint detection, so we construct an infrared image dataset containing 1500 annotated images carefully selected from several public infrared pedestrian datasets. Extensive experimental results show that, compared with SimpleBaseline, our method achieves nearly the same performance on the visible-light COCO dataset but about 8% higher AP on the self-built infrared dataset.
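The parameter saving from depthwise separable convolution can be illustrated with a short calculation. The sketch below is not taken from the paper: the function names and the layer size (256 input channels, 256 output channels, 3 x 3 kernel) are assumptions chosen only for illustration, and bias terms are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer size, chosen for illustration only.
standard = conv_params(256, 256, 3)
separable = depthwise_separable_params(256, 256, 3)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))
```

For this assumed layer the separable form needs roughly 11.5% of the weights of the standard convolution. The actual saving in the proposed network depends on which layers are replaced, which the abstract does not specify.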

This work was partially supported by Sichuan Science and Technology Program (No. 2022YFG0321, 2022NSFSC0916).

Author information

Authors and Affiliations

School of Computer Science, Southwest University of Science and Technology, Mianyang, China
Zhilei Zhu, Wanli Dong, Xiaoming Gao, Anjie Peng & Yuqin Luo

Corresponding author

Correspondence to Wanli Dong.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt

About this paper

Cite this paper

Zhu, Z., Dong, W., Gao, X., Peng, A., Luo, Y. (2023). Towards Human Keypoint Detection in Infrared Images. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1792. Springer, Singapore. https://doi.org/10.1007/978-981-99-1642-9_45

DOI: https://doi.org/10.1007/978-981-99-1642-9_45
Published: 14 April 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1641-2
Online ISBN: 978-981-99-1642-9
eBook Packages: Computer Science (R0)
