Abstract
Human keypoint detection based on visible-light images is not applicable in low-light and nighttime conditions. In this work, we use infrared images for multi-person keypoint detection, which makes computer vision tasks such as action recognition and behavior analysis applicable in complex illumination environments. Taking the physical characteristics of infrared imaging into account, we design a top-down solution that first uses a single-stage object detection network, YOLO, to predict the bounding box of each human body, and then feeds each detected body into a subsequent human keypoint detection network. We choose SimpleBaseline, a well-known network for human keypoint detection in visible images, as the base network. Since infrared images are blurry and low in resolution, we use targeted feature fusion, channel attention, and spatial attention to capture their features. In addition, we use depthwise separable convolution to reduce the number of parameters in the network. In the literature, there is no benchmark infrared image dataset for multi-person keypoint detection, so we construct an infrared image dataset containing 1500 annotated images carefully selected from several public infrared pedestrian datasets. Extensive experimental results show that, compared with SimpleBaseline, our method achieves nearly the same performance on the visible COCO dataset but about 8% higher AP on the self-built infrared dataset.
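The parameter saving from depthwise separable convolution can be illustrated with a short calculation. The sketch below is not the authors' network: the function names, channel counts, and kernel size are assumptions chosen only for illustration, and bias terms are ignored. It compares the weight count of a standard k x k convolution with that of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer size, not taken from the paper.
standard = conv_params(256, 256, 3)
separable = depthwise_separable_params(256, 256, 3)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))
```

For this assumed layer, the separable variant needs roughly one ninth of the weights. The actual saving in any network depends on which layers are replaced.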
This work was partially supported by the Sichuan Science and Technology Program (No. 2022YFG0321, 2022NSFSC0916).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhu, Z., Dong, W., Gao, X., Peng, A., Luo, Y. (2023). Towards Human Keypoint Detection in Infrared Images. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1792. Springer, Singapore. https://doi.org/10.1007/978-981-99-1642-9_45
DOI: https://doi.org/10.1007/978-981-99-1642-9_45
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1641-2
Online ISBN: 978-981-99-1642-9
eBook Packages: Computer Science, Computer Science (R0)