Abstract
Multi-person pose estimation is a fundamental yet challenging research topic for many computer vision applications. In this paper, to alleviate the problems of variable pose structure and occlusion in complex scenes, we propose a novel Pose Knowledge Transfer approach for multi-person pose estimation that does not rely on designing deeper or wider network structures. The approach consists of a Keypoint Region Erasing (KRE) scheme and a Bi-directional Pose Knowledge Transfer (BPKT) model learning strategy. KRE encourages the human pose estimator to focus explicitly on keypoint connectivity, so that it can robustly localize an occluded keypoint from its adjacent visible body patches. BPKT, which introduces no additional model parameters, transfers local connectivity knowledge and global body configuration knowledge between two networks with the same structure to cope with variable pose structures. Extensive experiments on the MPII Human Pose dataset and the COCO keypoint benchmark show that BPKT and KRE significantly improve the performance of a range of state-of-the-art human pose estimation models, consistently validating the effectiveness and generalization of this model-agnostic approach.
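The two components described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify patch sizes, how the erased keypoint is chosen, or the exact form of the transfer loss. `keypoint_region_erase` (a hypothetical name) blanks a square patch around one randomly chosen visible keypoint, assuming COCO-style `(x, y, v)` annotations. `bidirectional_losses` (also hypothetical) swaps in a deep-mutual-learning-style objective as a stand-in for BPKT: each of two same-structure networks fits the ground-truth heatmap and is also pulled toward its peer's prediction. Heatmaps are flattened Python lists so that the example has no dependencies.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[float]]        # H x W grayscale image (assumed format)
Keypoint = Tuple[int, int, int]  # (x, y, v), COCO-style visibility flag v


def keypoint_region_erase(image: Image, keypoints: List[Keypoint],
                          half_size: int = 2, fill: float = 0.0,
                          rng: Optional[random.Random] = None) -> Image:
    """Erase a square patch centred on one randomly chosen visible keypoint.

    Hypothetical sketch: patch size, fill value and erasing a single
    keypoint per image are assumptions, not the paper's settings.
    """
    rng = rng or random.Random()
    visible = [(x, y) for x, y, v in keypoints if v > 0]
    out = [row[:] for row in image]  # copy so the input is left untouched
    if not visible:
        return out
    cx, cy = rng.choice(visible)
    h, w = len(out), len(out[0])
    # Clip the (2 * half_size + 1)-wide square to the image borders.
    for y in range(max(0, cy - half_size), min(h, cy + half_size + 1)):
        for x in range(max(0, cx - half_size), min(w, cx + half_size + 1)):
            out[y][x] = fill
    return out


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two flattened heatmaps."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def bidirectional_losses(pred_a: List[float], pred_b: List[float],
                         target: List[float],
                         weight: float = 1.0) -> Tuple[float, float]:
    """Mutual-learning-style losses for two peer networks (assumed stand-in).

    Each network fits the ground truth and mimics its peer's prediction,
    which an autograd framework would treat as a constant (detached).
    """
    loss_a = mse(pred_a, target) + weight * mse(pred_a, pred_b)
    loss_b = mse(pred_b, target) + weight * mse(pred_b, pred_a)
    return loss_a, loss_b


if __name__ == "__main__":
    img = [[1.0] * 8 for _ in range(8)]
    erased = keypoint_region_erase(img, [(4, 4, 2), (0, 0, 0)],
                                   half_size=1, rng=random.Random(0))
    print(sum(map(sum, erased)))  # 64 pixels minus a 3 x 3 erased patch
    print(bidirectional_losses([0.0, 1.0], [0.5, 0.5], [0.0, 1.0]))
```

Because the transfer term only compares the two networks' outputs, it adds no model parameters, which is the property the abstract attributes to BPKT. In a real training loop each network would be updated only by its own loss, with the peer's heatmap detached from the graph.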
Acknowledgements
This research was supported by the National Natural Science Foundation of China (61972059, 61773272), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (19KJA230001), and the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Cite this article
Li, B., Ji, Y., Li, Y. et al. Pose Knowledge Transfer for multi-person pose estimation. SIViP 16, 321–328 (2022). https://doi.org/10.1007/s11760-021-01922-5