Abstract
Most learning targets for multi-person pose estimation are based on the likelihood \(P(Y|X)\). However, if we make causal assumptions about keypoints and express them as a Structural Causal Model (SCM), \(P(Y|X)\) introduces bias through spurious correlations in the SCM. In practice, this shows up as networks making biased decisions in regions where keypoints are dense. We therefore propose a novel learning method, the Causal Intervention pose Network (CIposeNet), which uses causal intervention to address the bias in the SCM of keypoints. Specifically, CIposeNet is built on the backdoor adjustment, so that the learning target becomes the causal intervention \(P(Y|do(X))\) rather than the likelihood \(P(Y|X)\). Experiments on multi-person datasets show that CIposeNet indeed reduces bias in the networks.
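The difference between the two learning targets can be illustrated on a toy discrete model. The sketch below is not CIposeNet; it only assumes a single binary confounder \(Z\) that influences both \(X\) and \(Y\), with made-up probability tables and hypothetical function names. Under that assumption, the likelihood weights each stratum by \(P(z|X)\), while the standard backdoor adjustment weights it by the prior \(P(z)\), i.e. \(P(Y|do(X)) = \sum_z P(Y|X,z)P(z)\).

```python
import numpy as np

# Hypothetical toy model: binary confounder Z, binary input X, binary output Y.
# All numbers are invented for illustration only.
p_z = np.array([0.7, 0.3])                # P(Z=z)
p_x_given_z = np.array([[0.9, 0.1],       # P(X=x | Z=0) for x = 0, 1
                        [0.2, 0.8]])      # P(X=x | Z=1) for x = 0, 1
p_y1_given_zx = np.array([[0.1, 0.5],     # P(Y=1 | Z=0, X=x) for x = 0, 1
                          [0.4, 0.9]])    # P(Y=1 | Z=1, X=x) for x = 0, 1


def likelihood(x: int) -> float:
    """P(Y=1 | X=x): strata are weighted by the posterior P(z | x)."""
    joint_zx = p_x_given_z[:, x] * p_z        # P(x, z) for each z
    p_z_given_x = joint_zx / joint_zx.sum()   # Bayes' rule
    return float(np.sum(p_y1_given_zx[:, x] * p_z_given_x))


def intervention(x: int) -> float:
    """P(Y=1 | do(X=x)) via backdoor adjustment: strata are weighted by P(z)."""
    return float(np.sum(p_y1_given_zx[:, x] * p_z))


if __name__ == "__main__":
    for x in (0, 1):
        print(f"x={x}: P(Y=1|X)={likelihood(x):.4f}  "
              f"P(Y=1|do(X))={intervention(x):.4f}")
```

In this toy model the two quantities differ because \(Z\) is correlated with \(X\); if \(P(z|X) = P(z)\) for every \(z\), they would coincide.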
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Yue, L., Li, J., Liu, Q. (2022). Causal Intervention Learning for Multi-person Pose Estimation. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13188. Springer, Cham. https://doi.org/10.1007/978-3-031-02375-0_14
DOI: https://doi.org/10.1007/978-3-031-02375-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02374-3
Online ISBN: 978-3-031-02375-0
eBook Packages: Computer Science, Computer Science (R0)