Abstract
Estimating human motion from video is an active research area due to its many potential applications. Most state-of-the-art methods predict human shape and posture estimates for individual images and do not leverage the temporal information available in video. Many “in the wild" sequences of human motion are captured by a moving camera, which adds the complication of conflated camera and human motion to the estimation. We therefore present BodySLAM, a monocular SLAM system that jointly estimates the position, shape, and posture of human bodies, as well as the camera trajectory. We also introduce a novel human motion model to constrain sequential body postures and observe the scale of the scene. Through a series of experiments on video sequences of human motion captured by a moving monocular camera, we demonstrate that BodySLAM improves estimates of all human body parameters and camera poses when compared to estimating these separately.
Research presented in this paper has been supported by Dyson Technology Ltd. and the Technical University of Munich.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
We encourage the reader to view our supplementary video available at:
References
Agarwal, S., Mierle, K., et al.: Ceres Solver. http://ceres-solver.org
Akhter, I., Black, M.J.: Pose-conditioned joint angle limits for 3D human pose reconstruction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: SCAPE: shape completion and animation of people. ACM Trans. Graph. (2005)
Arnab, A., Doersch, C., Zisserman, A.: Exploiting temporal context for 3D human pose estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Barnes, D., Maddern, W., Pascoe, G., Posner, I.: Driven to distraction: self-supervised distractor learning for robust monocular visual odometry in urban environments. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2018)
Bârsan, I.A., Liu, P., Pollefeys, M., Geiger, A.: Robust dense mapping for large-scale dynamic environments. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2018)
Bescos, B., Fácil, J.M., Civera, J., Neira, J.: DynaSLAM: tracking, mapping and inpainting in dynamic scenes. Technical report (2018)
Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep It SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_34
Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Catalin Ionescu Fuxin Li, C.S.: Latent structured models for human pose estimation. In: Proceedings of the International Conference on Computer Vision (ICCV) (2011)
Dabral, R., Mundhada, A., Kusupati, U., Afaque, S., Sharma, A., Jain, A.: Learning 3D human pose from structure and motion. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 679–696. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_41
Dai, W., Zhang, Y., Li, P., Fang, Z., Scherer, S.: RGB-D SLAM in dynamic environments using point correlations. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 44, 373–389 (2020)
Fang, H.S., Xie, S., Tai, Y.W., Lu, C.: RMPE: regional multi-person pose estimation. In: Proceedings of the International Conference on Computer Vision (ICCV) (2017)
Habibie, I., Xu, W., Mehta, D., Pons-Moll, G., Theobalt, C.: In the wild human pose estimation using explicit 2D features and intermediate 3D representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Henein, M., Kennedy, G., Mahony, R., Ila, V.: Exploiting rigid body motion for SLAM in dynamic environments. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2018)
Henein, M., Zhang, J., Mahony, R., Ila, V.: Dynamic SLAM: the need for speed. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2020)
Henning, D., Guler, A., Leutenegger, S., Zafeiriou, S.: HPE3D: human pose estimation in 3D (2020). https://github.com/dorianhenning/hpe3d
Hossain, M.R.I., Little, J.J.: Exploiting temporal information for 3D human pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 69–86. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_5
Huang, Y., et al.: Towards accurate marker-less human shape and pose estimation over time. In: Proceedings of the International Conference on 3D Vision (3DV) (2017)
Jaimez, M., Kerl, C., Gonzalez-Jimenez, J., Cremers, D.: Fast odometry and scene flow from RGB-D cameras based on geometric clustering. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2017)
Ji, T., Wang, C., Xie, L.: Towards real-time semantic RGB-D SLAM in dynamic environments. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2021)
Judd, K.M., Gammell, J.D., Newman, P.: Multimotion Visual Odometry (MVO): simultaneous estimation of camera and third-party motions. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2018)
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Kanazawa, A., Zhang, J.Y., Felsen, P., Malik, J.: Learning 3D human dynamics from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings (2015)
Kocabas, M., Athanasiou, N., Black, M.J.: VIBE: video inference for human body pose and shape estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Kocabas, M., Huang, C.H.P., Tesch, J., Müller, L., Hilliges, O., Black, M.J.: SPEC: seeing people in the wild with an estimated camera. In: Proceedings of the International Conference on Computer Vision (ICCV) (2021)
Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In: Proceedings of the International Conference on Computer Vision (ICCV) (2019)
Leutenegger, S., Chli, M., Siegwart, R.Y.: BRISK: binary robust invariant scalable keypoints. In: Proceedings of the International Conference on Computer Vision (ICCV) (2011)
Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R., Furgale, P.: Keyframe-based visual-inertial odometry using nonlinear optimization. Int. J. Robot. Res. (IJRR) 34, 314–334 (2015)
Ling, H.Y., Zinno, F., Cheng, G., van de Panne, M.: Character controllers using motion vaes. ACM Trans. Graph. 39, 40 (2020)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34, 1–16 (2015)
Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: AMASS: archive of motion capture as surface shapes. In: Proceedings of the International Conference on Computer Vision (ICCV) (2019)
Martinez, J., Hossain, R., Romero, J., Little, J.J.: A simple yet effective baseline for 3d human pose estimation. In: Proceedings of the International Conference on Computer Vision (ICCV) (2017)
Mur-Artal, R., Tardos, J.D.: ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 33, 1255–1262 (2017)
Osman, A.A.A., Bolkart, T., Black, M.J.: STAR: sparse trained articulated human body regressor. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12351, pp. 598–613. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58539-6_36
Paszke, A., et al.: Automatic differentiation in PyTorch. In: Neural Information Processing Systems (NIPS) (2017)
Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Pavlakos, G., Zhou, X., Daniilidis, K.: Ordinal depth supervision for 3D human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Pavlakos, G., Zhou, X., Derpanis, K.G., Daniilidis, K.: Coarse-to-fine volumetric prediction for single-image 3D human pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Pavllo, D., Feichtenhofer, C., Grangier, D., Auli, M.: 3D human pose estimation in video with temporal convolutions and semi-supervised training. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Qiu, Y., Wang, C., Wang, W., Henein, M., Scherer, S.: AirDOS: dynamic SLAM benefits from articulated objects. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)
Ramakrishna, V., Kanade, T., Sheikh, Y.: Reconstructing 3D human pose from 2D image landmarks. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 573–586. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_41
Rempe, D., Birdal, T., Hertzmann, A., Yang, J., Sridhar, S., Guibas, L.J.: HuMoR: 3D human motion model for robust pose estimation. In: Proceedings of the International Conference on Computer Vision (ICCV) (2021)
Rünz, M., Agapito, L.: Co-fusion: real-time segmentation, tracking and fusion of multiple objects. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2017)
Rünz, M., Buffier, M., Agapito, L.: MaskFusion: real-time recognition, tracking and reconstruction of multiple moving objects. In: Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR) (2018)
Scona, R., Jaimez, M., Petillot, Y.R., Fallon, M., Cremers, D.: StaticFusion: background reconstruction for dense RGB-D SLAM in dynamic environments. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2018)
Valmadre, J., Lucey, S.: Deterministic 3D human pose estimation using rigid structure. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6313, pp. 467–480. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15558-1_34
Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2
Xu, B., Li, W., Tzoumanikas, D., Bloesch, M., Davison, A., Leutenegger, S.: MID-fusion: octree-based object-level multi-instance dynamic SLAM. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2019)
Zhao, R., Wang, Y., Martinez, A.: A simple, fast and highly-accurate algorithm to recover 3D shape from 2D landmarks on a single image. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 40, 3059–3066 (2016)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Henning, D.F., Laidlow, T., Leutenegger, S. (2022). BodySLAM: Joint Camera Localisation, Mapping, and Human Motion Tracking. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13699. Springer, Cham. https://doi.org/10.1007/978-3-031-19842-7_38
Download citation
DOI: https://doi.org/10.1007/978-3-031-19842-7_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19841-0
Online ISBN: 978-3-031-19842-7
eBook Packages: Computer ScienceComputer Science (R0)