Scene-Aware Human Motion Forecasting via Mutual Distance Prediction

Xing, Chaoyue; Mao, Wei; Liu, Miaomiao

doi:10.1007/978-3-031-72933-1_8

Chaoyue Xing¹³,
Wei Mao¹³ &
Miaomiao Liu¹³

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15097))

Included in the following conference series:

European Conference on Computer Vision

320 Accesses

Abstract

In this paper, we address the issue of scene-aware 3D human motion forecasting. A key challenge in this task is to predict future human motions that are coherent with the scene by modeling human-scene interactions. While recent works have demonstrated that explicit constraints on human-scene interactions can prevent the occurrence of ghost motion, they only provide constraints on partial human motion e.g., the global motion of the human or a few joints contacting the scene, leaving the rest of unconstrained. To address this limitation, we propose to represent the human-scene interaction using the mutual distance between the human body and the scene. Such mutual distances constrain both the local and global human motion, resulting in a whole-body motion constrained prediction. In particular, mutual distance constraints consist of two components, the signed distance of each vertex on the human mesh to the scene surface and the distance of basis scene points to the human mesh. We further introduce a global scene representation learned from a signed distance function (SDF) volume to ensure coherence between the global scene representation and the explicit constraint from the mutual distance. We develop a pipeline with two sequential steps: predicting the future mutual distances first, followed by forecasting future human motion. We explicitly ensure consistency between predicted poses and mutual distances during training. Extensive testing on both synthetic and real datasets demonstrates that our method consistently surpasses the performance of current state-of-the-art techniques.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 64.99; Price excludes VAT (USA)

Softcover Book: USD 79.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Long-Term Human Motion Prediction with Scene Context

Path-Guided Motion Prediction with Multi-view Scene Perception

Class-guided human motion prediction via multi-spatial-temporal supervision

Article 10 March 2023

Notes

1.
Unless otherwise specified, in this paper, we use column vector convention.
2.
The first GCN layer will take the DCT coefficient matrix as input thus, $\textbf{F}^{(1)} = \textbf{H}$.

References

Aksan, E., Kaufmann, M., Hilliges, O.: Structured prediction helps 3D human motion modelling. In: ICCV, pp. 7144–7153 (2019)
Google Scholar
Brand, M., Hertzmann, A.: Style machines. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 183–192. ACM Press/Addison-Wesley Publishing Co. (2000)
Google Scholar
Cai, Y., et al.: Learning progressive joint propagation for human motion prediction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 226–242. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_14
Chapter Google Scholar
Cao, Z., Gao, H., Mangalam, K., Cai, Q.-Z., Vo, M., Malik, J.: Long-term human motion prediction with scene context. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 387–404. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_23
Chapter Google Scholar
Corona, E., Pumarola, A., Alenya, G., Moreno-Noguer, F.: Context-aware human motion prediction. In: CVPR, pp. 6992–7001 (2020)
Google Scholar
Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In: CVPR, pp. 5828–5839 (2017)
Google Scholar
González, Á.: Measurement of areas on a sphere using Fibonacci and latitude-longitude lattices. Math. Geosci. 42, 49–64 (2010)
Article MathSciNet Google Scholar
Gopalakrishnan, A., Mali, A., Kifer, D., Giles, L., Ororbia, A.G.: A neural temporal model for human motion prediction. In: CVPR, pp. 12116–12125 (2019)
Google Scholar
Hassan, M., et al.: Stochastic scene-aware motion prediction. In: ICCV, pp. 11374–11384 (2021)
Google Scholar
Hassan, M., Choutas, V., Tzionas, D., Black, M.J.: Resolving 3D human pose ambiguities with 3D scene constraints. In: ICCV, pp. 2282–2292 (2019)
Google Scholar
Hassan, M., Ghosh, P., Tesch, J., Tzionas, D., Black, M.J.: Populating 3D scenes by learning human-scene interaction. In: CVPR, pp. 14708–14718 (2021)
Google Scholar
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. TPAMI 36(7), 1325–1339 (2014)
Article Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
Google Scholar
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR (2017)
Google Scholar
Koppula, H.S., Saxena, A.: Anticipating human activities for reactive robotic response. In: IROS, p. 2071. Tokyo (2013)
Google Scholar
Li, M., Chen, S., Zhao, Y., Zhang, Y., Wang, Y., Tian, Q.: Dynamic multiscale graph neural networks for 3D skeleton based human motion prediction. In: CVPR, pp. 214–223 (2020)
Google Scholar
Li, X., Li, H., Joo, H., Liu, Y., Sheikh, Y.: Structure from recurrent motion: from rigidity to recurrency. In: CVPR (2018)
Google Scholar
Liu, Z., Tang, H., Lin, Y., Han, S.: Point-voxel CNN for efficient 3D deep learning. Adv. Neural. Inf. Process. Syst. 32 (2019)
Google Scholar
Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: AMASS: archive of motion capture as surface shapes. In: ICCV (2019)
Google Scholar
Mao, W., Hartley, R.I., Salzmann, M., et al.: Contact-aware human motion forecasting. Adv. Neural. Inf. Process. Syst. 35, 7356–7367 (2022)
Google Scholar
Mao, W., Liu, M., Salzmann, M.: History repeats itself: human motion prediction via motion attention. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12359, pp. 474–489. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58568-6_28
Chapter Google Scholar
Mao, W., Liu, M., Salzmann, M., Li, H.: Learning trajectory dependencies for human motion prediction. In: ICCV, pp. 9489–9497 (2019)
Google Scholar
Paden, B., Čáp, M., Yong, S.Z., Yershov, D., Frazzoli, E.: A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Trans. Intell. Veh. 1(1), 33–55 (2016)
Article Google Scholar
Paszke, A., et al.: Automatic differentiation in PyTorch. In: NeurIPS-W (2017)
Google Scholar
Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: CVPR, pp. 10975–10985 (2019)
Google Scholar
Prokudin, S., Lassner, C., Romero, J.: Efficient learning on point clouds with basis point sets. In: ICCV, pp. 4332–4341 (2019)
Google Scholar
Scofano, L., Sampieri, A., Schiele, E., De Matteis, E., Leal-Taixé, L., Galasso, F.: Staged contact-aware global human motion forecasting. In: BMVC (2023)
Google Scholar
Sidenbladh, H., Black, M.J., Sigal, L.: Implicit probabilistic models of human motion for synthesis and tracking. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 784–800. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47969-4_52
Chapter Google Scholar
Starke, S., Zhang, H., Komura, T., Saito, J.: Neural state machine for character-scene interactions. ACM Trans. Graph. 38(6), 209–1 (2019)
Article Google Scholar
Taylor, G.W., Hinton, G.E., Roweis, S.T.: Modeling human motion using binary latent variables. In: NeurIPS, pp. 1345–1352 (2006)
Google Scholar
Van Welbergen, H., Van Basten, B.J., Egges, A., Ruttkay, Z.M., Overmars, M.H.: Real time animation of virtual humans: a trade-off between naturalness and control. In: Computer Graphics Forum, vol. 29, pp. 2530–2554. Wiley Online Library (2010)
Google Scholar
Wang, B., Adeli, E., Chiu, H.k., Huang, D.A., Niebles, J.C.: Imitation learning for human pose prediction. In: ICCV, pp. 7124–7133 (2019)
Google Scholar
Wang, J.M., Fleet, D.J., Hertzmann, A.: Gaussian process dynamical models for human motion. TPAMI 30(2), 283–298 (2008)
Article Google Scholar
Wang, J., Xu, H., Xu, J., Liu, S., Wang, X.: Synthesizing long-term 3D human motion and interaction in 3D scenes. In: CVPR, pp. 9401–9411 (2021)
Google Scholar
Wang, J., Yan, S., Dai, B., Lin, D.: Scene-aware generative network for human motion synthesis. In: CVPR, pp. 12206–12215 (2021)
Google Scholar
Wang, Z., Chen, Y., Liu, T., Zhu, Y., Liang, W., Huang, S.: HUMANISE: language-conditioned human motion generation in 3D scenes. Adv. Neural. Inf. Process. Syst. 35, 14959–14971 (2022)
Google Scholar
Zhang, S., Zhang, Y., Ma, Q., Black, M.J., Tang, S.: PLACE: proximity learning of articulation and contact in 3D environments. In: 3DV, pp. 642–651. IEEE (2020)
Google Scholar
Zhang, Y., Black, M.J., Tang, S.: We are more than our joints: predicting how 3D bodies move. In: CVPR, pp. 3372–3382 (2021)
Google Scholar
Zhang, Y., Hassan, M., Neumann, H., Black, M.J., Tang, S.: Generating 3D people in scenes without people. In: CVPR, pp. 6194–6204 (2020)
Google Scholar
Zheng, Y., et al.: GIMO: gaze-informed human motion prediction in context. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13673, pp. 676–694. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19778-9_39
Chapter Google Scholar
Zhou, Y., Barnes, C., Lu, J., Yang, J., Li, H.: On the continuity of rotation representations in neural networks. In: CVPR, pp. 5745–5753 (2019)
Google Scholar

Download references

Acknowledgements

This research was supported in part by the Australia Research Council DECRA Fellowship (DE180100628) and ARC Discovery Grant (DP200102274).

Author information

Authors and Affiliations

Australian National University, Canberra, Australia
Chaoyue Xing, Wei Mao & Miaomiao Liu

Authors

Chaoyue Xing
View author publications
You can also search for this author in PubMed Google Scholar
Wei Mao
View author publications
You can also search for this author in PubMed Google Scholar
Miaomiao Liu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Chaoyue Xing .

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Hessen, Germany
Stefan Roth
Princeton University, Palo Alto, CA, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 14495 KB)

Supplementary material 2 (mp4 65667 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Xing, C., Mao, W., Liu, M. (2025). Scene-Aware Human Motion Forecasting via Mutual Distance Prediction. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15097. Springer, Cham. https://doi.org/10.1007/978-3-031-72933-1_8

Download citation

DOI: https://doi.org/10.1007/978-3-031-72933-1_8
Published: 03 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72932-4
Online ISBN: 978-3-031-72933-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Scene-Aware Human Motion Forecasting via Mutual Distance Prediction

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Long-Term Human Motion Prediction with Scene Context

Path-Guided Motion Prediction with Multi-view Scene Perception

Class-guided human motion prediction via multi-spatial-temporal supervision

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

1 Electronic supplementary material

Supplementary material 1 (pdf 14495 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Scene-Aware Human Motion Forecasting via Mutual Distance Prediction

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Long-Term Human Motion Prediction with Scene Context

Path-Guided Motion Prediction with Multi-view Scene Perception

Class-guided human motion prediction via multi-spatial-temporal supervision

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

1 Electronic supplementary material

Supplementary material 1 (pdf 14495 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation