Abstract
Video representation learning has recently attracted attention in computer vision due to applications in activity and scene forecasting and in vision-based planning and control. Video prediction models often learn a latent representation of video that is encoded from input frames and decoded back into images. Even when conditioned on actions, purely deep-learning-based architectures typically lack a physically interpretable latent space. In this study, we use a differentiable physics engine within an action-conditional video representation network to learn a physical latent representation. We propose supervised and self-supervised learning methods to train our network and identify physical properties. The self-supervised variant uses spatial transformers to decode physical states back into images. The simulation scenarios in our experiments comprise pushing, sliding, and colliding objects, for which we also analyze the observability of the physical properties. In experiments, we demonstrate that our network can learn to encode images and to identify physical properties such as mass and friction from videos and action sequences in the simulated scenarios. We evaluate the accuracy of our supervised and self-supervised methods and compare them with a system-identification baseline that learns directly from state trajectories. We also demonstrate the ability of our method to predict future video frames from input images and actions.
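To make the idea of a physically interpretable, action-conditional latent state concrete, the following is a minimal hypothetical sketch (not the authors' implementation): a single latent dynamics step for a pushed object sliding with Coulomb friction, in which the mass `m` and friction coefficient `mu` appear explicitly as the parameters a differentiable physics engine would identify from data. The function name and signature are illustrative assumptions.

```python
def physics_step(x, v, f_action, m, mu, g=9.81, dt=0.01):
    """One semi-implicit Euler step of 1-D sliding with Coulomb friction.

    x, v      : position and velocity of the object (the physical latent state)
    f_action  : external pushing force (the action conditioning the model)
    m, mu     : mass and friction coefficient -- the identifiable parameters
    """
    f_max = mu * m * g                        # maximum friction force magnitude
    if abs(v) > 1e-8:                         # kinetic friction opposes motion
        f_net = f_action - f_max * (1.0 if v > 0 else -1.0)
    elif abs(f_action) <= f_max:              # static friction holds the object
        f_net = 0.0
    else:                                     # push overcomes static friction
        f_net = f_action - f_max * (1.0 if f_action > 0 else -1.0)
    v_new = v + (f_net / m) * dt
    if abs(v) > 1e-8 and v * v_new < 0.0:     # friction cannot reverse motion
        v_new = 0.0
    x_new = x + v_new * dt                    # semi-implicit Euler update
    return x_new, v_new
```

Because each step is a smooth (almost everywhere) function of `m` and `mu`, gradients of a prediction loss can flow back through a rollout of such steps, which is what allows parameter identification from videos and actions.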
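The self-supervised decoder maps physical states back into images with spatial transformers. The sketch below is a deliberately simplified, non-differentiable stand-in for that idea (the paper's spatial transformers use differentiable bilinear sampling instead of integer pasting): a fixed object sprite is placed on a canvas at the position predicted by the physics engine. All names here are illustrative assumptions.

```python
def render(px, py, sprite, H=8, W=8):
    """Paste `sprite` (a list of rows) onto an H x W canvas at pixel (px, py).

    This mimics the role of a spatial-transformer decoder: the only thing
    that changes between frames is the pose of the object, so the physical
    state fully determines the rendered image.
    """
    img = [[0.0] * W for _ in range(H)]
    for i, row in enumerate(sprite):
        for j, val in enumerate(row):
            y, x = py + i, px + j
            if 0 <= y < H and 0 <= x < W:   # clip sprite at canvas borders
                img[y][x] = val
    return img
```

Comparing such rendered frames against observed video frames yields a reconstruction loss that supervises the physical latent state without ground-truth state labels.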
Acknowledgements
We acknowledge support from Cyber Valley, the Max Planck Society, and the German Federal Ministry of Education and Research (BMBF) through the Tuebingen AI Center (FKZ: 01IS18039B). The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Jan Achterhold.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kandukuri, R., Achterhold, J., Moeller, M., Stueckler, J. (2021). Learning to Identify Physical Parameters from Video Using Differentiable Physics. In: Akata, Z., Geiger, A., Sattler, T. (eds) Pattern Recognition. DAGM GCPR 2020. Lecture Notes in Computer Science, vol 12544. Springer, Cham. https://doi.org/10.1007/978-3-030-71278-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71277-8
Online ISBN: 978-3-030-71278-5