Learning to Identify Physical Parameters from Video Using Differentiable Physics

  • Conference paper
  • Published in: Pattern Recognition (DAGM GCPR 2020)

Abstract

Video representation learning has recently attracted attention in computer vision due to its applications in activity and scene forecasting and in vision-based planning and control. Video prediction models often learn a latent representation of video that is encoded from input frames and decoded back into images. Even when conditioned on actions, purely deep-learning-based architectures typically lack a physically interpretable latent space. In this study, we use a differentiable physics engine within an action-conditional video representation network to learn a physical latent representation. We propose supervised and self-supervised learning methods to train our network and identify physical properties. The latter uses spatial transformers to decode physical states back into images. The simulation scenarios in our experiments comprise pushing, sliding and colliding objects, for which we also analyze the observability of the physical properties. In experiments, we demonstrate that our network can learn to encode images and identify physical properties like mass and friction from videos and action sequences in the simulated scenarios. We evaluate the accuracy of our supervised and self-supervised methods and compare them with a system identification baseline that learns directly from state trajectories. We also demonstrate the ability of our method to predict future video frames from input images and actions.
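The pipeline described in the abstract (an encoder that maps frames to a physical state, differentiable dynamics conditioned on actions, and a spatial-transformer decoder back to pixels) can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example, not the authors' implementation: the class names, the simplified point-mass Euler dynamics, and the single learned object template are all illustrative assumptions, whereas the paper's actual model uses a full differentiable rigid-body physics engine.

```python
# Minimal, hypothetical sketch of the abstract's pipeline (NOT the authors'
# implementation): frames are encoded into a physical state, a differentiable
# physics step rolls the state forward under learnable mass and friction given
# an applied force (the action), and a spatial transformer places a learned
# object template at the predicted position to reconstruct the frame.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameEncoder(nn.Module):
    """Map an RGB frame to a 4-dim physical state (x, y, vx, vy)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ELU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ELU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, 4)

    def forward(self, frame):
        return self.fc(self.conv(frame).flatten(1))


class PhysicsStep(nn.Module):
    """One explicit Euler step of point-mass dynamics with a viscous-style
    friction term; mass and friction are the learnable (identified) parameters."""

    def __init__(self, dt=0.1):
        super().__init__()
        self.dt = dt
        self.log_mass = nn.Parameter(torch.zeros(1))      # mass = exp(.) > 0
        self.log_friction = nn.Parameter(torch.zeros(1))  # friction coeff > 0

    def forward(self, state, force):
        pos, vel = state[:, :2], state[:, 2:]
        mass, mu = self.log_mass.exp(), self.log_friction.exp()
        acc = (force - mu * vel) / mass       # friction opposes motion
        vel = vel + self.dt * acc
        pos = pos + self.dt * vel
        return torch.cat([pos, vel], dim=1)


class SpatialTransformerDecoder(nn.Module):
    """Render a frame by translating a learned template to the predicted
    position via a spatial transformer (affine_grid / grid_sample)."""

    def __init__(self, size=64):
        super().__init__()
        self.template = nn.Parameter(0.1 * torch.rand(1, 3, size, size))
        self.size = size

    def forward(self, state):
        tx, ty = state[:, 0], state[:, 1]     # position in normalized [-1, 1] coords
        ones, zeros = torch.ones_like(tx), torch.zeros_like(tx)
        # Inverse warp: shift the sampling grid opposite to the object motion.
        theta = torch.stack([
            torch.stack([ones, zeros, -tx], dim=1),
            torch.stack([zeros, ones, -ty], dim=1),
        ], dim=1)                              # (B, 2, 3) affine matrices
        b = state.shape[0]
        grid = F.affine_grid(theta, (b, 3, self.size, self.size), align_corners=False)
        return F.grid_sample(self.template.expand(b, -1, -1, -1), grid,
                             align_corners=False)


if __name__ == "__main__":
    encoder, physics, decoder = FrameEncoder(), PhysicsStep(), SpatialTransformerDecoder()
    frames = torch.rand(2, 3, 64, 64)        # batch of observed frames
    actions = torch.rand(2, 2)               # applied 2D forces (action-conditional)
    state = encoder(frames)                  # physical state inferred from pixels
    next_state = physics(state, actions)     # differentiable forward dynamics
    recon = decoder(next_state)              # predicted frame
    loss = F.mse_loss(recon, frames)         # placeholder photometric loss
    loss.backward()                          # gradients reach mass and friction
    print(next_state.shape, recon.shape)
```

Because the physics step is differentiable, the photometric reconstruction loss provides gradients for the mass and friction parameters, which is the core idea behind the self-supervised identification described in the abstract.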

Notes

  1. https://pybullet.org

Acknowledgements

We acknowledge support from Cyber Valley, the Max Planck Society, and the German Federal Ministry of Education and Research (BMBF) through the Tuebingen AI Center (FKZ: 01IS18039B). The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Jan Achterhold.

Author information

Corresponding author

Correspondence to Rama Kandukuri.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 997 KB)

Supplementary material 2 (mp4 192 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kandukuri, R., Achterhold, J., Moeller, M., Stueckler, J. (2021). Learning to Identify Physical Parameters from Video Using Differentiable Physics. In: Akata, Z., Geiger, A., Sattler, T. (eds) Pattern Recognition. DAGM GCPR 2020. Lecture Notes in Computer Science, vol. 12544. Springer, Cham. https://doi.org/10.1007/978-3-030-71278-5_4

  • DOI: https://doi.org/10.1007/978-3-030-71278-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71277-8

  • Online ISBN: 978-3-030-71278-5

  • eBook Packages: Computer Science, Computer Science (R0)
