On the Performance of Planning Through Backpropagation

  • Conference paper
Intelligent Systems (BRACIS 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12320)

Abstract

Planning problems with continuous state and action spaces are difficult to solve with existing planning techniques, especially when the state transition is defined by high-dimensional non-linear dynamics. Recently, a technique called Planning through Backpropagation (PtB) was introduced as an efficient and scalable alternative to traditional optimization-based methods for continuous planning problems. PtB leverages modern gradient descent algorithms and highly optimized automatic differentiation libraries to obtain approximate solutions. However, to date there have been no empirical evaluations comparing PtB with Linear-Quadratic (LQ) control methods. In this work, we compare PtB with an optimal algorithm from control theory, the Linear-Quadratic Regulator (LQR), and its iterative version iLQR, on linear and non-linear continuous deterministic planning problems. The empirical results suggest that PtB can be an efficient alternative for solving non-linear continuous deterministic planning problems, being much easier to implement and stabilize than classical model-predictive control methods.

Supported by CNPq and FAPESP.
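
In essence, PtB treats the action sequence as the decision variables of an unrolled computation graph: the deterministic transition function is applied repeatedly over the planning horizon, the accumulated cost is backpropagated through the unrolled trajectory, and the actions are updated by gradient descent. The sketch below illustrates this idea; the linear dynamics, quadratic cost, horizon, and optimizer settings are illustrative assumptions (using PyTorch for automatic differentiation), not the domains or hyperparameters evaluated in the paper.

```python
# Minimal sketch of Planning through Backpropagation (PtB), assuming a
# deterministic transition x_{t+1} = f(x_t, u_t) and an additive cost.
# Dynamics, cost, horizon and optimizer settings below are illustrative
# assumptions, not the paper's benchmark domains.
import torch

def dynamics(x, u):
    # Simple double-integrator-like linear dynamics (assumed for illustration).
    A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
    B = torch.tensor([[0.0], [0.1]])
    return x @ A.T + u @ B.T

def cost(x, u):
    # Quadratic state and control cost (assumed).
    return (x ** 2).sum() + 0.1 * (u ** 2).sum()

horizon, x0 = 50, torch.tensor([[5.0, 0.0]])
actions = torch.zeros(horizon, 1, requires_grad=True)   # open-loop plan
optimizer = torch.optim.Adam([actions], lr=0.05)

for step in range(500):
    optimizer.zero_grad()
    x, total_cost = x0, 0.0
    for t in range(horizon):
        u = actions[t].unsqueeze(0)
        total_cost = total_cost + cost(x, u)
        x = dynamics(x, u)                 # unroll the model like an RNN
    total_cost = total_cost + cost(x, torch.zeros(1, 1))  # terminal cost
    total_cost.backward()                  # backpropagate through the plan
    optimizer.step()

print(actions.detach()[:5])
```

Because the plan is optimized open-loop through the unrolled model, the same loop applies unchanged to non-linear transition functions, which is the scalability argument behind PtB.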

Notes

  1.

    The Riccati equations can be seen as the analytical counterpart of value iteration for LQRs; a standard form of the recursion is sketched after these notes.

  2.

    Notice that, since the functions represented in the model cells are not parameterized, this model is not exactly a recurrent neural network (RNN), despite its resemblance.

  3.

    This is akin to a shooting formulation in optimal control, but solved through gradient-based optimization instead of dynamic programming or other methods.
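
To make Note 1 concrete, a standard textbook form of the finite-horizon, discrete-time Riccati recursion is sketched below; each backward step plays the role of one value-iteration update for the LQR. The matrices A, B (dynamics) and Q, R, Q_f (costs) are generic symbols assumed for illustration, not quantities taken from the paper.

```latex
% Backward Riccati recursion for dynamics x_{t+1} = A x_t + B u_t,
% stage cost x_t^T Q x_t + u_t^T R u_t and terminal cost x_T^T Q_f x_T.
\begin{align}
  P_T &= Q_f,\\
  K_t &= \bigl(R + B^\top P_{t+1} B\bigr)^{-1} B^\top P_{t+1} A,\\
  P_t &= Q + A^\top P_{t+1} A
        - A^\top P_{t+1} B \bigl(R + B^\top P_{t+1} B\bigr)^{-1} B^\top P_{t+1} A,\\
  u_t^{*} &= -K_t x_t.
\end{align}
```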


Author information

Corresponding author

Correspondence to Renato Scaroni.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Scaroni, R., Bueno, T.P., de Barros, L.N., Mauá, D. (2020). On the Performance of Planning Through Backpropagation. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol. 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_8

  • DOI: https://doi.org/10.1007/978-3-030-61380-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61379-2

  • Online ISBN: 978-3-030-61380-8

  • eBook Packages: Computer Science (R0)
