Abstract
Planning problems with continuous state and action spaces are difficult to solve with existing planning techniques, especially when the state transition is defined by high-dimensional non-linear dynamics. Recently, a technique called Planning through Backpropagation (PtB) was introduced as an efficient and scalable alternative to traditional optimization-based methods for continuous planning problems. PtB leverages modern gradient descent algorithms and highly optimized automatic differentiation libraries to obtain approximate solutions. However, to date there have been no empirical evaluations comparing PtB against methods for Linear-Quadratic (LQ) control problems. In this work, we compare PtB with an optimal algorithm from control theory, LQR, and with its iterative version, iLQR, on linear and non-linear continuous deterministic planning problems. The empirical results suggest that PtB can be an efficient alternative for non-linear continuous deterministic planning, since it is much easier to implement and stabilize than classical model-predictive control methods.
Supported by CNPq and FAPESP.
Notes
1. The Riccati equations can be seen as the analytical counterpart of value iteration for LQRs.
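To make the connection concrete, the backward Riccati recursion for a finite-horizon, discrete-time LQR can be sketched as follows. This is a minimal NumPy implementation of the standard textbook recursion, not code from the paper; the matrices and horizon in the usage below are illustrative.

```python
import numpy as np

def lqr_backward(A, B, Q, R, Qf, T):
    """Finite-horizon discrete-time LQR via backward Riccati recursion.

    Solves  min  sum_t (x_t' Q x_t + u_t' R u_t) + x_T' Qf x_T
    subject to  x_{t+1} = A x_t + B u_t,
    returning feedback gains K_0..K_{T-1} (u_t = -K_t x_t) and
    cost-to-go matrices P_0..P_T.
    """
    P = Qf
    Ks, Ps = [], [P]
    for _ in range(T):
        # Gain: K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        Ks.append(K)
        Ps.append(P)
    Ks.reverse()
    Ps.reverse()
    return Ks, Ps
```

Each pass of the loop is one value-iteration backup in closed form: for the scalar system A = B = Q = R = 1, the recursion converges to the fixed point P = (1 + √5)/2.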
2. Notice that, since the functions represented in the model cells are not parameterized, this model is not exactly a recurrent neural network (RNN), despite its resemblance.
3. This is akin to a shooting formulation in optimal control, but solved through gradient-based optimization instead of dynamic programming or other methods.
© 2020 Springer Nature Switzerland AG
Cite this paper
Scaroni, R., Bueno, T.P., de Barros, L.N., Mauá, D. (2020). On the Performance of Planning Through Backpropagation. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_8
Print ISBN: 978-3-030-61379-2
Online ISBN: 978-3-030-61380-8