Abstract
Planning problems with continuous state and action spaces are difficult to solve with existing planning techniques, especially when the state transition is defined by high-dimensional non-linear dynamics. Recently, a technique called Planning through Backpropagation (PtB) was introduced as an efficient and scalable alternative to traditional optimization-based methods for continuous planning problems. PtB leverages modern gradient descent algorithms and highly optimized automatic differentiation libraries to obtain approximate solutions. However, to date there have been no empirical evaluations comparing PtB against methods for Linear-Quadratic (LQ) control problems. In this work, we compare PtB with an optimal algorithm from control theory, LQR, and with its iterative version, iLQR, on linear and non-linear continuous deterministic planning problems. The empirical results suggest that PtB can be an efficient alternative for non-linear continuous deterministic planning, since it is much easier to implement and stabilize than classical model-predictive control methods.
Supported by CNPq and FAPESP.
Notes
1. The Riccati equations can be seen as the analytical counterpart of value iteration for LQRs.
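To make the connection concrete, the backward Riccati recursion for a finite-horizon, discrete-time LQR can be sketched as follows. This is a minimal NumPy implementation of the standard textbook recursion, not code from the paper; the matrices and horizon in the usage below are illustrative.

```python
import numpy as np

def lqr_backward(A, B, Q, R, Qf, T):
    """Finite-horizon discrete-time LQR via backward Riccati recursion.

    Solves  min  sum_t (x_t' Q x_t + u_t' R u_t) + x_T' Qf x_T
    subject to  x_{t+1} = A x_t + B u_t,
    returning feedback gains K_0..K_{T-1} (u_t = -K_t x_t) and
    cost-to-go matrices P_0..P_T.
    """
    P = Qf
    Ks, Ps = [], [P]
    for _ in range(T):
        # Gain: K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        Ks.append(K)
        Ps.append(P)
    Ks.reverse()
    Ps.reverse()
    return Ks, Ps
```

Each pass of the loop is one value-iteration backup in closed form: for the scalar system A = B = Q = R = 1, the recursion converges to the fixed point P = (1 + √5)/2.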
2. Notice that, since the functions represented in the model cells are not parameterized, this model is not exactly a recurrent neural network (RNN), despite its resemblance.
3. This is akin to a shooting formulation in optimal control, but solved through gradient-based optimization instead of dynamic programming or other methods.
© 2020 Springer Nature Switzerland AG
Cite this paper
Scaroni, R., Bueno, T.P., de Barros, L.N., Mauá, D. (2020). On the Performance of Planning Through Backpropagation. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_8
Print ISBN: 978-3-030-61379-2
Online ISBN: 978-3-030-61380-8