Abstract
In mission-critical, real-world environments, the threshold for failure is low, which makes direct interaction with learning algorithms particularly challenging. Current state-of-the-art reinforcement learning algorithms struggle to learn optimal control policies safely in such settings: losing control of the agent can lead to equipment damage and even personal injury.
Model-based reinforcement learning, by contrast, aims to encode the environment's transition dynamics into a predictive model. The transition dynamics define the mapping from one state to the next, conditioned on an action. With a sufficiently accurate predictive model, an agent can learn optimal behavior even in real environments.
At the heart of this paper is the introduction of the novel Safer Dreaming Variational Autoencoder, which combines a constrained criterion, external knowledge, and risk-directed exploration to learn good policies. Using model-based reinforcement learning, we show that the proposed method performs comparably to model-free algorithms under safety constraints, with a substantially lower risk of entering catastrophic states.
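The interplay the abstract describes can be illustrated with a minimal sketch: fit a predictive model of the transition dynamics from logged transitions, then use it to screen out actions whose predicted successor state is unsafe. The linear model and all names here (`fit_dynamics`, `safe_actions`, the toy 1-D environment) are illustrative assumptions, not the paper's architecture, which uses a variational autoencoder with a recurrent component.

```python
import numpy as np

def fit_dynamics(states, actions, next_states):
    """Least-squares fit of the transition dynamics s' ~ [s; a] @ W."""
    X = np.hstack([states, actions])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W

def predict(W, s, a):
    """Predicted next state for state s and action a."""
    return np.concatenate([s, a]) @ W

def safe_actions(W, s, candidates, unsafe):
    """Risk-directed filtering: keep only actions whose predicted
    successor state lies outside the known unsafe region."""
    return [a for a in candidates if not unsafe(predict(W, s, a))]

# Toy environment: 1-D position, s' = s + a, unsafe when position > 1.0.
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(200, 1))
A = rng.uniform(-0.5, 0.5, size=(200, 1))
W = fit_dynamics(S, A, S + A)

s = np.array([0.8])
candidates = [np.array([a]) for a in (-0.4, 0.0, 0.4)]
allowed = safe_actions(W, s, candidates, unsafe=lambda sp: sp[0] > 1.0)
# The action 0.4 is rejected, since the model predicts 0.8 + 0.4 > 1.0.
```

A learned constraint of this kind can veto exploratory actions before they are executed, which is what keeps interaction with the real environment from entering catastrophic states.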
Notes
- 1.
Mixture Density Network combined with a Recurrent Neural Network.
- 2.
All models in this paper are neural network approximations.
- 3.
- 4.
- 5.
The agent in DeepRTS walks to the nearest gold deposit when its inventory is empty. When the inventory is full, it returns the gold to the base.
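The scripted behavior in the note above amounts to a two-case policy, sketched here with hypothetical names and (x, y) grid coordinates; the actual DeepRTS state and action interfaces differ.

```python
def harvest_target(inventory_full, agent_pos, gold_deposits, base_pos):
    """Return the next movement target for the scripted harvester:
    the base when the inventory is full, otherwise the nearest
    gold deposit by Manhattan distance on the grid."""
    if inventory_full:
        return base_pos
    return min(
        gold_deposits,
        key=lambda g: abs(g[0] - agent_pos[0]) + abs(g[1] - agent_pos[1]),
    )

# Empty inventory: head for the closest deposit.
target = harvest_target(False, (0, 0), [(2, 2), (5, 1)], (9, 9))
```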
© 2020 Springer Nature Switzerland AG
Cite this paper
Andersen, PA., Goodwin, M., Granmo, OC. (2020). Safer Reinforcement Learning for Agents in Industrial Grid-Warehousing. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science(), vol 12566. Springer, Cham. https://doi.org/10.1007/978-3-030-64580-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64579-3
Online ISBN: 978-3-030-64580-9